News Retrieval Based On Latent Semantic Index And Clustering
AUTHOR(S)
Prerna, Rajesh Singh, Pawan Bhadana
KEYWORDS
Extraction, Algorithm, Mining
ABSTRACT
The Web hosts a heterogeneous, unstructured collection of news articles. This paper presents a novel approach for retrieving relevant news articles from such a collection. Efficient retrieval requires keyword-based analysis of the articles, and that analysis runs into two problems: synonymy (different words that share a meaning) and polysemy (one word that carries several meanings). We present a news retrieval approach based on Latent Semantic Index (LSI) and clustering. The keyword-by-article matrix is first projected into a lower-dimensional space; a clustering step then groups related articles into clusters.
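The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy headlines, the function names (`term_document_matrix`, `lsi_project`, `kmeans`), the use of raw term counts, the choice of two latent dimensions, and k-means as the clustering algorithm are all hypothetical, since the abstract does not specify them. The projection is a truncated singular value decomposition, which is the standard way to compute an LSI space.

```python
import numpy as np

# Hypothetical toy corpus. "car" and "automobile" are synonyms, so the
# first two headlines mean the same thing but differ in one keyword.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "election vote result",
    "election vote campaign",
    "parliament vote campaign",
]

def term_document_matrix(docs):
    """Build a keyword-by-article count matrix (rows: terms, columns: articles)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return A, vocab

def lsi_project(A, k):
    """Project articles into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Row i holds the coordinates of article i: V_k scaled by the singular values.
    return Vt[:k].T * s[:k]

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each article to its nearest center, then recompute the centers.
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1
        )
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

A, vocab = term_document_matrix(docs)
X = lsi_project(A, k=2)      # articles in the latent space
labels = kmeans(X, k=2)      # cluster id per article
for text, label in zip(docs, labels):
    print(label, text)
```

In this sketch the motoring headlines and the politics headlines end up in separate clusters, and the "car" and "automobile" headlines land close together in the latent space even though they differ in a keyword. That is the synonymy effect the abstract refers to. A real collection would typically use weighted counts such as TF-IDF and a larger number of latent dimensions.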