Enhancing Information Retrieval Using Concept-Based Mining Model with Feature Extraction and Clustering

Suresh S, Shobana M


Most common text mining techniques are based on statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of a term within a single document only. However, two terms can have the same frequency in their documents while one of them contributes more to the meaning of its sentences than the other. The underlying text mining model should therefore identify terms that capture the semantics of the text. Such a model can capture the terms that present the concepts of a sentence, which leads to discovery of the topic of the document. The concept-based mining model can effectively discriminate between terms that are unimportant to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. A term that contributes to sentence semantics is analyzed at the sentence, document, and corpus levels. Traditional analysis considers the document level only.

Nowadays, much information is presented with explanatory diagrams or related images. An image analysis method can automate the recognition of landmarks and events in large image collections, significantly improving the content consumption experience. However, the wide adoption of photo-sharing applications and the enormous amount of user-generated content uploaded to them create an information overload problem for users. One automated content organization technique for overcoming this overload is to group images by similarity. The derived clusters are then used to support navigation and browsing of the collection. In this paper, we present a community detection (i.e., graph-based clustering) approach that uses both the visual and the tagging features of images. It efficiently extracts groups of correlated images from large image collections. We cluster such image similarity graphs by means of community detection. This process identifies groups of nodes on the graph that are more closely associated with each other than with the rest of the graph. We then categorize the resulting image clusters as landmarks or events using features related to their temporal, community, and label characteristics.
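
The three-level concept analysis and the concept-based similarity measure can be illustrated with a minimal Python sketch. It assumes that each document has already been split into sentences, and each sentence into concepts. These might be, for example, verb-argument phrases from a semantic role labeler, which is not shown. The sketch combines three quantities: a sentence-level conceptual term frequency (ctf), a document-level term frequency (tf), and a corpus-level document frequency (df). The weighting formula and the use of cosine similarity are illustrative assumptions, not the exact formulas of the proposed model. The names `concept_weights` and `concept_similarity` are hypothetical.

```python
import math
from collections import Counter

# Assumption: a document is a list of sentences, and each sentence is a list of
# concepts (e.g. verb-argument phrases) extracted beforehand.
Document = list[list[str]]


def concept_weights(doc: Document, corpus: list[Document]) -> dict[str, float]:
    """Hypothetical concept weight mixing sentence, document and corpus levels."""
    n_docs = len(corpus)
    # Document level: how often each concept occurs anywhere in the document.
    tf = Counter(concept for sentence in doc for concept in sentence)
    weights: dict[str, float] = {}
    for concept, count in tf.items():
        # Sentence level: average occurrences per sentence containing the concept.
        per_sentence = [s.count(concept) for s in doc if concept in s]
        ctf = sum(per_sentence) / len(per_sentence)
        # Corpus level: smoothed inverse document frequency (never zero or negative).
        df = sum(1 for d in corpus if any(concept in s for s in d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[concept] = (count + ctf) * idf
    return weights


def concept_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two concept-weight vectors (an illustrative choice)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


corpus = [
    [["prices rise", "market"], ["market", "investors"]],
    [["market", "investors"], ["investors", "fear"]],
    [["team", "wins"], ["team", "final"]],
]
weights = [concept_weights(doc, corpus) for doc in corpus]
print(concept_similarity(weights[0], weights[1]))  # shared concepts: positive
print(concept_similarity(weights[0], weights[2]))  # no shared concepts: 0.0
```

Under this weighting, the highest weight goes to a concept that appears often within one document but in few documents of the corpus.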

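The hybrid image similarity graph and the community detection step can be sketched in Python as follows. Each image is assumed to come with two inputs:

- a precomputed visual feature vector, for instance a bag-of-visual-words histogram built from local descriptors such as SIFT;
- a tag set.

An edge joins two images when either their visual cosine similarity or their tag Jaccard similarity passes a threshold. Community detection is done here with simple label propagation, swapped in for illustration; it is not necessarily the algorithm used in the paper. The thresholds, the image data, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def build_hybrid_graph(images, visual_threshold=0.8, tag_threshold=0.3):
    """images: id -> (visual vector, tag set). Edge if EITHER similarity is high."""
    ids = sorted(images)
    adj = {i: set() for i in ids}
    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            a, b = ids[x], ids[y]
            (va, ta), (vb, tb) = images[a], images[b]
            if cosine(va, vb) >= visual_threshold or jaccard(ta, tb) >= tag_threshold:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def label_propagation(adj, max_iter=20):
    """Each node repeatedly adopts the most frequent label among its neighbours."""
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated image keeps its own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent ones.
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    groups: dict[str, set[str]] = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return sorted(groups.values(), key=lambda s: sorted(s))


images = {
    "img1": ([1.0, 0.0, 0.0], {"paris", "eiffel"}),
    "img2": ([0.9, 0.1, 0.0], {"eiffel", "tower"}),
    "img3": ([0.0, 0.0, 1.0], {"paris", "eiffel", "night"}),  # looks different, same tags
    "img4": ([0.0, 1.0, 0.0], {"concert", "live"}),
    "img5": ([0.1, 0.9, 0.0], {"concert", "band"}),
    "img6": ([0.0, 0.9, 0.1], {"festival"}),
}
communities = label_propagation(build_hybrid_graph(images))
print(communities)  # two communities: the tower photos and the concert photos
```

Here `img3` joins the tower community only through its tags, which illustrates why combining visual and tagging features can recover groups that either feature alone would miss.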

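Categorizing a finished cluster can likewise be sketched in Python with a few hand-set rules over its temporal, community, and label features. The sketch assumes the following:

- an event is photographed within a short time window;
- a landmark is photographed over a long period by many different users under a dominant tag.

The `Photo` record, the feature names, and all thresholds are hypothetical stand-ins for whatever features and classifier the authors actually use.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Photo:
    taken: datetime                               # capture time, e.g. from EXIF metadata
    user: str                                     # id of the uploading user
    tags: set[str] = field(default_factory=set)


def cluster_features(photos: list[Photo]) -> dict[str, float]:
    """Temporal, community and label features of one image cluster."""
    times = sorted(p.taken for p in photos)
    span_days = (times[-1] - times[0]).total_seconds() / 86400.0
    distinct_users = len({p.user for p in photos})
    tag_counts = Counter(tag for p in photos for tag in p.tags)
    # Share of photos carrying the cluster's most frequent tag.
    top_tag_share = tag_counts.most_common(1)[0][1] / len(photos) if tag_counts else 0.0
    return {
        "span_days": span_days,
        "distinct_users": float(distinct_users),
        "top_tag_share": top_tag_share,
    }


def classify_cluster(photos: list[Photo], max_event_days: float = 3.0,
                     min_users: int = 3, min_tag_share: float = 0.5) -> str:
    """Hypothetical rule-based labeling: 'event', 'landmark' or 'unknown'."""
    f = cluster_features(photos)
    if f["span_days"] <= max_event_days:
        return "event"      # photos concentrated in a short time window
    if f["distinct_users"] >= min_users and f["top_tag_share"] >= min_tag_share:
        return "landmark"   # photographed repeatedly, by many people, under a common tag
    return "unknown"


concert = [
    Photo(datetime(2011, 7, 1, 20, 0), "u1", {"concert"}),
    Photo(datetime(2011, 7, 1, 22, 30), "u2", {"concert", "band"}),
]
tower = [
    Photo(datetime(2010, 5, 1), "u1", {"eiffel"}),
    Photo(datetime(2011, 3, 9), "u2", {"eiffel", "paris"}),
    Photo(datetime(2012, 1, 15), "u3", {"eiffel"}),
]
print(classify_cluster(concert))  # event
print(classify_cluster(tower))    # landmark
```

A learned classifier over the same features would replace the fixed thresholds in practice; the rules above only show how the three feature families can separate the two cluster types.
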
landmarks and events, content organization technique, similarity, clusters


All Rights Reserved © 2012 IJARCSEE

This work is licensed under a Creative Commons Attribution 3.0 Unported License.