LSISOM — A Latent Semantic Indexing Approach to Self-Organizing Maps of Document Collections

Authors: Nikolaos Ampazis, Stavros J. Perantonis

Abstract

The Self-Organizing Map (SOM) algorithm has been used with much success in a variety of applications for the automatic organization of full-text document collections. A great advantage of the SOM method is that document collections can be ordered in such a way that documents with similar content are positioned at nearby locations of the 2-dimensional SOM lattice. The resulting ordered map thus presents a general view of the document collection, which helps the exploration of information contained in the whole document space. The most notable example of such an application is the WEBSOM method, where the document collection is ordered onto a map by utilizing word category histograms for representing the documents' data vectors. In this paper, we introduce the LSISOM method, which resembles WEBSOM in the sense that the document maps are generated from word category histograms rather than simple histograms of the words. However, a major difference between the two methods is that in WEBSOM the word category histograms are formed using statistical information of short word contexts, whereas in LSISOM these histograms are obtained from the SOM clustering of the Latent Semantic Indexing representation of document terms.
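The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`lsi_term_vectors`, `train_som`, `best_matching_units`, `category_histograms`), the toy term-document matrix, the LSI rank, the map sizes, and the SOM training schedule are all assumptions made for illustration, since the abstract does not specify them. The sketch uses NumPy, takes a truncated SVD as the LSI representation of terms, clusters the terms on a small SOM to obtain word categories, builds per-document word category histograms, and then orders the documents on a second SOM.

```python
import numpy as np

def lsi_term_vectors(term_doc, k):
    """Rank-k LSI term representation: rows of U_k scaled by the top-k singular values."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]

def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows*cols, dim).
    The linear decay of learning rate and neighbourhood width is an assumed schedule."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each unit from a randomly chosen data vector.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps = epochs * len(data)
    sigma0 = max(rows, cols) / 2.0
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.5  # kept strictly positive
            # Best-matching unit: closest weight vector in Euclidean distance.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            # Gaussian neighbourhood measured on the 2-D lattice.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def best_matching_units(data, weights):
    """Index of the closest SOM unit for every row of data."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def category_histograms(term_doc, term_to_cat, n_cats):
    """Per-document histogram: term counts summed within each word category."""
    hist = np.zeros((term_doc.shape[1], n_cats))
    for t, cat in enumerate(term_to_cat):
        hist[:, cat] += term_doc[t]
    return hist

# Toy term-document count matrix (rows: terms, columns: documents); made-up data.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

term_vecs = lsi_term_vectors(term_doc, k=2)              # LSI representation of terms
term_som = train_som(term_vecs, rows=2, cols=2)          # SOM over terms
term_to_cat = best_matching_units(term_vecs, term_som)   # word category of each term
doc_hist = category_histograms(term_doc, term_to_cat, n_cats=len(term_som))
doc_som = train_som(doc_hist, rows=2, cols=2)            # document map
doc_positions = best_matching_units(doc_hist, doc_som)   # map unit of each document
```

Each entry of `doc_positions` is the index of a unit on the document map, so documents whose category histograms are similar tend to land on the same or neighbouring units. A real collection would likely need term weighting, a much larger lattice, and a higher LSI rank than this toy setup.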

Keywords: data representation, document clustering, information retrieval, latent semantic indexing, self-organizing maps, unsupervised learning

Paper URL: https://doi.org/10.1023/B:NEPL.0000023449.95030.8f