Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus

Authors: A. Kabán, M. A. Girolami

Abstract

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.

Keywords: latent semantic indexing, probabilistic latent semantic analysis, projection pursuit, semantic feature extraction, text analysis

Paper URL: https://doi.org/10.1023/A:1013801028884