Variable space hidden Markov model for topic detection and analysis

Authors:

Highlights:

Abstract

Discovering topics from large collections of documents has recently become an important task. Most topic models treat a document as a word sequence, whether in discrete character form or term frequency form. However, the number of words varies greatly from one document to another, which causes several problems for current topic models in topic analysis. In addition, it is difficult to perform topic transition analysis with current topic models. To overcome these deficiencies, a variable space hidden Markov model (VSHMM) is proposed to represent topics, and several operations based on space computation are presented. A hierarchical clustering algorithm that dynamically changes the number of components in the topic model is proposed to demonstrate the effectiveness of the VSHMM. A method of document partition based on topic transition is also presented. Experiments on a real-world dataset show that the VSHMM improves accuracy while greatly reducing the algorithm’s time complexity compared with an algorithm based on the current mixture model.

Keywords: Topic detection, Variable space hidden Markov model, Topic transition, Hierarchical clustering

Review history: Received 19 May 2007, Revised 12 September 2007, Accepted 12 September 2007, Available online 21 September 2007.

Official paper URL: https://doi.org/10.1016/j.knosys.2007.09.001