Topic analysis using a finite mixture model

Authors:

Highlights:

Abstract

This paper addresses topic analysis, the task of determining a text’s topic structure: a representation of which topics a text contains and how those topics change within it. Topic analysis consists of two main tasks, topic identification and text segmentation. Although topic analysis would be extremely useful in a variety of text processing applications, no previous study has addressed it sufficiently. This paper proposes a statistical learning approach. Specifically, topics are represented by word clusters, and a finite mixture model, referred to as a stochastic topic model (STM), represents the word distribution within a text. A given text is segmented by detecting significant differences between STMs, and topics are identified by estimating STMs. Experimental results indicate that the proposed method significantly outperforms methods that combine existing techniques.
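
To make the idea of a finite mixture over word clusters concrete, the following is a minimal sketch and not the paper's actual implementation. It assumes the word distribution takes the usual mixture form P(w) = sum over k of P(k) P(w|k), with the per-cluster word probabilities P(w|k) fixed in advance and only the mixing weights P(k) estimated by EM. The function names, the cluster contents, and the sample text are all hypothetical.

```python
from collections import Counter


def mixture_word_prob(word, weights, cluster_word_probs):
    """P(w) = sum_k P(k) * P(w | k) under the assumed mixture form."""
    return sum(wk * probs.get(word, 0.0)
               for wk, probs in zip(weights, cluster_word_probs))


def estimate_mixing_weights(words, cluster_word_probs, iterations=50):
    """Estimate mixing weights P(k) by EM, keeping each P(w | k) fixed.

    cluster_word_probs is a list of dicts, one per (hypothetical) word
    cluster, mapping a word to its probability within that cluster.
    Words covered by no cluster are ignored.
    """
    k = len(cluster_word_probs)
    weights = [1.0 / k] * k
    counts = Counter(
        w for w in words
        if any(probs.get(w, 0.0) > 0.0 for probs in cluster_word_probs)
    )
    total = sum(counts.values())
    if total == 0:
        return weights  # nothing to learn from; keep the uniform prior

    for _ in range(iterations):
        new_mass = [0.0] * k
        for w, c in counts.items():
            # E-step: responsibility of each cluster for word w.
            joint = [weights[j] * cluster_word_probs[j].get(w, 0.0)
                     for j in range(k)]
            z = sum(joint)
            if z == 0.0:
                continue
            for j in range(k):
                new_mass[j] += c * joint[j] / z
        # M-step: renormalize the expected counts into weights.
        norm = sum(new_mass)
        weights = [m / norm for m in new_mass]
    return weights


# Hypothetical clusters and text, for illustration only.
trade = {"trade": 0.5, "tariff": 0.3, "export": 0.2}
sports = {"game": 0.5, "team": 0.3, "score": 0.2}
text = ["trade", "tariff", "trade", "game"]
weights = estimate_mixing_weights(text, [trade, sports])
```

Under these assumptions, the estimated weights indicate which clusters (topics) dominate a stretch of text, and comparing the weights estimated on adjacent stretches is one plausible way to look for the differences between models that the abstract describes.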

Keywords: Topic, Text segmentation, Statistical learning, Mixture model, Word clustering

Review history: Received 10 March 2001, Accepted 29 April 2002, Available online 4 June 2002.

Official paper URL: https://doi.org/10.1016/S0306-4573(02)00035-3