An analysis of the coherence of descriptors in topic modeling

Authors:

Highlights:

• We evaluate the coherence and generality of topic descriptors found by LDA and NMF.

• Six new and existing corpora were specifically compiled for this evaluation.

• A new coherence measure using word2vec-modeled term vector similarity is proposed.

• NMF regularly produces more coherent topics, where term weighting is influential.

• NMF may be more suitable for topic modeling of niche or non-mainstream corpora.
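The highlights mention a coherence measure based on word2vec term vector similarity but do not give its definition. As a rough illustration only, the sketch below scores a topic descriptor by the mean pairwise cosine similarity of its terms' vectors. That formula is an assumption, not necessarily the paper's exact measure. The names `cosine`, `descriptor_coherence`, and `toy_vectors` are hypothetical, and the 3-dimensional vectors are made-up stand-ins for embeddings that would normally come from a trained word2vec model.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def descriptor_coherence(terms, vectors):
    """Mean pairwise cosine similarity over one topic descriptor.

    Assumption: coherence is the average similarity over all term pairs.
    Terms without a vector are skipped.
    """
    known = [t for t in terms if t in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy embeddings (not real word2vec output).
toy_vectors = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "league": [0.85, 0.15, 0.05],
    "tax": [0.0, 0.1, 0.9],
}

coherent = descriptor_coherence(["goal", "match", "league"], toy_vectors)
mixed = descriptor_coherence(["goal", "match", "tax"], toy_vectors)
print(coherent > mixed)
```

Under this assumption, a descriptor whose terms sit close together in the embedding space scores higher than one that mixes unrelated terms, which is the intuition behind comparing descriptors produced by LDA and NMF.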

Keywords: Topic modeling, Topic coherence, LDA, NMF

Article history: Available online 9 March 2015.

Official article URL: https://doi.org/10.1016/j.eswa.2015.02.055