Vocabulary size and its effect on topic representation

Authors:

Highlights:

• The impact of vocabulary reduction on topic modeling is explored for three data sets.

• Results are compared using four document and topic-centered measures.

• Removal of singly occurring terms has minimal influence on topics.

• Removal of frequently occurring terms greatly influences topic outcomes for three measures.
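The highlights name two kinds of vocabulary reduction applied before topic modeling: removing terms that occur only once in the corpus, and removing the most frequent terms. The sketch below shows both pruning steps on a toy corpus. It assumes the documents are already tokenized and that frequencies are counted over the whole corpus. `reduce_vocabulary`, `min_count` and `drop_top` are hypothetical names. This is not the authors' exact procedure, and the pruned corpus would still need to be passed to an LDA implementation, which is not shown here.

```python
from collections import Counter


def reduce_vocabulary(docs, min_count=2, drop_top=0):
    """Prune a tokenized corpus before topic modeling (hypothetical helper).

    docs      -- list of documents, each a list of tokens
    min_count -- drop terms whose total corpus frequency is below this value
                 (min_count=2 removes singly occurring terms)
    drop_top  -- additionally drop this many of the most frequent terms
    """
    # Total frequency of each term across the whole corpus.
    counts = Counter(tok for doc in docs for tok in doc)
    # Rank terms by descending frequency; ties are broken alphabetically so the
    # choice of "top" terms is deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    frequent = set(ranked[:drop_top])
    keep = {t for t, c in counts.items() if c >= min_count and t not in frequent}
    # Rewrite every document using only the retained vocabulary.
    reduced = [[tok for tok in doc if tok in keep] for doc in docs]
    return reduced, keep


docs = [
    ["topic", "model", "term", "the"],
    ["the", "term", "frequency", "the"],
    ["topic", "the", "vocabulary"],
]
reduced, vocab = reduce_vocabulary(docs, min_count=2, drop_top=1)
print(sorted(vocab))  # ['term', 'topic']
print(reduced)        # [['topic', 'term'], ['term'], ['topic']]
```

With `min_count=2` the singly occurring terms ("model", "frequency", "vocabulary") are removed. With `drop_top=1` the single most frequent term ("the") is also removed. Comparing the topics fitted on `docs` and on `reduced` corresponds to the kind of comparison the highlights describe.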


Keywords: Information retrieval, Informetrics, Topic modeling, Latent Dirichlet allocation, Vocabulary size, Term frequency

Review history: Received 16 February 2016, Revised 26 September 2016, Accepted 9 January 2017, Available online 30 January 2017, Version of Record 30 January 2017.

Paper link: https://doi.org/10.1016/j.ipm.2017.01.003