A probabilistic topic model based on short-distance co-occurrences

Authors:

Highlights:

• A new probabilistic topic model, named LLDA, is proposed.

• LLDA focuses on local word relationships and is sensitive to the order of words.

• Word order is encoded using overlapping windows rather than n-gram models.

• LLDA outperforms related established topic models in document clustering.

• The topics extracted by LLDA are more similar to human-generated topics.
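The highlights state that LLDA encodes word order with overlapping windows instead of n-grams but do not say how the windows are built. As a minimal sketch, assuming fixed-size windows that slide forward one token at a time, the Python below shows how such windows yield ordered, short-distance word pairs. The window size and the helper names `overlapping_windows` and `window_cooccurrences` are hypothetical. This is an illustration of the windowing idea only, not the paper's model or its inference procedure.

```python
from collections import Counter
from typing import Dict, List, Tuple


def overlapping_windows(tokens: List[str], size: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of `size` tokens, sliding by one token.

    Consecutive windows share size - 1 tokens. A document shorter than
    `size` becomes a single window (hypothetical convention).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not tokens:
        return []
    if len(tokens) <= size:
        return [tuple(tokens)]
    return [tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def window_cooccurrences(tokens: List[str], size: int) -> Dict[Tuple[str, str], int]:
    """Count ordered (earlier word, later word) pairs inside each window.

    Keeping the pair ordered preserves which word came first, so
    ("b", "c") and ("c", "b") are counted separately.
    """
    counts: Counter = Counter()
    for window in overlapping_windows(tokens, size):
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                counts[(window[i], window[j])] += 1
    return counts


if __name__ == "__main__":
    doc = "topic models capture word co-occurrence patterns".split()
    for pair, n in sorted(window_cooccurrences(doc, 3).items()):
        print(pair, n)
```

Because the windows overlap, a pair of words d positions apart (d < window size w) is counted w - d times away from the document boundaries. Nearby words therefore accumulate more weight than distant ones, without requiring an exact contiguous match as an n-gram would.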


Keywords: Probabilistic topic model, Latent Dirichlet Allocation, Document clustering, Context window, Local co-occurrence, Word order

Article history: Received 31 March 2020, Revised 4 September 2020, Accepted 6 January 2022, Available online 10 January 2022, Version of Record 20 January 2022.

Paper URL: https://doi.org/10.1016/j.eswa.2022.116518