Snapshot ensembles of non-negative matrix factorization for stability of topic modeling

Authors: Jipeng Qiang, Yun Li, Yunhao Yuan, Wei Liu

Abstract

Recently, many topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) have made important progress towards generating high-level knowledge from a large corpus. However, because these algorithms rely on random initialization, they generate different results on the same corpus with the same parameters, which is known as the instability problem. Ensembles of NMF address this problem and are known to be much more stable and accurate than individual NMFs, but training multiple NMFs for an ensemble is computationally expensive. In this paper, we propose a novel scheme that achieves the seemingly contradictory goal of ensembling multiple NMFs without any additional training cost. We train a single NMF with a cyclical learning rate schedule, so that it converges to several local minima along its optimization path. Each time the model converges, we save the result to the ensemble and then restart the optimization with a large learning rate that helps it escape the current local minimum. Experiments on text corpora, evaluated with a number of measures, show that our method reduces instability at no additional training cost while also yielding more accurate topic models than traditional single methods and ensemble methods.
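
The snapshot idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, the exact learning rate schedule, or how the saved factorizations are combined, so everything below is an assumption made for illustration: alternating projected gradient descent on the squared reconstruction error, a cosine-annealed learning rate that restarts at each cycle, and the hypothetical names `snapshot_nmf`, `n_cycles`, `steps_per_cycle`, and `lr_max`. It is not the authors' implementation.

```python
import numpy as np


def snapshot_nmf(V, k, n_cycles=3, steps_per_cycle=200, lr_max=1.0, seed=0):
    """Illustrative snapshot-style NMF (assumed scheme, not the paper's code).

    Minimizes 0.5 * ||W @ H - V||_F^2 with W, H >= 0 using alternating
    projected gradient steps. The step scale follows a cosine schedule that
    restarts at lr_max at the beginning of every cycle; one (W, H) pair is
    saved at the end of each cycle.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    eps = 1e-12  # guards against division by zero if a factor collapses
    snapshots = []

    for _ in range(n_cycles):
        for t in range(steps_per_cycle):
            # Cosine annealing: starts at lr_max, decays towards 0 in a cycle.
            lr = 0.5 * lr_max * (1.0 + np.cos(np.pi * t / steps_per_cycle))

            # Update W with H fixed; the step is scaled by the Lipschitz
            # constant of the gradient so that lr <= 1 cannot diverge.
            grad_W = (W @ H - V) @ H.T
            L_W = np.linalg.norm(H @ H.T, 2) + eps
            W = np.maximum(W - (lr / L_W) * grad_W, 0.0)

            # Update H with the new W fixed.
            grad_H = W.T @ (W @ H - V)
            L_H = np.linalg.norm(W.T @ W, 2) + eps
            H = np.maximum(H - (lr / L_H) * grad_H, 0.0)

        # End of cycle: the learning rate is near zero, so save a snapshot.
        snapshots.append((W.copy(), H.copy()))

    return snapshots


if __name__ == "__main__":
    V = np.random.default_rng(1).random((20, 30))  # toy non-negative matrix
    for i, (W, H) in enumerate(snapshot_nmf(V, k=3, steps_per_cycle=50)):
        print(i, np.linalg.norm(W @ H - V))
```

All snapshots come from one training run, so the total number of update steps matches a single NMF trained for `n_cycles * steps_per_cycle` iterations. The default `lr_max=1.0` is a conservative choice for this sketch; larger values make the restarts more aggressive but are not guaranteed to stay stable.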

Keywords: LDA, NMF, Short text clustering, Topic modeling, Ensemble

Paper URL: https://doi.org/10.1007/s10489-018-1192-4