Online mixture-based clustering for high dimensional count data using Neerchal–Morel distribution

Abstract

In this paper, we propose an online learning technique for unsupervised clustering based on a mixture of Neerchal–Morel distributions (NMD). Online learning overcomes a key drawback of batch learning: the mixture’s parameters can be updated immediately for each new data instance. We use a novel Minorization–Maximization framework to address high-dimensional optimization and the estimation of the mixture’s parameters. Finally, by implementing a minimum message length model selection criterion, the weights of irrelevant mixture components are driven towards zero, which removes the need to know the number of clusters beforehand. To evaluate the performance of the proposed model, we consider three challenging real-world applications that involve high-dimensional count vectors: topic clustering, medical diagnosis, and human action recognition. The results show that the mixture model based on the NMD performs better than other similar models.
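
To illustrate how a minimum message length criterion can drive the weights of irrelevant components to zero, the following is a minimal sketch of one common MML-style weight update, in which each component's summed responsibility is reduced by half its number of free parameters and clipped at zero before renormalizing. This rule is an assumption chosen for illustration; the abstract does not state the paper's exact update, and the function name `mml_weight_update` and the example numbers are hypothetical.

```python
def mml_weight_update(resp_sums, params_per_component):
    """Return mixture weights after an MML-style penalized update.

    resp_sums[k] is the summed responsibility (soft count) of component k.
    params_per_component is the number of free parameters per component.
    NOTE: illustrative rule only, not necessarily the paper's update.
    """
    penalty = params_per_component / 2.0
    # Components whose soft count falls below the penalty are clipped to zero.
    clipped = [max(0.0, s - penalty) for s in resp_sums]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("all components were pruned; penalty too large for the data")
    # Renormalize so the surviving weights sum to one.
    return [c / total for c in clipped]


# Hypothetical soft counts for three components over 100 observations.
weights = mml_weight_update([60.0, 38.0, 2.0], params_per_component=10)
print(weights)
```

In this sketch the third component, which explains almost none of the data, receives a weight of exactly zero and can be dropped, so the number of clusters does not have to be fixed in advance.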

Keywords: Count data, Overdispersion, Minorization–Maximization, Finite mixture models, Neerchal–Morel distribution, Minimum message length

Article history: Received 18 August 2020, Revised 30 November 2020, Accepted 14 April 2021, Available online 4 May 2021, Version of Record 6 May 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.107051