A countably infinite mixture model for clustering and feature selection
作者:Nizar Bouguila, Djemel Ziou
摘要
Mixture modeling is one of the most useful tools in machine learning and data mining applications. An important challenge when applying finite mixture models is the selection of the number of clusters which best describes the data. Recent developments have shown that this problem can be handled by the application of non-parametric Bayesian techniques to mixture modeling. Another important crucial preprocessing step to mixture learning is the selection of the most relevant features. The main approach in this paper, to tackle these problems, consists on storing the knowledge in a generalized Dirichlet mixture model by applying non-parametric Bayesian estimation and inference techniques. Specifically, we extend finite generalized Dirichlet mixture models to the infinite case in which the number of components and relevant features do not need to be known a priori. This extension provides a natural representation of uncertainty regarding the challenging problem of model selection. We propose a Markov Chain Monte Carlo algorithm to learn the resulted infinite mixture. Through applications involving text and image categorization, we show that infinite mixture models offer a more powerful and robust performance than classic finite mixtures for both clustering and feature selection.
论文关键词:Non-parametric Bayesian methods, Dirichlet process, Clustering, Feature selection, Mixture models, Generalized Dirichlet, MCMC, Categorization
论文评审过程:
论文官网地址:https://doi.org/10.1007/s10115-011-0467-4