On the use of Bernoulli mixture models for text classification

Abstract

Mixture modelling of class-conditional densities is a standard pattern recognition technique. Although most research on mixture models has concentrated on mixtures for continuous data, emerging pattern recognition applications demand extending research efforts to other data types. This paper focuses on the application of mixtures of multivariate Bernoulli distributions to binary data. More concretely, a text classification task aimed at improving language modelling for machine translation is considered.
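As a rough illustration of the technique the abstract names, the sketch below fits one mixture of multivariate Bernoulli distributions per class with EM and classifies a binary document vector by the largest prior-weighted mixture likelihood. It is a minimal sketch under assumptions and not the paper's own implementation: the function names (`fit_bernoulli_mixture`, `classify`, and so on), the smoothing constant `eps`, the random initialisation, and the toy data are all hypothetical choices made for this example.

```python
import math
import random


def log_bernoulli(x, proto):
    """Log-probability of binary vector x under independent Bernoulli parameters."""
    return sum(math.log(p) if bit else math.log(1.0 - p) for bit, p in zip(x, proto))


def mixture_log_likelihood(x, weights, protos):
    """log sum_k weights[k] * Bernoulli(x | protos[k]), computed with log-sum-exp."""
    logs = [math.log(w) + log_bernoulli(x, p) for w, p in zip(weights, protos)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def fit_bernoulli_mixture(data, n_components, n_iter=50, seed=0, eps=1e-3):
    """Fit mixing weights and Bernoulli prototypes by EM (illustrative only)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    weights = [1.0 / n_components] * n_components
    # Assumption: random prototypes kept away from 0 and 1 so logs stay finite.
    protos = [[rng.uniform(0.25, 0.75) for _ in range(dim)] for _ in range(n_components)]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(w) + log_bernoulli(x, p) for w, p in zip(weights, protos)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate parameters; eps smoothing (an assumption of this
        # sketch) keeps every weight and prototype strictly inside (0, 1).
        for k in range(n_components):
            mass = sum(r[k] for r in resp)
            weights[k] = (mass + eps) / (n + n_components * eps)
            protos[k] = [
                (sum(r[k] * x[d] for r, x in zip(resp, data)) + eps) / (mass + 2.0 * eps)
                for d in range(dim)
            ]
    return weights, protos


def classify(x, models, priors):
    """Return the class maximising log prior + class-conditional mixture log-likelihood."""
    return max(models, key=lambda c: math.log(priors[c]) + mixture_log_likelihood(x, *models[c]))


# Toy binary "documents": each position marks presence or absence of a word.
docs_a = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
docs_b = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
models = {"a": fit_bernoulli_mixture(docs_a, 2), "b": fit_bernoulli_mixture(docs_b, 2)}
priors = {"a": 0.5, "b": 0.5}
label = classify([1, 1, 0, 0], models, priors)
```

Working in log space with log-sum-exp is a design choice of this sketch: products of many Bernoulli terms underflow quickly for realistic vocabulary sizes, so the responsibilities are normalised after subtracting the largest log term.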

Keywords: Mixture models, EM algorithm, Data categorization, Multivariate binary data, Text classification, Multivariate Bernoulli distribution

Article history: Received 31 October 2001, Accepted 31 October 2001, Available online 13 January 2002.

Official URL: https://doi.org/10.1016/S0031-3203(01)00242-4