Regularized Gaussian Mixture Model based discretization for gene expression data association mining

Authors: Ruichu Cai, Zhifeng Hao, Wen Wen, Lijuan Wang

Abstract

Association rules have proven useful for disease diagnosis from gene expression data because they are easy to interpret. A main obstacle to applying them is the large number of rules generated from high-dimensional gene expression data. In this work, we show that the discretization preprocessing step is one cause of this rule explosion. To alleviate the problem, we propose a Regularized Gaussian Mixture Model (RGMM) for discretizing continuous gene expression data. Under the Minimum Description Length framework, RGMM accounts for both the complexity of the discretization model and the information loss of the discretization procedure. Extensive experiments on real-life gene expression data sets demonstrate the effectiveness of RGMM.
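The idea of trading model complexity against information loss can be illustrated with a small sketch. This is not the authors' RGMM, whose objective is not given in the abstract. It assumes a plain one-dimensional Gaussian mixture fitted by EM, with the Bayesian Information Criterion (a standard approximation to a two-part description length) standing in for the MDL criterion. The function names, the variance floor, and the toy expression values are all hypothetical.

```python
import math

def _component_densities(x, weights, means, variances):
    """Weighted Gaussian density of every mixture component at x."""
    return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def fit_gmm_1d(xs, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    weights, variances = [1.0 / k] * k, [spread] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = _component_densities(x, weights, means, variances)
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances

def bic(xs, params):
    """BIC = -2 log-likelihood + (free parameters) * log(n); lower is better."""
    loglik = sum(math.log(sum(_component_densities(x, *params)) or 1e-300)
                 for x in xs)
    n_free = 3 * len(params[0]) - 1  # k means, k variances, k - 1 weights
    return -2.0 * loglik + n_free * math.log(len(xs))

def choose_k(xs, max_k=4):
    """Pick the number of intervals by minimizing BIC (MDL stand-in)."""
    return min(range(1, max_k + 1), key=lambda k: bic(xs, fit_gmm_1d(xs, k)))

def discretize(xs, params):
    """Map each value to the rank (by mean) of its most likely component."""
    weights, means, variances = params
    order = sorted(range(len(means)), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for x in xs:
        dens = _component_densities(x, weights, means, variances)
        labels.append(rank[max(range(len(dens)), key=lambda j: dens[j])])
    return labels

# Toy expression levels for one gene: a low group and a high group.
values = [0.1 * i for i in range(-10, 11)] + [10 + 0.1 * i for i in range(-10, 11)]
k = choose_k(values)
labels = discretize(values, fit_gmm_1d(values, k))
```

The penalty term grows with the number of components, so extra intervals are kept only when they improve the fit enough to pay for themselves. Fewer discrete items per gene in turn means fewer candidate association rules.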

Keywords: Unsupervised discretization, Gaussian Mixture Model, Association rule, Gene expression data

Paper URL: https://doi.org/10.1007/s10489-013-0435-7