Ensemble gene selection for cancer classification

Abstract

Cancer diagnosis is an important emerging clinical application of microarray data. Accurate prediction of the type or size of tumors relies on powerful and reliable classification models, so that patients can be provided with better treatment or response to therapy. However, the high dimensionality of microarray data can cause problems for traditional classification models, such as over-fitting, poor performance, and low efficiency. Thus, one of the challenging tasks in cancer diagnosis is to identify, from the thousands of genes in microarray data, the salient expressed genes that directly contribute to the phenotype or symptoms of the disease. In this paper, we propose a new ensemble gene selection method (EGS) that chooses multiple gene subsets for classification, where the significance of a gene is measured by conditional mutual information or its normalized form. Different gene subsets are obtained by setting different starting points for the search procedure; they are then used to train multiple base classifiers, which are aggregated into a consensus classifier by majority voting. The proposed method is compared with five popular gene selection methods on six public microarray datasets, and the comparison results show that our method works well.
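The pipeline described in the abstract (subset search from different starting points, one base classifier per subset, majority voting) can be illustrated with a minimal Python sketch. It is not the authors' EGS implementation. The abstract does not specify the search criterion, so the greedy rule below (add the gene whose smallest conditional mutual information with the labels, given each already selected gene, is largest) is an assumption. The 1-nearest-neighbour base classifier, the function names, and the toy data are likewise hypothetical. Expression values are assumed to be already discretized.

```python
from collections import Counter
from math import log


def entropy(*cols):
    """Joint Shannon entropy (in nats) of one or more discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(c / n * log(c / n) for c in counts.values())


def cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_subset(genes, labels, start, k):
    """Greedy forward search beginning at gene index `start`.

    Assumed criterion (not taken from the paper): add the candidate gene
    whose minimum CMI with the labels, over the selected genes, is largest.
    """
    selected = [start]
    while len(selected) < k:
        candidates = [g for g in range(len(genes)) if g not in selected]

        def score(g):
            return min(cmi(genes[g], labels, genes[s]) for s in selected)

        selected.append(max(candidates, key=score))
    return selected


def predict_1nn(genes, labels, subset, sample):
    """Stand-in base classifier: 1-nearest neighbour, Hamming distance on `subset`."""

    def distance(i):
        return sum(genes[g][i] != sample[g] for g in subset)

    return labels[min(range(len(labels)), key=distance)]


def ensemble_predict(genes, labels, subsets, sample):
    """Majority vote over one base classifier per gene subset."""
    votes = Counter(predict_1nn(genes, labels, s, sample) for s in subsets)
    return votes.most_common(1)[0][0]


# Made-up toy data: 4 genes (one list per gene) over 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # gene 0: matches the labels exactly
    [0, 0, 0, 1, 1, 1, 1, 0],  # gene 1: partly informative
    [0, 1, 0, 1, 0, 1, 0, 1],  # gene 2: uninformative on its own
    [1, 1, 0, 0, 1, 1, 0, 0],  # gene 3: uninformative on its own
]

# Different starting points give different gene subsets.
subsets = [select_subset(genes, labels, start, k=2) for start in (1, 2, 3)]
prediction_a = ensemble_predict(genes, labels, subsets, [1, 1, 0, 1])
prediction_b = ensemble_predict(genes, labels, subsets, [0, 0, 1, 0])
```

On this toy data every search adds gene 0 regardless of where it starts, because gene 0 removes all remaining uncertainty about the labels. On real microarray data the subsets would be expected to differ more, and that diversity is what the voting step relies on.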

Keywords: Gene selection, Ensemble learning, Cancer classification, Mutual information, Microarray data

Article history: Received 12 May 2009, Revised 17 December 2009, Accepted 11 February 2010, Available online 17 February 2010.

Article URL: https://doi.org/10.1016/j.patcog.2010.02.008