Global feature selection from microarray data using Lagrange multipliers

Authors:

Highlights:

Abstract

In microarray-based gene expression analysis, the expression levels of thousands of genes are monitored under a particular condition. However, only a few of them are highly expressed, as shown by Golub et al. Identifying these discriminative genes effectively is a significant challenge for risk assessment, diagnosis, and prognostication amid growing cancer incidence and mortality. In this paper, we present a global feature selection method based on a semidefinite programming model, obtained by relaxing a quadratic programming model that maximizes feature relevance and minimizes feature redundancy. The main advantage of the relaxation is that the matrix in the mathematical model need only be symmetric rather than positive definite or positive semidefinite. In the semidefinite programming model, each feature has one constraint that restricts the objective function of the feature selection problem. Another trick in this paper is that we use the Lagrange multipliers as a proxy measure to identify the discriminative features, instead of computing a feasible solution to the original max-cut problem. The proposed method is compared with several popular feature selection methods on seven microarray data sets. The results demonstrate that our method outperforms the others on most data sets, especially on the two hard feature selection data sets, Breast(Wang) and Medulloblastoma.
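
To make the relaxation step concrete, the following is a minimal sketch of the textbook max-cut-style semidefinite relaxation and its Lagrangian dual. It is not taken from the paper: the symbols C (a symmetric n-by-n matrix assumed to encode feature relevance and redundancy), x, X, and y are illustrative assumptions, and the paper's exact objective and scaling may differ. The sketch only shows why there is one constraint, and hence one Lagrange multiplier y_i, per feature, and why C needs to be symmetric but not definite.

```latex
% Binary quadratic program over n features (assumed form):
\max_{x \in \{-1,+1\}^n} \; x^{\top} C x
\quad\Longleftrightarrow\quad
\max_{x \in \mathbb{R}^n} \; x^{\top} C x
\;\; \text{s.t.} \;\; x_i^2 = 1, \; i = 1, \dots, n .

% Semidefinite relaxation: set X = x x^{\top} and drop the rank-one condition.
\max_{X \in \mathbb{S}^n} \; \langle C, X \rangle
\;\; \text{s.t.} \;\; X_{ii} = 1, \; i = 1, \dots, n, \qquad X \succeq 0 .

% Lagrangian with one multiplier y_i per diagonal constraint:
L(X, y) = \langle C, X \rangle + \sum_{i=1}^{n} y_i \,(1 - X_{ii})
        = \sum_{i=1}^{n} y_i + \langle C - \operatorname{Diag}(y), X \rangle .

% The supremum over X \succeq 0 is finite only if Diag(y) - C \succeq 0, giving the dual:
\min_{y \in \mathbb{R}^n} \; \sum_{i=1}^{n} y_i
\;\; \text{s.t.} \;\; \operatorname{Diag}(y) - C \succeq 0 .
```

In this sketch the dual variable y_i is attached to the constraint of feature i, which is the kind of quantity the abstract describes using as a proxy measure. How the multipliers are turned into a feature ranking is not stated in the abstract and is left open here.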

Keywords: Gene selection, Max-cut, Semidefinite programming, Lagrange multiplier, High-dimensional and small sample size

Review history: Received 19 December 2015, Revised 25 July 2016, Accepted 26 July 2016, Available online 27 July 2016, Version of Record 29 September 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.07.035