Subtype dependent biomarker identification and tumor classification from gene expression profiles

Abstract

Gene expression profiles are used to categorize disease-specific genes and to classify tumor subtypes at the molecular level. Because these data are high-dimensional and have small sample sizes, conventional machine learning and statistical techniques struggle to achieve satisfactory classification performance on clinical samples. The typical remedy is to eliminate noisy and redundant genes from the original gene space. Many gene selection methods are available, but most seek a common subset of genes for all tumor subtypes and fail to reflect the unique characteristics of each subtype. In this study, we therefore propose a general framework that identifies a subset of genes for each tumor subtype, together with a second gene selection framework that combines the resulting subtype-specific gene subsets into a single gene subset. We then present a corresponding classification model for distinguishing tumor subtypes and implement three specific gene selection algorithms within the two frameworks. Finally, extensive experiments on six benchmark microarray datasets validate the proposed subtype-dependent selection process for predicting and ranking the specific molecular biomarkers that define tumor subtypes. The process significantly improves tumor classification performance.
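
To make the two selection frameworks concrete, the following is a minimal sketch, not the algorithms implemented in the paper. It assumes a one-vs-rest scoring scheme with a simple standardized mean difference as the gene score. The function names (`subtype_gene_scores`, `select_subtype_genes`, `combine_gene_subsets`), the score itself, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def subtype_gene_scores(X, y, subtype):
    """Score each gene by how well it separates `subtype` from all other
    samples (one-vs-rest). The standardized mean difference used here is
    an illustrative assumption, not the paper's selection criterion."""
    inside = [row for row, label in zip(X, y) if label == subtype]
    outside = [row for row, label in zip(X, y) if label != subtype]
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in inside]
        b = [row[g] for row in outside]
        spread = pstdev(a) + pstdev(b) + 1e-9  # guard against division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores

def select_subtype_genes(X, y, k):
    """First framework: return {subtype: indices of its k top-scoring genes}."""
    selected = {}
    for subtype in sorted(set(y)):
        scores = subtype_gene_scores(X, y, subtype)
        ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
        selected[subtype] = ranked[:k]
    return selected

def combine_gene_subsets(selected):
    """Second framework: merge the subtype-specific subsets into one subset."""
    return sorted({g for genes in selected.values() for g in genes})

# Toy data: 6 samples x 4 genes, three hypothetical subtypes A, B, C.
X = [
    [9.0, 1.0, 1.1, 5.0],
    [8.8, 1.2, 0.9, 5.1],
    [1.0, 9.1, 1.0, 4.9],
    [1.2, 8.9, 1.2, 5.0],
    [1.1, 1.0, 9.0, 5.1],
    [0.9, 1.1, 9.2, 4.9],
]
y = ["A", "A", "B", "B", "C", "C"]

per_subtype = select_subtype_genes(X, y, k=1)
combined = combine_gene_subsets(per_subtype)
```

In this toy setup each subtype is marked by a different gene, so a single common ranking would have to trade the three markers off against each other, whereas the per-subtype ranking picks each one directly and the uninformative fourth gene is never selected.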

Keywords: Biomarker identification, Tumor subtype, Gene selection, Microarray data, Subtype dependent

Review history: Received 2 October 2017, Revised 19 January 2018, Accepted 26 January 2018, Available online 31 January 2018, Version of Record 28 February 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.01.025