Mining microarray gene expression data with unsupervised possibilistic clustering and proximity graphs

Authors: L. B. Romdhane, H. Shili, B. Ayeb

Abstract

Gene expression data generated by DNA microarray experiments provide a vast resource for medical diagnosis and disease understanding. Unfortunately, the large amount of data makes it hard, sometimes even impossible, to understand the correct behavior of genes. In this work, we develop a possibilistic approach for mining gene microarray data. Our model consists of two steps. In the first step, we use possibilistic clustering to partition the data into groups (or clusters). The optimal number of clusters is determined automatically from the data using information entropy as a validity measure. In the second step, we select the most representative genes from each computed cluster and model them as a graph called a proximity graph. This set of graphs (or hyper-graph) is then used to predict the function of new and previously unknown genes. Experimental results on real-world data sets show good performance and high prediction accuracy for our model.
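
The clustering step can be illustrated with a minimal sketch, assuming the standard possibilistic c-means updates of Krishnapuram and Keller. The abstract does not give the authors' exact update rules or their entropy definition, so everything named here is a hypothetical choice made for illustration: the function names (`typicality`, `update_center`, `partition_entropy`, `pcm`), the fuzzifier `m`, the per-cluster scale parameters `eta`, and the Shannon-style entropy used as a stand-in validity measure.

```python
import math

def typicality(point, center, eta, m=2.0):
    """Standard PCM typicality of `point` for one cluster (assumed form)."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

def typicality_matrix(points, centers, eta, m=2.0):
    """Rows are points, columns are clusters."""
    return [[typicality(x, c, e, m) for c, e in zip(centers, eta)]
            for x in points]

def update_center(points, typs, m=2.0):
    """Typicality-weighted mean of the points for one cluster."""
    weights = [t ** m for t in typs]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)]

def partition_entropy(T):
    """Shannon-style entropy of the typicalities, averaged over points.

    Assumption: used only as a stand-in for the paper's validity measure.
    """
    n = len(T)
    return -sum(t * math.log(t) for row in T for t in row if t > 0.0) / n

def pcm(points, centers, eta, m=2.0, iters=50):
    """Alternate typicality and center updates for a fixed number of rounds."""
    for _ in range(iters):
        T = typicality_matrix(points, centers, eta, m)
        centers = [update_center(points, [row[j] for row in T], m)
                   for j in range(len(centers))]
    # Recompute typicalities so they match the returned centers.
    return centers, typicality_matrix(points, centers, eta, m)
```

Under these assumptions, choosing the number of clusters would amount to running `pcm` for several candidate cluster counts and comparing the resulting `partition_entropy` values. Whether the best count is the one with the lowest or highest value depends on the validity criterion, which the abstract does not state.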

Keywords: Gene expression microarray data, Data mining, Possibilistic clustering, Proximity graph

Paper URL: https://doi.org/10.1007/s10489-009-0161-3