G-ANMI: A mutual information based genetic clustering algorithm for categorical data

Authors:

Highlights:

Abstract

Identifying meaningful clusters in categorical data is a key problem in data mining. Recently, Average Normalized Mutual Information (ANMI) has been used to define categorical data clustering as an optimization problem. To find a globally optimal or near-optimal partition with respect to ANMI, this paper proposes a genetic clustering algorithm (G-ANMI). Experimental results show that G-ANMI is superior or comparable to existing algorithms for clustering categorical data in terms of clustering accuracy.
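To make the optimization view concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common cluster-ensemble formulation in which each categorical attribute is treated as one partition of the objects, NMI is mutual information divided by the geometric mean of the two entropies, and ANMI is the average NMI between a candidate partition and all attribute partitions. The function names (`anmi`, `genetic_search`) and all parameters (`pop_size`, `generations`, `mutation_rate`, `seed`) are hypothetical, and the genetic operators shown (truncation selection, one-point crossover, random-reset mutation) are illustrative choices that the abstract does not specify.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items()
    )


def nmi(a, b):
    """Normalized mutual information; defined as 0 if either entropy is 0."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def anmi(partition, rows):
    """Average NMI between a candidate partition and every attribute column.

    Assumption: each categorical attribute is treated as one partition.
    """
    columns = list(zip(*rows))
    return sum(nmi(partition, col) for col in columns) / len(columns)


def genetic_search(rows, k, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Illustrative genetic search for a k-cluster labeling with high ANMI."""
    rng = random.Random(seed)
    n = len(rows)
    # Each chromosome assigns one cluster label in [0, k) to every object.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda p: anmi(p, rows), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Random-reset mutation of individual labels.
            child = [
                rng.randrange(k) if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda p: anmi(p, rows))
```

For example, with `rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]`, the labeling `[0, 0, 1, 1]` agrees with both attributes and scores an ANMI of 1, whereas `[0, 1, 0, 1]` is independent of both and scores 0. Calling `genetic_search(rows, k=2)` returns the highest-scoring labeling found, which is not guaranteed to be the global optimum.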

Keywords: Clustering, Categorical data, Genetic algorithm, Mutual information, Cluster ensemble, Data mining

Review history: Received 5 November 2008, Revised 11 October 2009, Accepted 1 November 2009, Available online 10 November 2009.

Paper URL: https://doi.org/10.1016/j.knosys.2009.11.001