An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Authors:

Highlights:

Abstract

The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the clustering performance of k-modes-type algorithms depends on the initial cluster centers, and the number of clusters must be known or given in advance. This paper proposes a novel initialization method for categorical data that can be applied to k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets.
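For context, the generic k-modes loop that an initialization method of this kind would feed into can be sketched as below. This is a minimal illustration of standard k-modes with simple matching dissimilarity, not the initialization method proposed in the paper; the function names `matching_distance` and `k_modes` are hypothetical, and the initial centers are assumed to be supplied by the caller.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, init_centers, max_iter=100):
    """Generic k-modes loop (illustrative sketch, not the paper's method).

    data         -- list of equal-length tuples of categorical values
    init_centers -- list of initial cluster centers (modes), same shape as rows
    Returns (centers, labels).
    """
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest mode.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: matching_distance(row, centers[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each mode becomes the per-attribute most frequent value.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return centers, labels
```

Each iteration touches every data point once per center, so the cost per iteration grows linearly with the number of data points. Because the result depends on `init_centers` and on how many centers are passed in, different starting choices can lead to different partitions, which is the sensitivity the abstract refers to.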

Keywords: The k-modes-type algorithms, Categorical data, Initial cluster centers, The number of clusters, Density measure

Article history: Received 19 April 2010, Revised 21 February 2011, Accepted 24 February 2011, Available online 2 March 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.02.015