Enhanced bisecting k-means clustering using intermediate cooperation

Authors:

Highlights:

Abstract

Bisecting k-means (BKM) is attractive in many applications, such as document retrieval/indexing and gene expression analysis. However, at each level of the binary tree BKM can leave a fraction of the dataset behind with no way to re-cluster it, so a “refinement” step is needed to re-cluster the resulting solutions. Current approaches to refining the clustering solutions produced by BKM enhance only the end result, using k-means (KM) clustering. In this hybrid model, KM waits for BKM to finish and then takes the final set of centroids as initial seeds for refinement. In this paper, a cooperative bisecting k-means (CBKM) clustering algorithm is presented. CBKM concurrently combines the results of BKM and KM at each level of the binary hierarchical tree using cooperative and merging matrices. Experimental results show that CBKM achieves better clustering quality than the KM, BKM, and single linkage (SL) algorithms, with comparable time performance, over a number of artificial, text document, and gene expression datasets.
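The hybrid baseline described in the abstract (BKM first, then KM seeded with the final BKM centroids) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It does not attempt CBKM, because the abstract does not define the cooperative and merging matrices. The function names, the rule of splitting the cluster with the largest sum of squared errors, and the deterministic farthest-point seeding are all assumptions made here for reproducibility.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def assign(points, centroids):
    """Group points by nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters

def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd iterations (KM) starting from the given seed centroids."""
    centroids = list(seeds)
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def split_seeds(cluster):
    """Assumed seeding rule: the point farthest from the mean, then the
    point farthest from that one. Chosen only to keep the sketch deterministic."""
    c = mean(cluster)
    first = max(cluster, key=lambda p: sq_dist(p, c))
    second = max(cluster, key=lambda p: sq_dist(p, first))
    return [first, second]

def bisecting_kmeans(points, k):
    """BKM: repeatedly split one cluster in two with 2-means until k clusters.
    Assumes all points are distinct and k <= len(points)."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the largest SSE.
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)
        _, halves = kmeans(target, split_seeds(target))
        clusters.extend(halves)
    return clusters

def bkm_then_km(points, k):
    """Hybrid baseline: KM waits for BKM, then refines from its centroids."""
    bkm_clusters = bisecting_kmeans(points, k)
    seeds = [mean(c) for c in bkm_clusters]
    return kmeans(points, seeds)

# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (21, 0), (20, 1)]
centroids, clusters = bkm_then_km(points, 3)
for cluster in clusters:
    print(cluster)
```

In this sketch, once BKM splits a cluster, points in the sibling cluster are never reconsidered during later splits. That limitation is what the final KM pass, and by the abstract's account the level-by-level cooperation in CBKM, is meant to address.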

Keywords: Bisecting clustering, Cooperative clustering, Quality measures

Review history: Received 19 March 2008, Revised 15 January 2009, Accepted 5 March 2009, Available online 14 March 2009.

Official paper URL: https://doi.org/10.1016/j.patcog.2009.03.011