Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets

Authors:

Highlights:

Abstract

Clustering analysis is the process of dividing a set of objects into non-overlapping subsets. Each subset is a cluster, such that objects within a cluster are similar to one another and dissimilar to objects in other clusters. Most partitioning-based clustering algorithms suffer from becoming trapped in local optima and from sensitivity to initialization and outliers. In this paper, we introduce a novel partitioning algorithm whose initialization does not lead it to a local optimum and which can find all Gaussian-shaped clusters, provided it is given the correct number of clusters. In this algorithm, the similarity between each pair of objects is computed only once, and updating the medoids in each iteration costs O(k × m), where k is the number of clusters and m is the number of objects needed to update the medoids of the clusters. We compare our algorithm with two other partitioning algorithms using four well-known external validation measures over seven standard datasets. The results for the larger datasets show the superiority of the proposed algorithm over the other two algorithms in terms of speed and accuracy.
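
The abstract gives only the outline of the method: pairwise similarities are computed once, and each medoid update looks at a limited number m of objects per cluster. The following is a minimal Python sketch of that general idea, not the authors' algorithm. The function names, the use of Euclidean distance, and the scoring rule in `update_medoids` (smallest total rank within the neighbour group) are all assumptions made for illustration. This naive scoring costs O(m²) per cluster after precomputation, so it does not reproduce the O(k × m) bound stated in the abstract.

```python
import math


def pairwise_distances(points):
    """Compute the full distance matrix once; later steps only do lookups."""
    return [[math.dist(p, q) for q in points] for p in points]


def neighbour_order(dist):
    """order[i] lists all object indices sorted by distance from i (i itself first)."""
    n = len(dist)
    return [
        sorted(range(n), key=lambda j, i=i: (dist[i][j], j != i))
        for i in range(n)
    ]


def rank_matrix(order):
    """ranks[i][j] = position of j in order[i]; a lower rank means j is closer to i."""
    n = len(order)
    ranks = [[0] * n for _ in range(n)]
    for i, row in enumerate(order):
        for position, j in enumerate(row):
            ranks[i][j] = position
    return ranks


def update_medoids(medoids, order, ranks, m):
    """Hypothetical update rule (an assumption, not taken from the paper):
    among the m nearest neighbours of each medoid, pick the object with
    the smallest total rank as seen by the members of that group."""
    new_medoids = []
    for c in medoids:
        group = order[c][:m]
        best = min(group, key=lambda j: (sum(ranks[i][j] for i in group), j))
        new_medoids.append(best)
    return new_medoids


def cluster(points, medoids, m, max_iter=20):
    """Iterate the rank-based update, then assign each object to its nearest medoid."""
    dist = pairwise_distances(points)   # computed exactly once
    order = neighbour_order(dist)
    ranks = rank_matrix(order)
    for _ in range(max_iter):
        updated = update_medoids(medoids, order, ranks, m)
        if updated == medoids:          # medoids stopped moving
            break
        medoids = updated
    labels = [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(points))
    ]
    return medoids, labels
```

Calling `cluster(points, initial_medoids, m)` returns the final medoid indices and one cluster label per object. The `max_iter` cap is there because this simplified rule is not guaranteed to converge, and nothing prevents two medoids from collapsing onto the same object on poorly separated data.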

Keywords: Clustering analysis, Partitioning clustering, k-Medoids clustering, k-Harmonic means, External validation measures

Review history: Received 13 May 2012, Revised 6 October 2012, Accepted 14 October 2012, Available online 22 November 2012.

Official paper URL: https://doi.org/10.1016/j.knosys.2012.10.012