CNAK: Cluster number assisted K-means

Authors:

Highlights:

• A Monte Carlo simulation-based algorithm for automatically detecting the number of clusters and the cluster representatives.

• A method for choosing an appropriate size for the sampled dataset that the proposed algorithm requires.

• Comparisons with state-of-the-art techniques on both cluster number detection and clustering solutions.

• Applicability of the proposed method to large-scale datasets.

• A study of the method's behavior on several relevant clustering issues: 1) detection of a single cluster when the dataset contains no other cluster, 2) presence of hierarchy, 3) clustering of a high-dimensional dataset, 4) robustness to cluster imbalance, and 5) robustness to noise.


Keywords: K-means clustering, Bipartite graph, Perfect matching, Kuhn-Munkres algorithm, Stability
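The keywords (K-means clustering, bipartite graph, perfect matching, stability) suggest a stability check in which the centroids from K-means runs on different random subsamples are paired by a minimum-cost perfect matching. The sketch below illustrates that general idea only; it is not taken from the paper and should not be read as the CNAK algorithm. All function names, the sample sizes, and the scoring rule are assumptions made for illustration. For brevity the matching is found by exhaustive search over permutations, which is practical only for small k; the Kuhn-Munkres algorithm finds the same minimum-cost matching in polynomial time.

```python
import itertools
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means on a list of tuples; returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = []
        for j in range(k):
            if clusters[j]:
                mean = tuple(sum(c) / len(clusters[j]) for c in zip(*clusters[j]))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def matching_cost(a, b):
    """Minimum total distance over all perfect matchings between centroid sets a and b.

    Brute force over permutations (a stand-in for Kuhn-Munkres, small k only).
    """
    k = len(a)
    return min(
        sum(math.dist(a[i], b[perm[i]]) for i in range(k))
        for perm in itertools.permutations(range(k))
    )


def instability(points, k, rng, n_pairs=10, sample_size=100):
    """Hypothetical score: mean per-centroid matching cost over pairs of subsamples."""
    total = 0.0
    for _ in range(n_pairs):
        c1 = kmeans(rng.sample(points, sample_size), k, rng)
        c2 = kmeans(rng.sample(points, sample_size), k, rng)
        total += matching_cost(c1, c2) / k
    return total / n_pairs


def demo_points(rng, per_blob=100):
    """Synthetic 2-D data: three Gaussian blobs (illustrative only)."""
    centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in centers
        for _ in range(per_blob)
    ]
```

Under these assumptions, one might call `instability(demo_points(rng), k, rng)` for several values of `k` with `rng = random.Random(0)` and compare the scores; a lower score means the centroids are more reproducible across subsamples. How such scores should be turned into a choice of cluster number is left to the paper.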

Review history: Received 7 March 2020, Revised 12 August 2020, Accepted 29 August 2020, Available online 30 August 2020, Version of Record 2 September 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107625