How much can k-means be improved by using better initialization and repeats?

作者:

Highlights:

• K-means clustering algorithm can be significantly improved by using a better initialization technique, and by repeating (re-starting) the algorithm.

• When the data has overlapping clusters, k-means can improve the results of the initialization technique.

• When the data has well separated clusters, the performance of k-means depends completely on the goodness of the initialization.

• Initialization using simple furthest point heuristic (Maxmin) reduces the clustering error of k-means from 15% to 6%, on average.

摘要

•K-means clustering algorithm can be significantly improved by using a better initialization technique, and by repeating (re-starting) the algorithm.•When the data has overlapping clusters, k-means can improve the results of the initialization technique.•When the data has well separated clusters, the performance of k-means depends completely on the goodness of the initialization.•Initialization using simple furthest point heuristic (Maxmin) reduces the clustering error of k-means from 15% to 6%, on average.

论文关键词:Clustering algorithms,K-means,Initialization,Clustering accuracy,Prototype selection

论文评审过程:Received 13 August 2018, Revised 11 March 2019, Accepted 13 April 2019, Available online 15 April 2019, Version of Record 22 April 2019.

论文官网地址:https://doi.org/10.1016/j.patcog.2019.04.014