Fast modified global k-means algorithm for incremental cluster construction

作者:

Highlights:

摘要

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms.

论文关键词:Minimum sum-of-squares clustering,Nonsmooth optimization,k-means algorithm,Global k-means algorithm

论文评审过程:Received 23 February 2010, Revised 24 September 2010, Accepted 24 October 2010, Available online 30 October 2010.

论文官网地址:https://doi.org/10.1016/j.patcog.2010.10.018