Issues in clustering algorithm consistency in fixed dimensional spaces. Some solutions for k-means

作者:Mieczysław A. Kłopotek, Robert A. Kłopotek

摘要

Kleinberg introduced an axiomatic system for clustering functions. Out of three axioms, he proposed, two (scale invariance and consistency) are concerned with data transformations that should produce the same clustering under the same clustering function. The so-called consistency axiom provides the broadest range of transformations of the data set. Kleinberg claims that one of the most popular clustering algorithms, k-means does not have the property of consistency. We challenge this claim by pointing at invalid assumptions of his proof (infinite dimensionality) and show that in one dimension in Euclidean space the k-means algorithm has the consistency property. We also prove that in higher dimensional space, k-means is, in fact, inconsistent. This result is of practical importance when choosing testbeds for implementation of clustering algorithms while it tells under which circumstances clustering after consistency transformation shall return the same clusters. Two types of remedy are proposed: gravitational consistency property and dataset consistency property which both hold for k-means and hence are suitable when developing the mentioned testbeds.

论文关键词:Cluster analysis, Consistency property, Gravitational consistency property, Fixed dimensional Euclidean space consistency, k-Means algorithm

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-021-00657-6