Reusable components for partitioning clustering algorithms

作者:Boris Delibašić, Kathrin Kirchner, Johannes Ruhland, Miloš Jovanović, Milan Vukićević

摘要

Clustering algorithms are well-established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions for specific sub-problems in the clustering process. These solutions are linked together in a clustering algorithm, and they define the process and the structure of the algorithm. Frequently, many of these solutions occur in more than one clustering algorithm. Mostly, new clustering algorithms include frequently occurring solutions to typical sub-problems from clustering, as well as from other machine-learning algorithms. The problem is that these solutions are usually integrated in their algorithms, and that original algorithms are not designed to share solutions to sub-problems outside the original algorithm easily. We propose a way of designing cluster algorithms and to improve existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. Thus we identify reusable components, first, as solutions to characteristic sub-problems in partitioning cluster algorithms, and, further, identify a generic structure for the design of partitioning cluster algorithms. We analyze some partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM), and identify reusable components in them. We give examples of how new cluster algorithms can be designed based on them.

论文关键词:Cluster algorithm, Partitioning clustering, Reusable component, Generic, Kohonen SOM, K-means, X-means, MPCK-means

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10462-009-9133-6