A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities

Authors:

Highlights:

Abstract

Finding the natural clusters in high dimensional data is important because such data are difficult to visualize. A natural cluster may have any shape and density: it should not be restricted to a globular shape, as a large number of algorithms assume, or to a specific user-defined density, as some density-based algorithms require. This work proposes to solve the problem by maximizing the relatedness of distances between patterns in the same cluster, which makes it possible to distinguish clusters by their distance-based densities. A novel dynamic model is proposed, based on new distance-relatedness measures and clustering criteria. The proposed algorithm, “Mitosis”, is able to discover clusters of arbitrary shapes and arbitrary densities in high dimensional data. Its computational complexity compares favorably with that of related algorithms. It performs very well on high dimensional data, discovering clusters that known algorithms cannot find, and it identifies outliers in the data as a by-product of the cluster formation process. A validity measure that depends on the main clustering criterion is also proposed for tuning the algorithm's parameters. The theoretical basis of the algorithm and its steps are presented, and its performance is illustrated by comparing it with related algorithms on several data sets.
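
The general idea of grouping patterns by the relatedness of their distances, rather than by one global density threshold, can be illustrated with a small sketch. This is not the Mitosis algorithm, whose measures and criteria are not given in this abstract. It is a simplified stand-in built on assumptions: the function name `relatedness_clusters`, the neighbor count `k`, and the factor `f` are all hypothetical. Two points are linked only when their distance is small relative to the local distance scale of both, so a dense group and a sparse group can each be recovered, and points that link to nothing are reported as outliers.

```python
import math
from itertools import combinations


def relatedness_clusters(points, k=2, f=2.0):
    """Illustrative sketch only (not the published Mitosis algorithm).

    points: list of coordinate tuples, at least two points.
    k: hypothetical number of nearest neighbors defining a point's local scale.
    f: hypothetical factor; a pair is linked when its distance is at most
       f times the smaller of the two local scales.
    Returns (clusters, outliers) as lists of point indices.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local scale of each point: mean distance to its k nearest neighbors.
    scale = []
    for i in range(n):
        nearest = sorted(d for j, d in enumerate(dist[i]) if j != i)[:k]
        scale.append(sum(nearest) / len(nearest))

    # Union-find structure used to merge linked points into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link pairs whose distance is consistent with both local scales.
    for i, j in combinations(range(n), 2):
        if dist[i][j] <= f * min(scale[i], scale[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = [g for g in groups.values() if len(g) > 1]
    outliers = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, outliers


if __name__ == "__main__":
    dense = [(0, 0), (0, 1), (1, 0), (1, 1)]
    sparse = [(20, 20), (20, 25), (25, 20), (25, 25)]
    far = [(100, 100)]
    print(relatedness_clusters(dense + sparse + far))
```

On this toy input the dense square and the sparse square come out as two separate clusters even though their point spacings differ by a factor of five, and the far point is left as an outlier. A single global distance threshold small enough to exclude the far point would not be needed here, because each link is judged against local scales.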

Keywords: Clustering, Dynamic model, Arbitrary shaped clusters, Arbitrary density clusters, High dimensional data, Distance-relatedness

Review history: Received 23 August 2007, Revised 26 July 2008, Accepted 29 August 2008, Available online 10 October 2008.

Official paper URL: https://doi.org/10.1016/j.patcog.2008.08.037