Enhancing principal direction divisive clustering

Authors:

Abstract

Although data clustering has a long history and a great deal of research has been devoted to developing clustering techniques, significant challenges remain. One of the most important is high data dimensionality. A particular class of clustering algorithms has been very successful on such datasets by exploiting information derived from principal component analysis. In this work, we try to deepen our understanding of what this kind of approach can achieve. We attempt to establish theoretically the relationship between the true clusters in the data and the distribution of their projections onto the principal components. Based on these findings, we propose appropriate criteria for the various steps involved in hierarchical divisive clustering and combine them into new algorithms. The proposed algorithms require minimal user-defined parameters and have the desirable feature of providing approximations of the number of clusters present in the data. The experimental results indicate that the proposed techniques are effective on both simulated and real data.
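To make the idea in the abstract concrete, the following is a minimal sketch of one divisive step: project the data onto the first principal direction, estimate the density of the projections with a Gaussian kernel, and cut at the lowest interior local minimum of that density. It is an illustration under stated assumptions, not the algorithms proposed in the paper. The function names (`first_principal_direction`, `gaussian_kde`, `split_at_density_minimum`, `make_two_blobs`), the rule-of-thumb bandwidth, and the choice of the lowest local minimum as the split point are all assumptions introduced here.

```python
import math
import random


def first_principal_direction(X, iters=100):
    """Return (mean, unit vector) for the leading principal direction of X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in X]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary starting direction
    for _ in range(iters):
        # Power iteration: w = C^T (C v) is proportional to (covariance) v.
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def gaussian_kde(points, grid, h):
    """Evaluate a Gaussian kernel density estimate with bandwidth h on grid."""
    c = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - p) / h) ** 2) for p in points)
            for g in grid]


def split_at_density_minimum(X, grid_size=200):
    """Return (cut, left_indices, right_indices), or None if no split is found."""
    mean, v = first_principal_direction(X)
    d = len(mean)
    proj = [sum((row[j] - mean[j]) * v[j] for j in range(d)) for row in X]
    n = len(proj)
    # Projections of centred data have zero mean, so this is their sample std.
    sigma = math.sqrt(sum(p * p for p in proj) / (n - 1))
    if sigma == 0.0:
        return None
    h = 1.06 * sigma * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    lo, hi = min(proj), max(proj)
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    dens = gaussian_kde(proj, grid, h)
    # Interior strict local minima of the estimated density.
    minima = [k for k in range(1, grid_size - 1)
              if dens[k] < dens[k - 1] and dens[k] < dens[k + 1]]
    if not minima:
        return None  # density looks unimodal: do not split
    cut = grid[min(minima, key=lambda k: dens[k])]
    left = [i for i, p in enumerate(proj) if p <= cut]
    right = [i for i, p in enumerate(proj) if p > cut]
    return cut, left, right


def make_two_blobs(n_per_blob=100, dim=3, gap=10.0, seed=0):
    """Hypothetical demo data: two unit-variance blobs separated along axis 0."""
    rng = random.Random(seed)
    X = []
    for shift in (0.0, gap):
        for _ in range(n_per_blob):
            point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            point[0] += shift
            X.append(point)
    return X
```

Calling `split_at_density_minimum(make_two_blobs())` returns a cut value and two index lists. Applying the same step recursively to each part, and stopping wherever no density minimum is found, would give a divisive hierarchy whose leaf count serves as a rough estimate of the number of clusters in this sketch.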

Keywords: Clustering, Principal component analysis, Kernel density estimation

Review history: Received 4 November 2009, Revised 16 May 2010, Accepted 19 May 2010, Available online 24 May 2010.

Official paper URL: https://doi.org/10.1016/j.patcog.2010.05.025