Local gap density for clustering high-dimensional data with varying densities

Abstract

Density-based clustering algorithms are designed to find clusters of arbitrary shape. However, most of these algorithms have difficulty handling high-dimensional data with varying densities; in particular, they often fail to discover clusters in sparse regions. In this paper, we define a new type of density on the k-NN graph, the local gap density, which works well for high-dimensional data. The local gap density of a point considers not only the number of points in its nearest neighborhood but also the average distance from the point to all points in that neighborhood. As a result, points in sparse regions that are core points in the sense of existing density-based clustering receive high densities under our definition, so they can be easily detected. Using these core points, the potential cross-cluster edges in the k-NN graph can be reliably identified. After these edges are deleted, the points in each connected component of large cardinality are grouped into a subcluster; then, similar to density peaks clustering, each remaining point is assigned to its corresponding existing subcluster. Extensive experiments on eight publicly available datasets demonstrate the effectiveness of our clustering algorithm.
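The abstract does not give the formula for the local gap density or the exact rule for detecting cross-cluster edges, so the Python sketch below only illustrates the overall pipeline (k-NN graph, density, core points, edge deletion, large components as subclusters, density-peaks-style assignment) under stated assumptions. Everything specific in it is hypothetical: the function names `knn`, `stand_in_density`, `components`, and `cluster`; the parameters `core_quantile` and `min_size`; the relative-density stand-in (a point's inverse mean k-NN distance divided by its neighbors' average of the same quantity); the rule that any k-NN edge touching a non-core point counts as a potential cross-cluster edge; and the simplified nearest-labeled-point assignment. None of these should be read as the paper's definitions.

```python
import numpy as np


def knn(X, k):
    """Brute-force k-NN (O(n^2) memory; fine for small examples)."""
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, np.inf)                 # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]          # k nearest neighbors per point
    dist = np.take_along_axis(D, idx, axis=1)   # their distances
    return D, idx, dist


def stand_in_density(idx, dist):
    # HYPOTHETICAL stand-in, NOT the paper's local gap density: inverse mean
    # k-NN distance, divided by the average of that value over the k neighbors,
    # so locally prominent points score high even inside sparse clusters.
    raw = 1.0 / (dist.mean(axis=1) + 1e-12)
    return raw / raw[idx].mean(axis=1)


def components(n, edges):
    """Union-find over undirected edges; returns a root id per point."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return np.array([find(a) for a in range(n)])


def cluster(X, k=10, core_quantile=0.5, min_size=20):
    n = len(X)
    D, idx, dist = knn(X, k)
    rho = stand_in_density(idx, dist)
    core = rho >= np.quantile(rho, core_quantile)   # hypothetical core rule
    # Hypothetical cross-cluster rule: any k-NN edge with a non-core endpoint
    # is dropped; only core-core edges survive (treated as undirected).
    edges = [(i, j) for i in range(n) if core[i] for j in idx[i] if core[j]]
    roots = components(n, edges)
    # Components of core points with at least min_size members become subclusters.
    sizes = {}
    for i in np.flatnonzero(core):
        sizes[roots[i]] = sizes.get(roots[i], 0) + 1
    big = [r for r, s in sizes.items() if s >= min_size]
    if not big:                                  # fallback: keep the largest one
        big = [max(sizes, key=sizes.get)]
    labels = np.full(n, -1)
    for c, r in enumerate(big):
        labels[(roots == r) & core] = c
    # Simplified DPC-style assignment: visit the remaining points from high to
    # low density; each copies the label of its nearest already-labeled point.
    for i in np.argsort(-rho):
        if labels[i] == -1:
            labeled = np.flatnonzero(labels >= 0)
            labels[i] = labels[labeled[np.argmin(D[i, labeled])]]
    return labels
```

Because the stand-in density is a ratio, it is unchanged when a cluster is uniformly rescaled, which is one simple way to let core points emerge in sparse clusters as well as dense ones. For example, calling `cluster(X, k=10, min_size=10)` on two well-separated 10-dimensional Gaussian blobs with spreads 0.5 and 2.0 is expected to give the two blobs disjoint label sets; on real data the stand-in density and edge rule would need to be replaced by the paper's actual definitions.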

Keywords: Clustering, High-dimensional, Local gap density, Cross-cluster edge

Article history: Received 28 September 2018, Revised 25 June 2019, Accepted 30 July 2019, Available online 12 August 2019, Version of Record 11 October 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.104905