Detection of natural clusters via S-DBSCAN, a Self-tuning version of DBSCAN

Abstract

Density-based clustering algorithms have had a large impact on a wide range of application fields. As more data become available, with growing size and increasingly varied internal organization, non-parametric unsupervised procedures are becoming ever more important for understanding datasets. In this paper, a new clustering algorithm, S-DBSCAN, is proposed in the context of knowledge discovery. S-DBSCAN belongs to the same connectivity-based family as DBSCAN, but with noticeable differences and advantages, such as working in a differential mode. It is formalized as a very simple hierarchical process that hybridizes distance, k-nearest-neighbor, and density-peaks concepts. It partitions the data into clusters until no further clustering can be done. The information it delivers allows the user to intuitively deduce different sets of natural cluster partitions at different scales. S-DBSCAN scans the database in an ordered way by applying its algorithmic core (S-DBSCANCORE) with judiciously chosen input parameters. Given a set of data patterns in some space, S-DBSCANCORE groups together data patterns that are closely packed with respect to the differential density. Data patterns whose nearest neighbors have markedly different densities are detected and marked as borders, while the others are not visited. S-DBSCAN embeds some intelligence that makes it self-tuning (almost fully automatic) and, unlike many existing algorithms, independent of a global density threshold. Tests were carried out on two-dimensional benchmark datasets of various shapes and densities, and showed that S-DBSCAN was highly effective. It also proved efficient in high-dimensional spaces when natural clusters exist, and much easier to use than competing algorithms.
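The abstract does not give the actual rules or parameters of S-DBSCAN. Purely as an illustration of the general "differential density" idea it describes (grouping closely packed points and marking points whose nearest neighbors have very different densities as borders), here is a minimal Python sketch. It is not the authors' S-DBSCANCORE: the k-nearest-neighbor density estimate (inverse mean k-NN distance), the `ratio` tolerance, and the function names `knn` and `differential_density_clusters` are all hypothetical choices made for this example.

```python
import math
from collections import deque


def knn(points, k):
    """Brute-force k nearest neighbours as (distance, index) pairs, excluding the point itself."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def differential_density_clusters(points, k=4, ratio=1.5):
    """Hypothetical sketch: return a cluster id per point, or -1 for a border point.

    Assumptions (NOT taken from the S-DBSCAN paper):
      * density of a point = 1 / mean distance to its k nearest neighbours;
      * a point is a border if any of its k neighbours has a density more than
        `ratio` times larger or smaller than its own;
      * non-border points are linked to their non-border k-NN neighbours, and
        clusters are the connected components of that graph.
    """
    neighbours = knn(points, k)
    density = [1.0 / (sum(d for d, _ in nb) / k + 1e-12) for nb in neighbours]

    def similar(i, j):
        # Densities are "similar" when their ratio stays within the tolerance.
        high, low = max(density[i], density[j]), min(density[i], density[j])
        return high / low <= ratio

    border = [not all(similar(i, j) for _, j in nb) for i, nb in enumerate(neighbours)]

    # Undirected adjacency among non-border points, built from k-NN links.
    adj = {i: set() for i in range(len(points)) if not border[i]}
    for i, nb in enumerate(neighbours):
        if border[i]:
            continue
        for _, j in nb:
            if not border[j]:
                adj[i].add(j)
                adj[j].add(i)

    # Breadth-first search labels each connected component as one cluster.
    labels = [-1] * len(points)
    cluster = 0
    for start in adj:
        if labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1
    return labels
```

For example, two well-separated 3x3 grids come out as two clusters, while an isolated point placed near a single grid is marked as a border (-1), because its neighbors are far denser than it is. Note that there is no global density threshold here, only a relative tolerance between neighbors, which loosely mirrors the abstract's point about avoiding a global threshold.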

Keywords: Clustering, Natural cluster, Distance, Density, Neighbors

Article history: Received 29 July 2021, Revised 22 January 2022, Accepted 22 January 2022, Available online 2 February 2022, Version of Record 12 February 2022.

Official page: https://doi.org/10.1016/j.knosys.2022.108288