Generate pairwise constraints from unlabeled data for semi-supervised clustering

Authors:

Abstract

Pairwise constraint selection methods often rely on label information to generate pairwise constraints. This paper proposes a new method that selects pairwise constraints from unlabeled data for semi-supervised clustering, with the aim of improving clustering accuracy. Given a dataset without any label information, the I-nice method is first used to cluster it into a set of initial clusters. From each initial cluster, a dense group of objects is obtained by removing the faraway objects. Then, in each dense group, the most informative object and the informative objects are identified with a local density estimation method. The identified objects are used to form a set of pairwise constraints, which are incorporated into the semi-supervised clustering algorithm to guide the clustering process toward a better solution. The advantage of this method is that no label information is required for selecting pairwise constraints. Experimental results demonstrate that the new method improved clustering accuracy and outperformed four state-of-the-art pairwise constraint selection methods, namely random, FFQS, min–max, and NPU, on both synthetic and real-world datasets.
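The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the initial clusters are taken as input (the I-nice step is not reproduced here), and the trimming rule (`keep_ratio`), the k-nearest-neighbour density estimate, the choice of "informative" objects as the highest-density ones, and the way must-link and cannot-link pairs are formed are all assumptions made for illustration. All function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def dense_group(points, keep_ratio=0.8):
    """Keep the objects closest to the centroid (assumed rule for dropping faraway objects)."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    ranked = sorted(points, key=lambda p: math.dist(p, centroid))
    keep = max(1, int(round(keep_ratio * len(points))))
    return ranked[:keep]


def local_density(points, k=2):
    """Inverse mean distance to the k nearest neighbours (assumed density estimate)."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean = sum(nearest) / len(nearest) if nearest else 0.0
        densities.append(1.0 / (mean + 1e-12))
    return densities


def select_constraints(clusters, keep_ratio=0.8, k=2, n_informative=2):
    """Build must-link and cannot-link pairs from initial clusters, using no labels."""
    must_link, cannot_link, chosen = [], [], []
    for points in clusters:
        group = dense_group(points, keep_ratio)
        dens = local_density(group, k)
        # Highest-density object first, then the next n_informative objects (assumption).
        order = sorted(range(len(group)), key=lambda i: -dens[i])
        picked = [group[i] for i in order[:1 + n_informative]]
        chosen.append(picked)
        # Objects picked from the same dense group are assumed to belong together.
        must_link.extend(combinations(picked, 2))
    for a, b in combinations(chosen, 2):
        # Objects picked from different dense groups are assumed to belong apart.
        cannot_link.extend((p, q) for p in a for q in b)
    return must_link, cannot_link


# Toy initial clusters; in the paper these would come from the I-nice method.
clusters = [
    [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)],
    [(20, 20), (20, 21), (21, 20), (21, 21)],
]
must_link, cannot_link = select_constraints(clusters)
```

In this toy input, the outlying point `(10, 10)` is trimmed from the first cluster before any constraints are formed. The resulting pairs could then be passed to any constrained clustering algorithm that accepts must-link and cannot-link constraints.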

Keywords: Constrained clustering, I-nice approach, Pairwise constraints selection, Semi-supervised clustering

Review history: Received 9 June 2017, Revised 11 July 2019, Accepted 12 July 2019, Available online 15 July 2019, Version of Record 8 November 2019.

Official paper URL: https://doi.org/10.1016/j.datak.2019.101715