Towards Safe Semi-supervised Classification: Adjusted Cluster Assumption via Clustering

Authors: Yunyun Wang, Yan Meng, Zhenyong Fu, Hui Xue

Abstract

Semi-supervised classification methods can, in some cases, perform even worse than their supervised counterparts. This undermines confidence in using them in real applications. It is therefore desirable to improve the safety of semi-supervised classification so that it never performs worse than its supervised counterpart. One possible cause of unsafe learning is that the cluster assumption may not reflect the real data distribution well. In this paper we therefore develop a safe semi-supervised support vector machine that adjusts the cluster assumption (ACA-S3VM for short). Specifically, when samples from different classes overlap heavily, the real boundary does not lie in a low-density region, so it cannot be found under the cluster assumption. An unsupervised clustering method, however, can detect the real boundary in this case. We therefore design ACA-S3VM to adjust the cluster assumption with the help of clustering. During learning, it takes into account the distance of each unlabeled instance to the distribution boundary. Empirical results show that ACA-S3VM is competitive with off-the-shelf safe semi-supervised classification methods.
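The abstract does not give the ACA-S3VM objective. The sketch below is only a rough, hypothetical illustration of the ingredient it describes: using clustering to locate a boundary, then measuring how far each unlabeled instance lies from it. The sketch makes several choices of its own:

- plain 2-means with deterministic farthest-point initialisation;
- the boundary taken as the perpendicular bisector of the two centroids;
- the helper names `two_means`, `distance_to_cluster_boundary` and `boundary_weights`;
- a Gaussian-style down-weighting.

These are all assumptions, not the authors' formulation.

```python
import numpy as np


def two_means(X, n_iter=100):
    """Plain 2-means (Lloyd's algorithm); returns (centroids, labels).

    Assumption: deterministic farthest-point initialisation, starting from X[0]
    and the point farthest from it.
    """
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centroids = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(2)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def distance_to_cluster_boundary(X, centroids):
    """Unsigned distance of each row of X to the 2-means decision boundary.

    The boundary is the hyperplane w.x + b = 0 of points equidistant from the
    two centroids, with w = c1 - c0 and b = -(|c1|^2 - |c0|^2) / 2.
    """
    c0, c1 = centroids
    w = c1 - c0
    b = -0.5 * (c1 @ c1 - c0 @ c0)
    return np.abs(X @ w + b) / np.linalg.norm(w)


def boundary_weights(d, sigma=1.0):
    """HYPOTHETICAL weighting, not from the paper.

    Maps distance to a weight in [0, 1): 0 on the boundary, close to 1 far from it.
    """
    return 1.0 - np.exp(-d**2 / (2.0 * sigma**2))


if __name__ == "__main__":
    # Two well-separated groups of unlabeled points.
    X_u = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    centroids, labels = two_means(X_u)
    d = distance_to_cluster_boundary(X_u, centroids)
    print(centroids, labels, d, boundary_weights(d))
```

In this toy case the clustering boundary is the vertical line x = 5, so every point lies at distance 5 from it. How such distances would actually enter an S3VM objective is not specified in the abstract and is left open here.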

Keywords: Semi-supervised classification, Cluster assumption, Clustering, Decision boundary, Low density region

Paper URL: https://doi.org/10.1007/s11063-017-9607-5