A self-training method based on density peaks and an extended parameter-free local noise filter for k nearest neighbor

Authors:

Highlights:

Abstract

Self-training is one of the more successful approaches to semi-supervised classification. It exploits both labeled and unlabeled data to train a satisfactory supervised classifier. Mislabeling is one of the largest challenges in self-training, and the most common technique for removing mislabeled samples is the local noise filter. However, the local noise filters currently used in self-training methods suffer from two technical defects: they depend on parameters, and they use only labeled data to remove mislabeled samples. To address these shortcomings, this paper proposes a novel self-training method based on density peaks and an extended parameter-free local noise filter (STDPNF). In STDPNF, the self-training method based on density peaks is redesigned to be better suited to combination with local noise filters. Moreover, a new local noise filter based on natural neighbors is proposed to filter out mislabeled instances. Compared with the local noise filters used in existing self-training methods, the one in STDPNF is parameter-free and removes mislabeled samples by exploiting the information in both labeled and unlabeled data. We focus on k nearest neighbor as the base classifier. In experiments, we verify that STDPNF improves the performance of the k nearest neighbor base classifier and that it can remove mislabeled instances efficiently even when labeled data are insufficient.
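As a rough illustration of the general idea described above, the following Python sketch combines a self-training loop with a k nearest neighbor base classifier and a simple local noise filter. It is not the STDPNF algorithm: the function names `knn_predict` and `self_train`, the fixed parameters `k` and `batch`, the "closest to the labeled set first" selection order (a stand-in for density peaks), and the neighbor-agreement filter (a stand-in for the natural-neighbor filter, and not parameter-free) are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def self_train(labeled_X, labeled_y, unlabeled_X, k=3, batch=2):
    """Illustrative self-training loop with a simple local noise filter.

    Assumption: candidates are taken in order of distance to the labeled set,
    and a candidate is kept only if its predicted label agrees with the vote
    of its k nearest neighbors among the labeled points and the other
    candidates of the same round. Rejected candidates are discarded.
    """
    X, y = list(labeled_X), list(labeled_y)
    pool = list(unlabeled_X)
    while pool:
        # Pick the unlabeled points closest to the current labeled set.
        pool.sort(key=lambda u: min(math.dist(u, p) for p in X))
        cand, pool = pool[:batch], pool[batch:]
        # Pseudo-label the candidates with the kNN base classifier.
        cand_y = [knn_predict(X, y, c, k) for c in cand]
        # Local noise filter: compare each pseudo-label with its neighborhood.
        all_X, all_y = X + cand, y + cand_y
        for i, (c, cy) in enumerate(zip(cand, cand_y)):
            idx = len(all_X) - len(cand) + i
            rest_X = all_X[:idx] + all_X[idx + 1:]
            rest_y = all_y[:idx] + all_y[idx + 1:]
            if knn_predict(rest_X, rest_y, c, k) == cy:
                X.append(c)
                y.append(cy)
    return X, y


# Toy data: two well-separated clusters, two labeled points in each.
labeled_X = [(0, 0), (0, 1), (10, 10), (10, 11)]
labeled_y = ["a", "a", "b", "b"]
unlabeled_X = [(1, 0), (1, 1), (9, 10), (9, 11)]
final_X, final_y = self_train(labeled_X, labeled_y, unlabeled_X, k=3, batch=2)
```

Note that the filter in this sketch consults pseudo-labeled candidates as well as the original labeled points, which loosely mirrors the abstract's point about using more than the labeled data when deciding what to discard. The fixed `k` is exactly the kind of parameter dependence the paper aims to remove.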

Keywords: Self-training method, Semi-supervised classification, Density peaks, Noise filters, Natural neighbors, Semi-supervised learning

Review history: Received 24 March 2019, Revised 25 July 2019, Accepted 28 July 2019, Available online 31 July 2019, Version of Record 11 October 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.104895