Identifying mislabeled training data with the aid of unlabeled data

Authors: Donghai Guan, Weiwei Yuan, Young-Koo Lee, Sungyoung Lee

Abstract

This paper presents a new approach for identifying and eliminating mislabeled training instances for supervised learning algorithms. The novelty of the approach lies in using unlabeled instances to aid the detection of mislabeled training instances, in contrast with existing methods, which rely only on the labeled training instances. The approach is straightforward and can be applied to many existing noise detection methods with only marginal modifications. To assess its benefit, we choose two popular noise detection methods: majority filtering (MF) and consensus filtering (CF). We propose MFAUD and CFAUD, variants of MF and CF that apply our approach; the names denote majority and consensus filtering with the aid of unlabeled data. An empirical study validates the superiority of our approach and shows that MFAUD and CFAUD significantly improve on the performance of MF and CF under different noise ratios and labeled ratios. In addition, the improvement is more pronounced when the noise ratio is higher.
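
For orientation, the two baseline filters named in the abstract can be sketched as follows. This is a minimal illustration of majority and consensus filtering on labeled data only, not the authors' implementation: the abstract does not describe how MFAUD and CFAUD incorporate unlabeled instances, so that step is not shown. The toy learners (`nearest_centroid`, `knn`), the function name `filter_noise`, the fold assignment by index, and the sample data are all assumptions made for this example.

```python
from collections import Counter

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    def dist(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))
    return min(sorted(sums), key=dist)

def knn(k):
    """Build a k-nearest-neighbour predictor (squared Euclidean distance)."""
    def predict(train, x):
        ranked = sorted(train, key=lambda fl: sum((a - b) ** 2 for a, b in zip(fl[0], x)))
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0]
    return predict

def filter_noise(data, learners, n_folds=3, scheme="majority"):
    """Return sorted indices of instances flagged as mislabeled.

    Each instance is classified by every learner, trained on the folds
    that do not contain it. Majority filtering flags an instance when
    more than half of the learners disagree with its label; consensus
    filtering flags it only when all learners disagree.
    """
    flagged = []
    for fold in range(n_folds):
        # Hypothetical fold assignment: instance i belongs to fold i % n_folds.
        train = [data[i] for i in range(len(data)) if i % n_folds != fold]
        for i in range(fold, len(data), n_folds):
            x, y = data[i]
            errors = sum(learn(train, x) != y for learn in learners)
            if scheme == "majority":
                is_noise = errors > len(learners) / 2
            else:  # "consensus"
                is_noise = errors == len(learners)
            if is_noise:
                flagged.append(i)
    return sorted(flagged)

# Toy 1-D data: two clusters; index 2 lies in cluster "a" but is labeled "b".
data = [((1.0,), "a"), ((2.0,), "a"), ((3.25,), "b"), ((4.5,), "a"),
        ((5.0,), "a"), ((6.0,), "a"), ((20.0,), "b"), ((21.0,), "b"),
        ((22.0,), "b"), ((23.5,), "b"), ((24.0,), "b"), ((25.0,), "b")]
learners = [nearest_centroid, knn(1), knn(3)]

majority_flags = filter_noise(data, learners, scheme="majority")
consensus_flags = filter_noise(data, learners, scheme="consensus")
```

On this toy data both schemes flag only the deliberately mislabeled instance at index 2. In general consensus filtering is the more conservative of the two, since it requires every learner to disagree with the given label before an instance is removed.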

Keywords: Supervised learning, Identifying mislabeled data, Unlabeled data, Majority filtering, Consensus filtering

Paper URL: https://doi.org/10.1007/s10489-010-0225-4