Noise label learning through label confidence statistical inference

Abstract

Noisy labels are widespread in real-world data and degrade classification performance. Popular methods require either a known noise distribution or additional clean supervision, neither of which is usually available in practice. This paper presents a theoretical statistical method and designs a label confidence inference algorithm (LISR) to address this issue. For the data distribution, we define a statistical function of label inconsistency and analyze its relationship with the neighbor radius. For the data representation, we define trusted neighbors, nearest trusted neighbors, and untrusted neighbors. For noisy label recognition, we present three inference methods that predict labels together with their confidence. The LISR algorithm builds a practical statistical model, queries an initial set of trusted instances, and then iteratively searches for further trusted instances while correcting labels. We conducted experiments on synthetic, UCI, and classic image datasets. Significance tests on the results verify the effectiveness of LISR and its superiority over state-of-the-art noisy label learning algorithms.
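The abstract does not state the paper's formulas, so the following is only a minimal sketch under assumptions not taken from the paper. It assumes that "label inconsistency" at radius r can be read as the mean fraction of an instance's radius-r neighbors whose label disagrees with its own, that seed trusted instances are those whose whole radius-r neighborhood agrees with them, and that untrusted instances are corrected by a majority vote of their trusted neighbors. The function names `radius_neighbors`, `label_inconsistency`, and `propagate_trusted_labels` are hypothetical, and the majority-vote step is a simple stand-in, not LISR's three inference methods.

```python
import math
from collections import Counter


def radius_neighbors(X, r):
    """Indices of every other point within Euclidean distance r of each point."""
    return [
        [j for j in range(len(X)) if j != i and math.dist(X[i], X[j]) <= r]
        for i in range(len(X))
    ]


def label_inconsistency(X, y, r):
    """Hypothetical statistic: mean fraction of radius-r neighbors whose label
    differs from the center point's label (points with no neighbors are skipped)."""
    fractions = [
        sum(y[j] != y[i] for j in nb) / len(nb)
        for i, nb in enumerate(radius_neighbors(X, r))
        if nb
    ]
    return sum(fractions) / len(fractions) if fractions else 0.0


def propagate_trusted_labels(X, y, r, max_iter=10):
    """Hypothetical trusted-instance search with label correction.

    Seeds: points whose radius-r neighbors all share their label.
    Each round, every untrusted point with at least one trusted neighbor takes
    the majority label of those trusted neighbors and becomes trusted.
    """
    nbrs = radius_neighbors(X, r)
    labels = list(y)
    trusted = {i for i, nb in enumerate(nbrs)
               if nb and all(labels[j] == labels[i] for j in nb)}
    for _ in range(max_iter):
        updates = {}
        for i, nb in enumerate(nbrs):
            if i in trusted:
                continue
            votes = Counter(labels[j] for j in nb if j in trusted)
            if votes:
                updates[i] = votes.most_common(1)[0][0]
        if not updates:  # no untrusted point touches a trusted one any more
            break
        for i, lab in updates.items():
            labels[i] = lab
            trusted.add(i)
    return labels, trusted


X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
     (5.0, 0.0), (5.1, 0.0), (5.2, 0.0), (5.3, 0.0)]
y = [0, 0, 1, 0, 1, 1, 1, 1]  # index 2 carries a flipped (noisy) label
print(label_inconsistency(X, y, r=0.15))  # 0.3125
corrected, trusted = propagate_trusted_labels(X, y, r=0.15)
print(corrected)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run the noisy instance is excluded from the seed set because its neighbors disagree with it, and its label is repaired once trust spreads to it from the left end of its cluster. The radius matters: with too large an r the two clusters merge into one neighborhood and the inconsistency statistic stops separating noise from class structure.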

Keywords: Confidence prediction, Noise label, Label inconsistency, Statistical inference

Article history: Received 13 December 2020, Revised 29 April 2021, Accepted 12 June 2021, Available online 15 June 2021, Version of Record 17 June 2021.

Publisher link: https://doi.org/10.1016/j.knosys.2021.107234