On the noise estimation statistics

Authors:

Abstract

Learning with noisy labels has attracted much attention over the past few decades. A fundamental problem is how to estimate noise proportions from corrupted data. Previous studies of this problem rely on estimating class distributions, conditional distributions, or kernel embeddings of distributions. In this paper, we present another simple and effective approach to noise estimation. The basic idea is to exploit the first- and second-order statistics of the observed data together with the positive semi-definiteness of covariance matrices, which yields an upper bound on the noise proportion without additional assumptions on the data distribution. Building on this idea and on the locality property of random noise, we develop the Noise Estimation Statistics with Clusters (NESC) method, which first clusters the corrupted data with the k-means algorithm and then estimates the noise from the clusters using their first- and second-order statistics. We analyze the existence, uniqueness, and convergence of our noise estimate, and empirical studies verify the effectiveness of the NESC method.
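
The role of positive semi-definiteness can be illustrated with a small sketch. It is not the paper's NESC estimator, whose details are not given in this abstract, and it omits the clustering step. It assumes binary labels in {-1, +1} that are flipped independently of the features with a symmetric rate rho < 1/2; the function name `noise_upper_bound` is hypothetical. Under that assumption, Cov(x, noisy y) = (1 - 2 rho) Cov(x, y), and the Schur complement of the positive semi-definite joint covariance of (x, y) gives Cov(x, y)^T Sigma^+ Cov(x, y) <= Var(y) <= 1. Writing c for the observed cross-covariance, this implies rho <= (1 - sqrt(c^T Sigma^+ c)) / 2.

```python
import numpy as np


def noise_upper_bound(X, y_noisy):
    """Upper bound on a symmetric label-flip rate rho < 1/2.

    Illustrative assumption (not taken from the paper): labels lie in
    {-1, +1} and are flipped independently of the features.
    """
    y = np.asarray(y_noisy, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = X.shape[0]

    # First-order statistics: centre features and observed labels.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Second-order statistics: feature covariance and the
    # cross-covariance between features and observed labels.
    sigma = Xc.T @ Xc / n
    c = Xc.T @ yc / n

    # Positive semi-definiteness of the joint covariance implies
    # c^T Sigma^+ c <= (1 - 2 * rho)^2 under the stated assumption.
    q = float(c @ np.linalg.pinv(sigma) @ c)
    q = min(max(q, 0.0), 1.0)  # guard against rounding error
    return 0.5 * (1.0 - np.sqrt(q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20000, 2))
    y_clean = np.where(X[:, 0] >= 0, 1.0, -1.0)
    flipped = rng.random(len(y_clean)) < 0.2  # true flip rate 0.2
    y_noisy = np.where(flipped, -y_clean, y_clean)
    print("upper bound on flip rate:", noise_upper_bound(X, y_noisy))
```

The bound is tight only when the clean labels are a balanced linear function of the features; otherwise it overestimates the flip rate. With finite samples it is itself an estimate, so it can fall slightly below the true rate.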

Keywords: Machine learning, Classification, Random noise, Noise estimation

Article history: Received 11 December 2020, Accepted 4 January 2021, Available online 8 January 2021, Version of Record 13 January 2021.

Official URL: https://doi.org/10.1016/j.artint.2021.103451