Robust Class-Specific Autoencoder for Data Cleaning and Classification in the Presence of Label Noise

Authors: Weining Zhang, Dong Wang, Xiaoyang Tan

Abstract

We present a simple but effective method for data cleaning and classification in the presence of label noise. The fundamental idea is to treat data points with noisy labels as outliers of the class indicated by the noisy label. This allows us to recast the traditional supervised problem of classification with label noise as an unsupervised one, i.e., identifying outliers within each class. However, finding such dubious observations (outliers) within each class is challenging in general. We therefore propose to reduce their potential influence through class-specific feature learning with autoencoders. In particular, we learn a feature space for each class using all the samples labeled as that class, including those whose labels are noisy (which ones are noisy is unknown to us). Furthermore, to handle cases where the noise level is relatively high, we propose a weighted class-specific autoencoder that takes into account the effect of each data point on the postulated model. To fully exploit the learned class-specific feature spaces, we use a minimum-reconstruction-error method both to identify the outliers (label noise) and to solve the classification task. Experiments on several datasets show that the proposed method achieves state-of-the-art performance on data cleaning and classification with noisy labels.
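
The minimum-reconstruction-error idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a linear autoencoder fitted in closed form (equivalent to PCA) for the trained autoencoder, and it leaves out the weighted variant, whose weighting scheme the abstract does not specify. All function names and the `n_hidden` parameter are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, n_hidden):
    """Fit a linear autoencoder in closed form (assumption: PCA stand-in)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_hidden]

def reconstruction_error(X, model):
    """Squared reconstruction error of each row of X under one class model."""
    mean, W = model
    Z = (X - mean) @ W.T      # encode into the class-specific feature space
    X_hat = Z @ W + mean      # decode back to the input space
    return np.sum((X - X_hat) ** 2, axis=1)

def fit_class_models(X, y, n_hidden):
    """Fit one model per class on every sample carrying that (noisy) label."""
    return {c: fit_linear_autoencoder(X[y == c], n_hidden) for c in np.unique(y)}

def predict(X, models):
    """Assign each sample to the class whose model reconstructs it best."""
    classes = np.array(sorted(models))
    errors = np.stack(
        [reconstruction_error(X, models[c]) for c in classes], axis=1
    )
    return classes[np.argmin(errors, axis=1)]

def flag_suspect_labels(X, y, models):
    """Mark samples whose given label disagrees with the best-fitting class."""
    return predict(X, models) != y
```

In this sketch a sample is flagged as label noise when some other class reconstructs it better than the class it was labeled with, and the same minimum-error rule classifies unseen samples.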

Keywords: Class-specific autoencoder, Label noise, Classification, Data cleaning, Outliers

Official paper URL: https://doi.org/10.1007/s11063-018-9963-9