On the Design of Robust Linear Pattern Classifiers Based on \(M\)-Estimators

Abstract

Classical linear neural network architectures, such as the optimal linear associative memory (OLAM) (Kohonen and Ruohonen, IEEE Trans Comp 22(7):701–702, 1973) and the adaptive linear element (Adaline) (Widrow, IEEE Signal Process Mag 22(1):100–106, 2005; Widrow and Winter, IEEE Comp 21(3):25–39, 1988), are commonly used either as standalone pattern classifiers for linearly separable problems or as fundamental building blocks of multilayer nonlinear classifiers, such as the multilayer perceptron (MLP), the radial basis function network (RBFN), the extreme learning machine (ELM) (Int J Mach Learn Cyber 2:107–122, 2011) and the echo-state network (ESN) (Emmerich, Proceedings of the 20th international conference on artificial neural networks, 148–153, 2010). The learning equations of OLAM and Adaline, namely the ordinary least squares (OLS) and the least mean squares (LMS) algorithms, respectively, share a common feature: they are optimal only under the assumption that the errors are Gaussian. However, the presence of outliers in the data causes the error distribution to depart from Gaussianity, and hence the classifier performance deteriorates. Bearing this in mind, in this paper we develop simple and efficient extensions of OLAM and Adaline, named Robust OLAM (ROLAM) and Robust Adaline (Radaline), which are robust to labeling errors (a.k.a. label noise), a type of outlier that often occurs in classification tasks. This type of outlier usually results from mistakes made while labeling the data points (e.g. misjudgement by a specialist) or from typing errors made while creating the data files (e.g. striking an incorrect key on a keyboard). To deal with such outliers, ROLAM and Radaline use \(M\)-estimators, instead of the standard OLS/LMS algorithms, to compute the weights of the OLAM and Adaline networks.
By means of comprehensive computer simulations using synthetic and real-world data sets, we show that the proposed robust linear classifiers consistently outperform their original versions.
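
To make the idea of replacing OLS with an \(M\)-estimator concrete, the following is a minimal sketch, not the authors' implementation. It assumes a batch fit by iteratively reweighted least squares (IRLS), Huber's weight function with the usual tuning constant 1.345, and a residual scale taken from the median absolute deviation; the abstract does not say which \(M\)-estimator or solver is used. The function names `huber_weights`, `irls_fit` and `predict` are hypothetical.

```python
import numpy as np

def huber_weights(r, k=1.345):
    """Huber weights: 1 for |r| <= k, k/|r| otherwise (r is a scaled residual)."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def irls_fit(X, D, n_iter=30, k=1.345):
    """Robust linear fit, one weight vector per output unit (assumed IRLS scheme).

    X: (n_samples, n_features) inputs.
    D: (n_samples, n_outputs) targets, e.g. +1/-1 class codes.
    Returns W of shape (n_features + 1, n_outputs); row 0 holds the biases.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])      # prepend bias column
    W = np.linalg.lstsq(Xb, D, rcond=None)[0]          # start from the OLS solution
    for _ in range(n_iter):
        for j in range(D.shape[1]):
            r = D[:, j] - Xb @ W[:, j]                 # residuals of output j
            # MAD-based scale; the small floor (an assumption) avoids division by zero
            s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
            sw = np.sqrt(huber_weights(r / s, k))      # square roots of the weights
            # weighted least squares via row scaling
            W[:, j] = np.linalg.lstsq(Xb * sw[:, None], D[:, j] * sw, rcond=None)[0]
    return W

def predict(W, X):
    """Assign each row of X to the output unit with the largest activation."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(Xb @ W, axis=1)
```

Samples with large residuals, such as mislabeled points, receive weights below one and therefore pull the solution less than they would under OLS, where every sample has weight one.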