Robust ensemble learning for mining noisy data streams

Authors:

Highlights:

Abstract

In this paper, we study the problem of learning from concept-drifting data streams with noise, where samples in a data stream may be mislabeled or contain erroneous values. Our goal is to build a robust prediction model from noisy stream data that accurately predicts future samples. For noisy data sources, most existing work relies on data preprocessing techniques to cleanse noisy samples before decision models are trained. In data stream environments, these preprocessing techniques are unfortunately hard to apply, mainly because concept drift in a data stream can make it very difficult to differentiate noise from samples of changing concepts. Accordingly, we propose an aggregate ensemble (AE) learning framework, whose aim is to build a robust ensemble model that can tolerate data errors. Theoretical and empirical studies on both synthetic and real-world data streams demonstrate that the proposed AE learning framework is capable of building accurate classification models from noisy data streams.
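
The abstract does not spell out how the AE framework is constructed, so the following is only a minimal illustrative sketch of the general idea of tolerating noise by aggregation rather than by cleansing. It assumes that several base learners are each trained on several recent stream chunks and that their votes are averaged. The learners (`train_centroid`, `train_stump`), the toy one-dimensional chunks, and the helper names are all hypothetical and are not taken from the paper.

```python
from statistics import mean

def train_centroid(chunk):
    """Hypothetical base learner 1: nearest class mean on 1-D features."""
    c0 = mean(x for x, y in chunk if y == 0)
    c1 = mean(x for x, y in chunk if y == 1)
    return lambda x: 1 if abs(x - c1) < abs(x - c0) else 0

def train_stump(chunk):
    """Hypothetical base learner 2: threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t in sorted(x for x, _ in chunk):
        err = sum((1 if x >= t else 0) != y for x, y in chunk)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0

def build_ensemble(chunks, learners):
    """Train every learner on every chunk (an assumed aggregation scheme)."""
    return [learn(chunk) for chunk in chunks for learn in learners]

def predict(ensemble, x):
    """Average the 0/1 votes of all members and threshold at 0.5."""
    votes = mean(clf(x) for clf in ensemble)
    return 1 if votes >= 0.5 else 0

# Toy stream: three chunks of (feature, label) pairs.
# The last sample of the second chunk is deliberately mislabeled (noise).
chunks = [
    [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)],
    [(1.5, 0), (2.5, 0), (3.5, 0), (7.5, 1), (8.5, 1), (2.0, 1)],
    [(0.5, 0), (2.0, 0), (3.0, 0), (6.5, 1), (8.0, 1), (9.0, 1)],
]

ensemble = build_ensemble(chunks, [train_centroid, train_stump])

# ensemble[2] is the centroid learner trained on the noisy chunk.
noisy_member_vote = ensemble[2](4.5)
aggregate_vote = predict(ensemble, 4.5)
print(noisy_member_vote, aggregate_vote)
```

In this toy setting the mislabeled sample shifts one member's class mean enough to change its vote at `x = 4.5`, while the averaged vote of all six members is unaffected. The sketch does not model concept drift; a drift-aware variant would presumably restrict or reweight the chunks used.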

Keywords: Data stream, Classification, Ensemble learning, Noise, Concept drifting

Review history: Received 1 August 2009, Revised 9 October 2010, Accepted 1 November 2010, Available online 5 November 2010.

Paper URL: https://doi.org/10.1016/j.dss.2010.11.004