VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Authors: Alessio Bernardo, Emanuele Della Valle

Abstract

The world is constantly changing, and so is the massive amount of data it produces. However, only a few studies address online class imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy that can be prepended to any streaming machine learning classification algorithm and that oversamples the minority class using a new version of SMOTE and Borderline-SMOTE inspired by data sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We provide statistical evidence that VFC-SMOTE pipelines learn models whose minority-class performance is better than that of state-of-the-art approaches. Moreover, we analyze time and memory consumption and the speed of recovery from concept drift.
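
For orientation, the following is a minimal sketch of the classic SMOTE interpolation step that the abstract refers to. It is not the VFC-SMOTE algorithm itself: the abstract does not describe the streaming, data-sketching version, so none of that is reproduced here. The function name `smote_sample`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_sample(minority, k=3, rng=None):
    """Return one synthetic point built with classic (batch) SMOTE.

    Hypothetical helper for illustration; not the authors' method.
    `minority` is a list of equal-length numeric tuples.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    # Pick a random minority sample as the starting point.
    base = rng.choice(minority)

    # Rank the other minority samples by Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))

    # Choose one of the k nearest neighbours and interpolate towards it.
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
```

With `k=2`, the two nearest neighbours of each corner are its adjacent corners, so the synthetic point always lies on an edge of the square. In a streaming setting, a step like this would run before each instance reaches the classifier, which is the sense in which the abstract calls the technique a meta-strategy to be prepended.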

Keywords: SML, Evolving data stream, Concept drift, Balancing, Data sketching

Paper URL: https://doi.org/10.1007/s10618-021-00786-0