Equalization ensemble for large scale highly imbalanced data classification

Authors:

Highlights:

Abstract

The class-imbalance problem arises widely across research fields. The larger the data scale and the higher the imbalance, the harder proper classification becomes. For large-scale, highly imbalanced data sets, ensemble methods based on under-sampling are among the most competitive existing techniques. However, they are sensitive to improper sampling strategies, prone to discarding useful majority-class information, and difficult to generalize. To overcome these limitations, we propose an equalization ensemble method (EASE) with two new schemes. First, we propose an equalization under-sampling scheme that generates a balanced data set for each base classifier, reducing the impact of class imbalance on the base classifiers. Second, we design a weighted integration scheme in which the G-mean scores obtained by the base classifiers on the original imbalanced data set serve as the weights. These weights not only let the better-performing base classifiers dominate the final classification decision, but also adapt to imbalanced data sets of different scales while avoiding some extremely bad situations. Experimental results on three metrics show that EASE increases the diversity of base classifiers and outperforms twelve state-of-the-art methods on imbalanced data sets of different scales.
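The two schemes described above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' implementation: the nearest-centroid base learner, the function names, and the number of estimators are all placeholder assumptions; only the overall flow (balance each training subset by under-sampling the majority class, then weight each base classifier's vote by its G-mean on the original imbalanced set) follows the abstract.

```python
import numpy as np

def equalization_undersample(X, y, rng):
    # Keep all minority samples and draw an equal number of majority
    # samples without replacement, yielding a balanced subset.
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    min_idx = np.where(y == minority)[0]
    maj_idx = rng.choice(np.where(y == majority)[0], size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    return X[idx], y[idx]

class CentroidClassifier:
    # Hypothetical stand-in base learner: predict the nearest class centroid.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d, axis=1)]

def g_mean(y_true, y_pred, positive):
    # Geometric mean of sensitivity (minority recall) and specificity.
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    p, n = np.sum(y_true == positive), np.sum(y_true != positive)
    sens = tp / p if p else 0.0
    spec = tn / n if n else 0.0
    return np.sqrt(sens * spec)

def ease_fit_predict(X, y, X_test, n_estimators=11, seed=0):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    votes = np.zeros((len(X_test), len(classes)))
    weights = []
    for _ in range(n_estimators):
        Xb, yb = equalization_undersample(X, y, rng)
        clf = CentroidClassifier().fit(Xb, yb)
        # Weight = G-mean on the ORIGINAL imbalanced data, per the paper's scheme.
        w = g_mean(y, clf.predict(X), minority)
        weights.append(w)
        pred = clf.predict(X_test)
        for j, c in enumerate(classes):
            votes[:, j] += w * (pred == c)
    return classes[np.argmax(votes, axis=1)], np.array(weights)
```

On a synthetic 10:1 imbalanced set (e.g., 500 majority vs. 50 minority points drawn from well-separated Gaussians), each base classifier sees a 50/50 subset, and its vote is scaled by a weight in [0, 1], so base classifiers with a zero G-mean are effectively silenced, which is how the weighting avoids the "extremely bad situations" mentioned in the abstract.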

Keywords: Imbalanced data classification, Ensemble learning, Large-scale data, Under-sampling

Article history: Received 23 September 2021, Revised 22 January 2022, Accepted 22 January 2022, Available online 31 January 2022, Version of Record 19 February 2022.

Paper link: https://doi.org/10.1016/j.knosys.2022.108295