A weighted hybrid ensemble method for classifying imbalanced data

Authors:

Highlights:

Abstract

Most real-world datasets are imbalanced. Data imbalance arises when the number of instances in some classes greatly exceeds the number of instances in other classes, and it has adverse effects in both data mining and machine learning. Existing approaches to the problem fall into data-level methods, algorithm-level methods, and hybrid methods. In this paper, we propose a weighted hybrid ensemble method, called WHMBoost, for classifying imbalanced data in binary classification tasks. Within the boosting framework, the proposed method combines two data sampling methods and two base classifiers, assigning a weight to each sampling method and each base classifier so that their strengths complement one another. The performance of WHMBoost was evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost, using AUC, F-Measure, and Geometric Mean as evaluation criteria. Experimental results show significant improvement over the other methods, and we conclude that WHMBoost is a promising and effective algorithm for dealing with imbalanced datasets.
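The abstract describes WHMBoost only at a high level: a boosting loop that draws on two weighted data sampling methods and two weighted base classifiers. The sketch below is an illustrative reconstruction of that idea, not the authors' implementation; the specific choices of RandomUnderSampler and SMOTE as the two samplers, a decision tree and an SVM as the two base classifiers, the weighted random selection of components, and the AdaBoost-style update are all assumptions made for the example.

```python
# Minimal sketch of a weighted hybrid boosting ensemble in the spirit of WHMBoost.
# Component choices and weighting scheme are illustrative assumptions.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def whm_boost_sketch(X, y, n_rounds=20,
                     sampler_weights=(0.5, 0.5),   # weights of the two samplers (assumed)
                     learner_weights=(0.5, 0.5),   # weights of the two base classifiers (assumed)
                     random_state=0):
    rng = np.random.default_rng(random_state)
    samplers = [RandomUnderSampler(random_state=random_state),
                SMOTE(random_state=random_state)]
    learners = [lambda: DecisionTreeClassifier(max_depth=3),
                lambda: SVC()]

    n = len(y)
    w = np.full(n, 1.0 / n)          # AdaBoost-style instance weights
    models, alphas = [], []

    for _ in range(n_rounds):
        # Pick one sampler and one base learner according to their assigned weights.
        sampler = samplers[rng.choice(2, p=sampler_weights)]
        clf = learners[rng.choice(2, p=learner_weights)]()

        # Draw a weighted bootstrap of the training data, then rebalance it.
        idx = rng.choice(n, size=n, replace=True, p=w)
        X_res, y_res = sampler.fit_resample(X[idx], y[idx])
        clf.fit(X_res, y_res)

        # Standard AdaBoost update computed on the full weighted training set.
        pred = clf.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err == 0 or err >= 0.5:
            continue
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * (2 * (pred != y) - 1))
        w /= w.sum()

        models.append(clf)
        alphas.append(alpha)

    def predict(X_new):
        # Weighted vote over {-1, +1}-coded predictions (assumes binary labels 0/1).
        votes = sum(a * (2 * m.predict(X_new) - 1) for a, m in zip(alphas, models))
        return (votes > 0).astype(int)

    return predict
```

In this sketch the per-component weights only bias which sampler/classifier pair is used in each round; the paper's actual weighting scheme may differ.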

Keywords: Data imbalance, Binary classification, Boosting algorithm, Data sampling methods, Base classifiers

Article history: Received 26 February 2020, Revised 6 May 2020, Accepted 28 May 2020, Available online 8 June 2020, Version of Record 10 June 2020.

DOI: https://doi.org/10.1016/j.knosys.2020.106087