Imbalanced data classification based on improved EIWAPSO-AdaBoost-C ensemble algorithm

Authors: Xiao Li, Kewen Li

Abstract

Adaptive Boosting (AdaBoost) is a widely used ensemble learning algorithm that, combined with many other types of learning algorithms, can effectively improve classification performance on ordinary datasets. AdaBoost focuses on the overall classification performance of its weak classifiers and aims to minimize the overall classification error. However, it ignores the imbalance in the number of samples between classes, so it is not directly suitable for imbalanced data classification. To improve the classification accuracy of minority samples in imbalanced datasets, this paper proposes an improved AdaBoost algorithm based on weight adjustment factors (AdaBoost-C). It redefines the error rate function, assigning a higher weight to minority samples to emphasize their importance and a lower weight to majority samples to suppress theirs. In addition, this paper proposes an adaptive particle swarm optimization algorithm with exponential dynamic adjustment of the inertia weight (EIWAPSO) to further optimize the weights of the weak classifiers. It can effectively prevent the ensemble algorithm from generating redundant and useless weak classifiers that consume system resources, and it avoids falling into local optima. The experimental results show that the proposed EIWAPSO-AdaBoost-C ensemble algorithm achieves the highest Recall and AUC values on datasets with different imbalance ratios (IR), and the lowest maximum, minimum, and average errors among the compared algorithms. The proposed algorithm therefore not only effectively improves the classification accuracy of minority samples on imbalanced datasets, but is also more stable.
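
The abstract does not give the formulas, so the following is only a minimal sketch of the two ideas it names: an error rate that weights minority samples more heavily, and an inertia weight that decays exponentially over PSO iterations. The function names, the adjustment factors `c_min` and `c_maj`, the decay constant `k`, and the exact functional forms are all illustrative assumptions, not the definitions used in the paper.

```python
import math


def weighted_error(y_true, y_pred, sample_weights,
                   minority_label=1, c_min=2.0, c_maj=0.5):
    """Class-weighted error rate (hypothetical form).

    Each sample's boosting weight is scaled by c_min if it belongs to the
    minority class and by c_maj otherwise, so a misclassified minority
    sample contributes more to the error than a majority one.
    """
    total = 0.0
    wrong = 0.0
    for y, p, w in zip(y_true, y_pred, sample_weights):
        factor = c_min if y == minority_label else c_maj
        total += factor * w
        if y != p:
            wrong += factor * w
    return wrong / total


def exp_inertia(t, t_max, w_start=0.9, w_end=0.4, k=3.0):
    """Inertia weight decaying exponentially from w_start toward w_end
    (assumed schedule; the paper's adaptive rule may differ)."""
    return w_end + (w_start - w_end) * math.exp(-k * t / t_max)


# One minority sample (label 1) is misclassified by a majority-only predictor.
y_true = [1, 0, 0, 0]
y_pred = [0, 0, 0, 0]
weights = [0.25, 0.25, 0.25, 0.25]
err = weighted_error(y_true, y_pred, weights)
```

With uniform weights, the plain error rate in this toy case would be 0.25, whereas the class-weighted version is larger because the single mistake falls on the minority class. The inertia schedule starts at `w_start`, favoring exploration, and shrinks toward `w_end` as iterations proceed.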

Keywords: Adaptive boosting, Imbalanced data, Particle swarm optimization, Inertia weight

Paper URL: https://doi.org/10.1007/s10489-021-02708-5