Dealing with class imbalance in classifier chains via random undersampling

Abstract

Class imbalance is an intrinsic characteristic of multi-label data. Most labels in multi-label datasets are associated with only a small number of training examples, far fewer than the total size of the dataset. Class imbalance is a key challenge for most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception: each of the binary models it builds is trained on all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, which build a varying number of binary models per label and construct chains of different sizes, in order to make better use of the majority examples at approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches across a variety of evaluation metrics.
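The basic idea of coupling classifier chains with random undersampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each label's majority class is undersampled to a 1:1 ratio with its minority class, a hypothetical nearest-centroid learner stands in for whatever binary classifier the paper uses, chains differ only in label order and sampling seed (no bootstrap), predictions are combined by simple majority vote, and the two extensions (varying the number of models per label and the chain sizes) are omitted. All function and class names are illustrative.

```python
import random


def random_undersample(X, y, rng):
    """Balance one binary label: keep every minority example and an equally
    sized random subset of majority examples (assumed 1:1 ratio)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        # The label takes a single value in the training set: nothing to balance.
        return list(X), list(y)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


class NearestCentroid:
    """Hypothetical stand-in binary base learner: predicts the class whose
    training centroid is closest in squared Euclidean distance."""

    def fit(self, X, y):
        self.centroids = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            if rows:
                self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)


class UndersampledChain:
    """One classifier chain: each binary model sees the original features plus
    the labels earlier in the chain, and is trained on an undersampled copy."""

    def __init__(self, order, seed=0):
        self.order = order  # a permutation of the label indices
        self.rng = random.Random(seed)

    def fit(self, X, Y):
        self.models = []
        for pos, j in enumerate(self.order):
            prev = self.order[:pos]
            # Training augments features with the TRUE values of preceding labels.
            Xa = [list(x) + [Y[i][k] for k in prev] for i, x in enumerate(X)]
            yj = [row[j] for row in Y]
            Xs, ys = random_undersample(Xa, yj, self.rng)
            self.models.append(NearestCentroid().fit(Xs, ys))
        return self

    def predict_one(self, x):
        out = [0] * len(self.order)
        feats = list(x)
        for j, model in zip(self.order, self.models):
            p = model.predict_one(feats)
            out[j] = p
            feats.append(p)  # the prediction feeds the next model in the chain
        return out


def build_ecc(X, Y, n_chains=3, seed=0):
    """Ensemble of chains, each with its own random label order and sampling seed."""
    rng = random.Random(seed)
    chains = []
    for s in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append(UndersampledChain(order, seed=seed + s).fit(X, Y))
    return chains


def ecc_predict(chains, x):
    """Majority vote over the chains, label by label."""
    votes = [c.predict_one(x) for c in chains]
    return [int(2 * sum(col) > len(votes)) for col in zip(*votes)]


if __name__ == "__main__":
    # Toy data: both labels are positive only for the two large-x examples.
    X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1]]
    Y = [[0, 0], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1]]
    chains = build_ecc(X, Y)
    print(ecc_predict(chains, [5.05]))  # → [1, 1]
    print(ecc_predict(chains, [0.05]))  # → [0, 0]
```

In this sketch, every binary model discards most of the majority examples of its label; making better use of those discarded examples is what the two extensions described in the abstract aim to do.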

Keywords: Multi-label learning, Class imbalance, Classifier chains, Undersampling

Article history: Received 4 June 2019, Revised 24 September 2019, Accepted 27 November 2019, Available online 4 December 2019, Version of Record 24 February 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.105292