A new rule-based knowledge extraction approach for imbalanced datasets

Authors: Aouatef Mahani, Ahmed Riadh Baba-Ali

Abstract

Classification consists of extracting a classifier from large datasets. A dataset is imbalanced if it contains more instances of one class than of the others; it therefore contains majority instances and minority instances. Classical learning algorithms are biased toward the majority instances. Classification applied to imbalanced datasets is called partial classification, and its approaches are generally based on either sampling methods or algorithmic methods. In this paper, we propose a new hybrid approach using a three-phase rule-based extraction process. First, an initial classifier is extracted; it contains classification rules that represent only majority instances. Then, we delete the majority instances that are correctly classified by these rules to produce a balanced dataset. The deleted majority instances are replaced by the extracted classification rules, which avoids losing information. Next, our algorithm is applied to the resulting balanced dataset to produce a second classifier, whose rules represent both majority and minority instances. Finally, we add the rules of the first classifier to the second classifier to obtain the final classifier, which is then post-processed. Our approach has been tested on several imbalanced binary datasets, and the results show that it is efficient compared with the results of other approaches.
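The three phases described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' method. The abstract does not specify the rule learner (the keywords suggest a genetic algorithm), so the hypothetical helper `learn_pure_rules` substitutes a simple learner that keeps single-condition rules whose covered instances all belong to one class. Removing only as many covered majority instances as needed to equalize the two classes is also an assumption, as is the post-processing step (deduplicating, then ranking rules by confidence and support on the original data). All function names and the toy dataset are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

Features = Tuple[str, ...]
Instance = Tuple[Features, str]                 # (categorical feature values, class label)
Rule = Tuple[Tuple[Tuple[int, str], ...], str]  # ((feature index, value) conditions, class label)


def matches(rule: Rule, features: Features) -> bool:
    """A rule fires when every one of its conditions holds."""
    conditions, _ = rule
    return all(features[i] == v for i, v in conditions)


def learn_pure_rules(data: List[Instance], target: Optional[str] = None,
                     min_support: int = 2) -> List[Rule]:
    """Hypothetical stand-in learner (NOT the paper's GA): one-condition rules whose
    covered instances all share one class, optionally only those predicting `target`."""
    rules: List[Rule] = []
    n_features = len(data[0][0]) if data else 0
    for i in range(n_features):
        for value in sorted({x[i] for x, _ in data}):
            covered = [y for x, y in data if x[i] == value]
            if len(set(covered)) == 1 and len(covered) >= min_support:
                if target is None or covered[0] == target:
                    rules.append((((i, value),), covered[0]))
    return rules


def confidence_and_support(rule: Rule, data: List[Instance]) -> Tuple[float, int]:
    covered = [y for x, y in data if matches(rule, x)]
    if not covered:
        return 0.0, 0
    return covered.count(rule[1]) / len(covered), len(covered)


def three_phase(data: List[Instance], min_support: int = 2):
    counts = Counter(y for _, y in data)
    (majority, n_major), (_, n_minor) = counts.most_common(2)  # binary datasets assumed

    # Phase 1: a first classifier whose rules describe majority instances only.
    rules1 = learn_pure_rules(data, target=majority, min_support=min_support)

    # Phase 2: delete majority instances that rules1 classifies correctly,
    # stopping once the classes are balanced (assumption), then learn again.
    surplus = n_major - n_minor
    balanced: List[Instance] = []
    for x, y in data:
        if y == majority and surplus > 0 and any(matches(r, x) for r in rules1):
            surplus -= 1
            continue
        balanced.append((x, y))
    rules2 = learn_pure_rules(balanced, min_support=min_support)

    # Phase 3: add rules1 to rules2, drop duplicates, then (hypothetical
    # post-processing) rank by confidence and support on the ORIGINAL data.
    merged: List[Rule] = []
    for r in rules2 + rules1:
        if r not in merged:
            merged.append(r)

    def rank_key(rule: Rule) -> Tuple[float, int]:
        conf, support = confidence_and_support(rule, data)
        return (-conf, -support)

    merged.sort(key=rank_key)  # stable sort keeps merge order among ties
    return merged, balanced, majority


def predict(rules: List[Rule], features: Features, default: str) -> str:
    """Decision-list prediction: the first matching rule wins."""
    for rule in rules:
        if matches(rule, features):
            return rule[1]
    return default


# Toy imbalanced dataset: (color, size) -> label; 6 "neg" (majority) vs 2 "pos".
data = [
    (("red", "small"), "neg"), (("red", "small"), "neg"),
    (("red", "large"), "neg"), (("red", "large"), "neg"),
    (("blue", "small"), "neg"), (("green", "small"), "neg"),
    (("blue", "large"), "pos"), (("green", "large"), "pos"),
]
rules, balanced, majority = three_phase(data)
for conditions, label in rules:
    print(conditions, "->", label)
print(predict(rules, ("red", "large"), default=majority))  # neg
```

Ranking on the full dataset matters in this sketch: the second classifier never sees the deleted majority instances, so its rule on feature 1 = `"large"` is only 50% accurate on the original data. Placing the pure first-phase rules ahead of it keeps `("red", "large")` classified as majority.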

Keywords: Classification, Class imbalance problem, Data mining, Genetic algorithms, Imbalanced datasets sampling

Official paper link: https://doi.org/10.1007/s10115-019-01330-9