Large margin classifiers to generate synthetic data for imbalanced datasets

Authors: Marcelo Ladeira Marques, Saulo Moraes Villela, Carlos Cristiano Hasenclever Borges

Abstract

In this paper, we propose an approach capable of improving the results obtained by classification algorithms when applied to imbalanced datasets. The method, called the Incremental Synthetic Balancing Algorithm (ISBA), performs an iterative procedure based on large margin classifiers to generate synthetic samples that reduce the level of imbalance. In the process, we use the support vectors as the reference for generating new instances, which allows them to be positioned in more representative regions. Furthermore, the new samples can exceed the limits of the samples used for their generation, which enables extrapolation of the boundaries of the minority class and better recognition of this class of interest. We present comparative experiments with other techniques, including SMOTE, which provide strong evidence of the applicability of the proposed approach.
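The difference between interpolating and extrapolating samples can be illustrated with a minimal Python sketch. It is not the authors' ISBA implementation, since the abstract does not give the generation rule. It assumes the reference points (for example, minority-class support vectors) were obtained elsewhere from a large margin classifier, and it omits the iterative retraining step. The names `synthesize`, `rebalance`, and the `overshoot` parameter are hypothetical. With `overshoot=0` the rule reduces to SMOTE-style interpolation between two points; with `overshoot>0` a new sample may fall beyond either endpoint.

```python
import random


def synthesize(reference_points, n_new, overshoot=0.25, seed=0):
    """Generate n_new synthetic points from random pairs of reference points.

    reference_points: list of equal-length numeric lists, assumed here to be
        minority-class support vectors from a separately trained classifier.
    overshoot: hypothetical parameter. 0 gives interpolation only (t in [0, 1]);
        larger values allow t outside [0, 1], i.e. extrapolation.
    """
    if len(reference_points) < 2:
        raise ValueError("need at least two reference points")
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        # Pick two distinct reference points to define a direction.
        a, b = rng.sample(reference_points, 2)
        # Position along the line through a and b; may exceed both ends.
        t = rng.uniform(-overshoot, 1.0 + overshoot)
        new_points.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new_points


def rebalance(minority, majority, reference_points, overshoot=0.25, seed=0):
    """Add synthetic minority samples until both classes have equal size."""
    deficit = max(0, len(majority) - len(minority))
    return minority + synthesize(reference_points, deficit, overshoot, seed)
```

In an iterative scheme such as the one the abstract describes, a call like `rebalance` would presumably be repeated after retraining the classifier on the augmented data, so that the reference points are updated at each round; that loop is left out here because its details are not stated.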

Keywords: Imbalanced learning, Large margin classifiers, Oversampling, Synthetic sample generation

Paper URL: https://doi.org/10.1007/s10489-020-01719-y