Noise-robust oversampling for imbalanced data classification

Authors:

Highlights:

• Propose three noise-robust mechanisms to address the noise generation problem in classic oversampling algorithms: adopting an advanced clustering algorithm, designing adaptive embedding to generate samples, and implementing a safe boundary to enlarge class boundaries.

• Propose a heterogeneous distance metric to better cluster mixed-type data, along with dedicated approaches to avoid generating groundless samples for categorical variables.

• An adapted decomposition strategy extends the solution for binary imbalanced data to the multi-class setting. Moreover, better placement of new samples is provided.

• Experiments on standard datasets validate the effectiveness of the proposed method.
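
The highlights mention a heterogeneous distance metric for mixed-type data but do not define it here. As a rough illustration of the general idea only, the sketch below implements the well-known Heterogeneous Euclidean-Overlap Metric (HEOM), which may differ from the metric actually proposed in the paper. The function name `heom_distance` and the parameters `categorical` and `ranges` are assumptions made for this example.

```python
import math

def heom_distance(a, b, categorical, ranges):
    """HEOM distance between two mixed-type records (illustrative sketch).

    a, b        : sequences of attribute values (None marks a missing value)
    categorical : set of indices of categorical attributes
    ranges      : mapping index -> (max - min) for each numeric attribute
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            d = 1.0                       # missing values count as maximally distant
        elif i in categorical:
            d = 0.0 if x == y else 1.0    # overlap distance for categories
        else:
            r = ranges[i]
            d = abs(x - y) / r if r > 0 else 0.0  # range-normalised numeric difference
        total += d * d
    return math.sqrt(total)

# Hypothetical records: (numeric, categorical, numeric)
p = (1.0, "red", 10.0)
q = (3.0, "blue", 10.0)
dist = heom_distance(p, q, categorical={1}, ranges={0: 4.0, 2: 20.0})
```

Here the numeric attributes contribute their range-normalised difference and the categorical attribute contributes 0 or 1, so no single attribute type dominates the distance used for clustering.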

Keywords: Imbalanced learning, Classification, Clustering

Review history: Received 26 May 2021, Revised 13 August 2022, Accepted 27 August 2022, Available online 6 September 2022, Version of Record 16 September 2022.

Paper URL: https://doi.org/10.1016/j.patcog.2022.109008