Two density-based sampling approaches for imbalanced and overlapping data

Authors:

Highlights:

• Introducing two density-based methods to achieve balance and eliminate overlap and noisy data.

• Increasing minority class learning quality without significantly harming majority class learning.

• Creating a boundary between the classes while preserving their structures and shapes as much as possible.

• Introducing a density-based hybrid sampling method to achieve balance and create a uniform distribution of data in classes.

• Comprehensive evaluation of the proposed methods against other recent related works on a varied set of imbalanced datasets.


Keywords: Imbalanced dataset, Density, Undersampling, Oversampling, Overlapping

Review history: Received 6 August 2021, Revised 13 December 2021, Accepted 11 January 2022, Available online 29 January 2022, Version of Record 9 February 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108217