SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling

Authors: Hongjiao Guan, Yingtao Zhang, Min Xian, H. D. Cheng, Xianglong Tang

Abstract

Many practical applications suffer from imbalanced data classification, in which the minority class has a degraded recognition rate. The primary causes are the scarcity of minority-class samples and the intrinsically complex distribution characteristics of imbalanced datasets. The imbalanced classification problem becomes more serious on small-sample datasets. To solve the problems of small samples and class imbalance, a hybrid resampling method is proposed. The proposed method combines an oversampling approach (the synthetic minority oversampling technique, SMOTE) with a novel data cleaning approach (the weighted edited nearest neighbor rule, WENN). First, SMOTE generates synthetic minority class examples using linear interpolation. Then, WENN detects and deletes unsafe majority and minority class examples using a weighted distance function and the k-nearest neighbor (kNN) rule. The weighted distance function scales up a commonly used distance by considering local imbalance and spatial sparsity. Extensive experiments on synthetic and real datasets validate the superiority of the proposed SMOTE-WENN over three state-of-the-art resampling methods.
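The two stages described in the abstract (SMOTE interpolation, then kNN-based cleaning) can be sketched as follows. This is a minimal, non-authoritative Python sketch, not the authors' implementation. SMOTE follows the standard linear-interpolation technique, while the cleaning step is the plain edited nearest neighbor (ENN) rule with a pluggable `distance` function. The paper's weighted distance, which scales a base distance by local imbalance and spatial sparsity, is not specified in the abstract and is not reproduced here, so running the sketch with Euclidean distance gives ordinary SMOTE followed by ENN rather than SMOTE-WENN. The function names (`smote`, `edited_nearest_neighbours`, `smote_then_clean`), the default `k` values, the two-class setting, and oversampling until the classes are balanced are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    # Base distance. WENN scales a distance like this one by local imbalance
    # and spatial sparsity; that weighting is NOT implemented here.
    return math.dist(a, b)


def smote(minority, n_new, k=5, rng=None, distance=euclidean):
    """Create n_new synthetic minority points by linear interpolation (SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: distance(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def edited_nearest_neighbours(X, y, k=3, distance=euclidean):
    """ENN cleaning: drop every sample (of either class) whose k nearest
    neighbours under `distance` vote for a different label."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: distance(xi, X[j]),
        )[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if vote == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smote_then_clean(X, y, minority_label, k_smote=5, k_enn=3,
                     distance=euclidean, seed=0):
    """Hypothetical two-class pipeline: oversample the minority class up to
    the majority count (an assumption), then clean both classes with ENN."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(len(y) - 2 * len(minority), 0)  # majority count - minority count
    new = smote(minority, n_new, k=k_smote, rng=rng)
    X_all = list(X) + new
    y_all = list(y) + [minority_label] * len(new)
    return edited_nearest_neighbours(X_all, y_all, k=k_enn, distance=distance)


# Toy usage: 10 majority points on a line, 3 minority points far away.
majority = [(float(i), 0.0) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 5.0), (0.5, 6.0)]
X_res, y_res = smote_then_clean(majority + minority, [0] * 10 + [1] * 3,
                                minority_label=1)
```

In this toy example the classes are well separated, so the cleaning step removes nothing and the result holds 10 points of each class. To move toward WENN, one would pass a custom `distance` implementing the paper's weighting; the sketch makes no claim about that formula. Using an odd `k_enn` avoids voting ties in the two-class case.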

Keywords: Imbalanced data classification, Small sample datasets, Oversampling, Data cleaning

Paper URL: https://doi.org/10.1007/s10489-020-01852-8