MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data

Authors: Bin Li, Xiaofeng Gong, Chen Wang, Ruijuan Wu, Tong Bian, Yanming Li, Zhiyuan Wang, Ruisen Luo

Abstract

Imbalanced data classification arises widely in commercial activities and social production. It refers to scenarios in which the number of samples differs considerably among classes, which significantly degrades the performance of traditional classification algorithms. Previous methods often focus on resampling and algorithm adjustment but neglect to enhance feature learning. In this study, we propose a novel algorithm for imbalanced data classification from the perspective of feature learning: the Maximum Mean Discrepancy-Encouraging Convolutional Autoencoder (MMD-CAE). The algorithm adopts a two-phase target training process. The cross-entropy loss is used to compute the reconstruction loss of the data, and the Maximum Mean Discrepancy (MMD) with an intra-variance constraint is used to encourage feature discrepancy in the bottleneck layer. By encouraging maximization of the MMD between samples of the two classes, and by mapping the original space to a higher-dimensional space via the kernel trick, the learned features form a more effective feature space. The proposed algorithm is tested on ten groups of samples with different imbalance ratios. The performance metrics of recall, F1 score, G-means, and AUC show that the proposed algorithm surpasses existing state-of-the-art methods in this field and has stronger generalization ability. This study could shed new light on related work through the proposed MMD with intra-variance constraint, which builds a more effective feature space, and through the holistic MMD-CAE algorithm for imbalanced data classification.
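
To make the central quantity concrete, the following is a minimal sketch of a squared-MMD estimate between two sets of feature vectors. It is not the authors' implementation: the abstract does not specify the kernel, the estimator, or the form of the intra-variance constraint, so the Gaussian (RBF) kernel, the biased estimator, the `gamma` value, and the names `rbf_kernel` and `mmd2` are all assumptions made for illustration. The intra-variance constraint and the autoencoder itself are omitted.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma):
    """Biased estimate of squared MMD between sample sets x and y.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


# Synthetic stand-ins for bottleneck features (assumed 8-dimensional).
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(200, 8))
minority_near = rng.normal(0.0, 1.0, size=(20, 8))  # overlaps the majority class
minority_far = rng.normal(3.0, 1.0, size=(20, 8))   # well separated from it

gamma = 1.0 / 8  # assumed bandwidth choice, not taken from the paper
near = mmd2(majority, minority_near, gamma)
far = mmd2(majority, minority_far, gamma)
print(f"MMD^2 (overlapping classes): {near:.4f}")
print(f"MMD^2 (separated classes):   {far:.4f}")
```

A larger value indicates that the two classes are further apart in the kernel-induced feature space. In a training loop that minimizes a loss, encouraging a large MMD would typically mean subtracting a term like this from the reconstruction loss; how the paper weights and constrains it is not stated in the abstract.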

Keywords: Imbalanced data, Autoencoder, Two-phase target training, Maximum mean discrepancy, Inter-class distance

Paper URL: https://doi.org/10.1007/s10489-021-02235-3