AFE-MERT: imbalanced text classification with abstract feature extraction

Authors: Murat Okkalioglu, Burcu Demirelli Okkalioglu

Abstract

The class imbalance problem occurs when training samples are unevenly distributed among classes, which biases classifier models toward the classes with many training samples. The problem is inherent in text classification. Abstract feature extraction is a versatile term weighting scheme: it serves not only as a feature extractor that builds a structured representation from unstructured text data but also as a dimension reduction technique and a classifier. In this study, we tackle the class imbalance problem in abstract feature extraction. The proposed method uses the relative imbalance ratio as a factor to elevate the representation of minority classes. We also integrate relevant term factors to boost overall accuracy. Experiments on three data sets, one of which was collected for this study, show that the original abstract feature extraction method indeed suffers from the class imbalance problem and that the proposed methods yield significant improvements in f1-micro, f1-macro, and the Matthews correlation coefficient. The experimental results also suggest that the proposed method is competitive, both as a classifier and as a term weighting scheme, with well-known classifiers (KNN, SVM, and Nearest Centroid) and term weighting schemes (TF-IDF, TF-ICF, TF-ICSDF, TF-RF, TF-PROB, TF-IGM, and TF-MONO).
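As a rough illustration of why both micro- and macro-averaged F1 are reported for imbalanced data, the sketch below computes the two scores for a toy classifier that always predicts the majority class. It is not taken from the paper: the function name `f1_scores`, the labels `maj` and `min`, and the 9-to-1 split are all assumptions chosen for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: 9 majority documents, 1 minority document,
# and a classifier that always predicts the majority class.
y_true = ["maj"] * 9 + ["min"]
y_pred = ["maj"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the micro-averaged score stays high because it is dominated by the majority class, while the macro-averaged score drops sharply because the minority class contributes an F1 of zero. That gap is the kind of bias toward large classes that the abstract describes.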

Keywords: Class imbalance, Abstract feature extraction, Relative imbalance ratio, Relevant term, Term weighting

Paper URL: https://doi.org/10.1007/s10489-021-02983-2