Feature selection based on term frequency deviation rate for text classification

Authors: Hongfang Zhou, Yiming Ma, Xiang Li

Abstract

Feature selection is a technique for choosing the subset of features most relevant to model training. In this paper, a new concept, the term frequency deviation rate (TDR), is first proposed to improve classification accuracy. A TDR-based feature selection algorithm for text classification is then presented. Finally, extensive experiments are carried out on seven datasets (K1a, K1b, WAP, R52, R8, 20 Newsgroups, and Cade12) with two classifiers, Naive Bayes and Support Vector Machine. The experimental results indicate that the new approach improves classification accuracy by 7.9% on average.

Keywords: Text classification, Feature selection, Term frequency, Document frequency, Deviation ratio

Official paper URL: https://doi.org/10.1007/s10489-020-01937-4