Single pass text classification by direct feature weighting

Authors: Hassan H. Malik, Dmitriy Fradkin, Fabian Moerchen

Abstract

The Feature Weighting Classifier (FWC) is an efficient multi-class classification algorithm for text data that uses Information Gain to directly estimate per-class feature weights in the classifier. This classifier requires only a single pass over the dataset to compute the feature frequencies per class, is easy to implement, and has memory usage that is linear in the number of features. Results of experiments performed on 128 binary and multi-class text and web datasets show that FWC's performance is at least comparable to, and often better than, that of Naive Bayes, TWCNB, Winnow, Balanced Winnow and linear SVM. On a large-scale web dataset with 12,294 classes and 135,973 training instances, FWC trained in 13 s and yielded classification performance comparable to that of a state-of-the-art multi-class SVM implementation, which took over 15 min to train.
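The claim that one pass over the data is enough can be illustrated with a minimal sketch, assuming binary term presence and base-2 entropy. It gathers per-class document frequencies in a single loop and then computes the standard Information Gain of each feature with respect to the class from those counts alone. The function names (`count_frequencies`, `entropy`, `information_gain`) and the toy data are hypothetical. The abstract does not say how FWC turns Information Gain into per-class weights, so that step is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def count_frequencies(docs):
    """Single pass over (tokens, label) pairs.

    Returns documents per class and, for each feature, the number of
    documents of each class that contain it (binary presence is an assumption).
    """
    class_counts = Counter()              # documents per class
    feature_class = defaultdict(Counter)  # feature -> class -> document frequency
    for tokens, label in docs:
        class_counts[label] += 1
        for f in set(tokens):             # count each feature at most once per document
            feature_class[f][label] += 1
    return class_counts, feature_class


def entropy(counts):
    """Base-2 entropy of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(feature, class_counts, feature_class):
    """IG(C; F) = H(C) - P(F=1) H(C | F=1) - P(F=0) H(C | F=0), from counts only."""
    n = sum(class_counts.values())
    classes = list(class_counts)
    present = [feature_class[feature][c] for c in classes]
    absent = [class_counts[c] - feature_class[feature][c] for c in classes]
    n_present = sum(present)
    n_absent = n - n_present
    return (entropy([class_counts[c] for c in classes])
            - (n_present / n) * entropy(present)
            - (n_absent / n) * entropy(absent))


# Hypothetical toy corpus: two classes, two documents each.
docs = [
    (["cheap", "pills"], "spam"),
    (["meeting", "agenda"], "ham"),
    (["cheap", "meeting"], "ham"),
    (["pills", "offer"], "spam"),
]
class_counts, feature_class = count_frequencies(docs)
ig = {f: information_gain(f, class_counts, feature_class) for f in list(feature_class)}
# "pills" and "meeting" each occur in only one class (IG = 1 bit);
# "cheap" occurs once in each class and carries no information (IG = 0).
```

Because every quantity in `information_gain` is derived from `class_counts` and `feature_class`, no second pass over the documents is needed once the counting loop finishes.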

Keywords: Text classification, Feature weighting, Linear classifiers, Information gain, Scalable learning

Official paper page: https://doi.org/10.1007/s10115-010-0317-9