Text classification using ESC-based stochastic decision lists

Authors:

Abstract

We propose a new method of text classification using stochastic decision lists. A stochastic decision list is an ordered sequence of IF-THEN-ELSE rules, so our method can be viewed as a rule-based method for text classification, with the advantage that the acquired knowledge is readable and can be refined. Our method is unique in that decision lists are constructed automatically on the basis of the principle of minimizing extended stochastic complexity (ESC), which allows us to construct decision lists that make fewer classification errors. The classification accuracy achieved with our method appears better than or comparable to that of existing rule-based methods. We have empirically demonstrated that rule-based methods like ours achieve high classification accuracy when the categories to which texts are to be assigned are relatively specific and when the texts tend to be short. We have also empirically verified the advantages of rule-based methods over non-rule-based ones.
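
To make the notion of an ordered IF-THEN-ELSE rule sequence concrete, the following is a minimal sketch of how such a list might be applied to a text. It is not the ESC-based construction procedure from the paper, which the abstract does not detail. The `Rule` shape (a set of required words, a label, and an attached probability), the `classify` helper, and all rule contents and numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: the rule fires when every word in
    # `required` occurs in the text.
    required: FrozenSet[str]
    label: bool        # True means "the text belongs to the category"
    confidence: float  # illustrative probability attached to the label


def classify(rules: List[Rule], default: Rule, text: str) -> Tuple[bool, float]:
    """Apply the rules in order; the first rule that fires decides."""
    words = set(text.lower().split())
    for rule in rules:
        if rule.required <= words:  # subset test: all required words present
            return rule.label, rule.confidence
    # Final ELSE branch: no rule fired, so fall back to the default.
    return default.label, default.confidence


# Invented example list for a hypothetical "grain" category.
RULES = [
    Rule(frozenset({"wheat"}), True, 0.95),
    Rule(frozenset({"grain", "export"}), True, 0.80),
]
DEFAULT = Rule(frozenset(), False, 0.90)

print(classify(RULES, DEFAULT, "Wheat prices rose sharply"))
print(classify(RULES, DEFAULT, "Stock markets fell"))
```

Because the rules are checked in order and each one is a plain word test, a person can read the list top to bottom and edit or reorder individual rules, which is the kind of readability and refinability the abstract attributes to rule-based methods.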

Keywords: Text classification, Statistical learning, Stochastic decision list, Rule-based method, Extended stochastic complexity

Review history: Received 20 September 2000, Accepted 17 May 2001, Available online 3 January 2002.

Official paper URL: https://doi.org/10.1016/S0306-4573(01)00038-3