Efficient implementation of associative classifiers for document classification

Abstract

In practical text classification tasks, the ability to interpret a classification result is as important as the ability to classify accurately. Associative classifiers have many favorable characteristics, such as rapid training, good classification accuracy, and excellent interpretability. However, they also face obstacles when applied to text classification. Text collections are generally very high-dimensional, so training can take a very long time. We propose a feature selection method based on the mutual information between word and class variables to reduce the dimensionality of the space in which associative classifiers operate. In addition, training an associative classifier produces a huge number of classification rules, which makes predicting the class of a new document inefficient. We address this by introducing a new, efficient method for storing and pruning classification rules; the same method can also be used when classifying a test document. Experimental results on the 20-newsgroups dataset show that associative classification offers many benefits in both training and prediction when applied to a real-world problem.
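The abstract does not give the exact estimator used for feature selection. A common formulation, assumed here, treats each word as a binary presence variable W and scores it by I(W;C) = Σ_w Σ_c P(w,c) log(P(w,c) / (P(w)P(c))), estimated from document counts. The sketch below uses hypothetical function names (`mutual_information`, `select_features`), represents each document as a set of word tokens (also an assumption), ranks the vocabulary by this score, and keeps the top k words. It illustrates the general technique and is not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """Estimate I(W; C) for binary word presence W and class C from counts.

    Assumption: each document is a set of word tokens.
    """
    n = len(docs)
    # Joint counts of (word present?, class label).
    joint = Counter()
    for doc, label in zip(docs, labels):
        joint[(word in doc, label)] += 1

    # Marginal counts derived from the joint table.
    w_count = Counter()
    c_count = Counter()
    for (w, c), k in joint.items():
        w_count[w] += k
        c_count[c] += k

    mi = 0.0
    for (w, c), k in joint.items():
        # P(w,c) / (P(w) P(c)) simplifies to k * n / (count(w) * count(c)).
        # Cells with zero count contribute 0 and never appear in `joint`.
        mi += (k / n) * math.log((k * n) / (w_count[w] * c_count[c]))
    return mi


def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]


if __name__ == "__main__":
    docs = [{"ball", "goal"}, {"ball", "team"}, {"cpu", "ram"}, {"cpu", "disk"}]
    labels = ["sport", "sport", "tech", "tech"]
    print(select_features(docs, labels, 2))  # ['ball', 'cpu']
```

Words whose presence is concentrated in a single class (here "ball" and "cpu") score highest, while words that occur independently of the class score zero and would be dropped before rule mining, shrinking the space the associative classifier has to search.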

Keywords: Text classification, Associative classifier, Feature selection, Rule pruning, Subset expansion

Article history: Received 25 May 2006, Accepted 25 July 2006, Available online 12 October 2006.

Official page: https://doi.org/10.1016/j.ipm.2006.07.012