Text classification from unlabeled documents with bootstrapping and feature projection techniques

Abstract

Many machine learning algorithms have been applied to text classification tasks. In this paradigm, a general inductive process automatically builds a text classifier from examples, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to produce because the labeling must be done by humans. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier by using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. Using the proposed method in a text classification task would make building text classification systems significantly faster and less expensive.
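To make the general idea concrete, the following is a minimal sketch of bootstrapping from category title words, written under stated assumptions and not taken from the paper. The function names (`seed_labels`, `train_word_votes`, `classify`, `bootstrap`), the whitespace tokenizer, and the per-word voting scheme are all hypothetical; the voting classifier is only a simplified stand-in for the feature projection technique the abstract mentions.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for illustration.
    return text.lower().split()


def seed_labels(docs, title_words):
    """Label a document only if exactly one category's title word occurs in it."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = [cat for cat, word in title_words.items() if word in tokens]
        if len(hits) == 1:
            labels[i] = hits[0]
    return labels


def train_word_votes(docs, labels):
    """For each word, compute the share of its occurrences in each category."""
    counts = defaultdict(Counter)
    for i, cat in labels.items():
        for tok in tokenize(docs[i]):
            counts[tok][cat] += 1
    votes = {}
    for tok, per_cat in counts.items():
        total = sum(per_cat.values())
        votes[tok] = {cat: n / total for cat, n in per_cat.items()}
    return votes


def classify(doc, votes):
    """Sum the per-word votes; return None if no known word occurs."""
    score = Counter()
    for tok in tokenize(doc):
        for cat, v in votes.get(tok, {}).items():
            score[cat] += v
    return score.most_common(1)[0][0] if score else None


def bootstrap(docs, title_words, rounds=3):
    """Grow the labeled set from title-word seeds, retraining each round."""
    labels = seed_labels(docs, title_words)
    for _ in range(rounds):
        votes = train_word_votes(docs, labels)
        new = {}
        for i, doc in enumerate(docs):
            if i not in labels:
                cat = classify(doc, votes)
                if cat is not None:
                    new[i] = cat
        if not new:
            break
        labels.update(new)
    return labels, train_word_votes(docs, labels)
```

Calling `bootstrap(docs, {"sports": "football", "tech": "computer"})` on a list of raw strings returns the machine-assigned labels and the final word-vote table, which `classify` can then apply to unseen documents. No human-labeled document is needed at any point; the only supervision is one title word per category.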

Keywords: Text classification, Bootstrapping, Feature projection, Unlabeled data, Text classifier

Article history: Received 16 July 2007, Revised 11 July 2008, Accepted 16 July 2008, Available online 11 September 2008.

Official paper URL: https://doi.org/10.1016/j.ipm.2008.07.004