A novel semi-supervised learning framework with simultaneous text representing

Authors: Yan Zhu, Jian Yu, Liping Jing

Abstract

Text representation has received extensive attention in text mining tasks. Of the various text representation models, the vector space model is the most commonly used, and its core technique is term weighting. To date, many term-weighting methods have been proposed, and they can be divided into a supervised group and an unsupervised group. However, neither group is well suited to direct use in semi-supervised applications: most supervised term-weighting methods are not applicable because the label information is insufficient, while unsupervised term-weighting methods cannot make use of the category labels that are provided. This paper therefore proposes a semi-supervised learning framework that iteratively revises the text representation using an EM-like strategy. Furthermore, a new supervised term-weighting method, tf.sdf, is proposed. Tf.sdf emphasizes the importance of terms that are unevenly distributed among the classes and weakens the importance of terms that are uniformly distributed. Experimental results on real text data show that the proposed semi-supervised learning framework performs well with the aid of tf.sdf. Tf.sdf is also shown to be effective for supervised learning.
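The abstract does not give the tf.sdf formula or the details of the EM-like procedure, so the following is only a minimal sketch of the general idea under stated assumptions. The weight used here (the standard deviation, across classes, of the fraction of each class's documents that contain a term) is a stand-in chosen to mimic the described behaviour, not the paper's definition of tf.sdf. The nearest-centroid classifier and all function names (`spread_weights`, `represent`, `em_like_fit`) are likewise hypothetical.

```python
from collections import Counter
from statistics import pstdev


def spread_weights(docs, labels):
    """Assumed stand-in weight: std. dev. across classes of a term's document-frequency ratio."""
    classes = sorted(set(labels))
    class_size = Counter(labels)
    df = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        df[c].update(set(tokens))  # count each term once per document
    vocab = {t for tokens in docs for t in tokens}
    return {t: pstdev([df[c][t] / class_size[c] for c in classes]) for t in vocab}


def represent(tokens, weights):
    """Sparse vector: raw term frequency times the class-spread weight."""
    return {t: n * weights.get(t, 0.0) for t, n in Counter(tokens).items()}


def centroid(vectors):
    total = Counter()
    for v in vectors:
        for t, x in v.items():
            total[t] += x
    return {t: x / len(vectors) for t, x in total.items()}


def cosine(a, b):
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    na = sum(x * x for x in a.values()) ** 0.5
    nb = sum(x * x for x in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def em_like_fit(labeled_docs, labels, unlabeled_docs, max_iter=10):
    """Alternate between re-weighting terms and re-labelling the unlabeled documents."""
    classes = sorted(set(labels))
    docs, current = list(labeled_docs), list(labels)
    guesses = None
    for _ in range(max_iter):
        # Revise the representation from the labels currently available.
        weights = spread_weights(docs, current)
        cents = {
            c: centroid([represent(d, weights) for d, y in zip(docs, current) if y == c])
            for c in classes
        }
        # Re-estimate labels for the unlabeled documents (nearest centroid, an assumption).
        new = [
            max(classes, key=lambda c: cosine(represent(d, weights), cents[c]))
            for d in unlabeled_docs
        ]
        if new == guesses:  # pseudo-labels stopped changing
            break
        guesses = new
        docs = list(labeled_docs) + list(unlabeled_docs)
        current = list(labels) + guesses
    return weights, guesses
```

In this sketch a term that appears in the same proportion of documents in every class receives weight 0, while a term confined to one class receives the largest weight, which matches the qualitative behaviour the abstract attributes to tf.sdf.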

Keywords: Semi-supervised learning, Term weighting, Text representation, Classifier

Official paper URL: https://doi.org/10.1007/s10115-012-0481-1