Dynamic classifier ensemble for positive unlabeled text stream classification
Authors: Shirui Pan, Yang Zhang, Xue Li
Abstract
Most studies on streaming data classification assume that the data can be fully labeled. In real-life applications, however, manually labeling the entire stream for training is impractical and time-consuming. It is very common that only a small set of positive data and a large amount of unlabeled data are available in data stream environments. In this case, traditional streaming algorithms applied to a positive unlabeled stream with only straightforward adaptation may perform poorly. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams under various kinds of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme in the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed DCEPU method outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56–68, 2008), and Stacking-style ensemble-based algorithm (Zhang et al. 2008b).
Keywords: Positive unlabeled learning, Text streams, Classifier ensemble, Concept drift
Paper URL: https://doi.org/10.1007/s10115-011-0469-2