Adaptive semi-supervised learning on labeled and unlabeled data with different distributions

Authors: Akinori Fujino, Naonori Ueda, Masaaki Nagata

Abstract

Developing methods for designing good classifiers from labeled samples whose distribution differs from that of the test samples is an important and challenging research issue in machine learning and its applications. This paper focuses on designing semi-supervised classifiers with high generalization ability by using unlabeled samples drawn from the same distribution as the test samples, and presents a semi-supervised learning method based on a hybrid discriminative and generative model. Although JESS-CM is one of the most successful semi-supervised classifier design frameworks based on a hybrid approach, it suffers from overfitting in the task setting considered in this paper. We propose an objective function that uses both labeled and unlabeled samples for the discriminative training of hybrid classifiers, with the expectation that it mitigates this overfitting problem. We show the effect of the objective function through theoretical analysis and empirical evaluation. Our text classification experiments on four typical benchmark test collections confirmed that, in our task setting, the proposed method outperformed the JESS-CM framework in most cases. We also confirmed experimentally that the proposed method was useful for obtaining better performance when classifying data samples into known or unknown classes, that is, classes that were or were not included in the given labeled samples, respectively.

Keywords: Semi-supervised classifier, Hybrid discriminative and generative model, Transfer learning, Text classification

Paper URL: https://doi.org/10.1007/s10115-012-0576-8