Sentiment labeling for extending initial labeled data to improve semi-supervised sentiment classification

Authors:

Highlights:

• A semi-supervised framework that exploits an unsupervised approach (JST) is proposed.

• Self-training suffers from an incorrect-labeling problem when the labeled data are insufficient.

• Instances that JST predicts confidently are labeled and used as training data.

• Self-training with the extended training corpus incrementally improves the classifier model.

• Sentiment labeling improves accuracy by enriching the initial classifier used in self-training.
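
The self-training loop mentioned in the highlights can be illustrated with a minimal sketch. This is not the paper's implementation: a tiny Naive Bayes classifier stands in for the actual models (the paper uses JST for sentiment labeling), and the function names, the toy documents, and the 0.6 confidence threshold are all assumptions chosen for illustration. The sketch only shows the general idea: train on the labeled data, label the unlabeled instances predicted with high confidence, add them to the training data, and repeat.

```python
import math
from collections import Counter

def train_nb(docs):
    """Fit a tiny multinomial Naive Bayes model on (tokens, label) pairs."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for tokens, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict_proba(model, tokens):
    """Return {label: posterior probability}; unseen words are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    peak = max(log_scores.values())
    exp = {lab: math.exp(s - peak) for lab, s in log_scores.items()}
    norm = sum(exp.values())
    return {lab: v / norm for lab, v in exp.items()}

def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Grow the labeled set with confidently predicted instances."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, rest = [], []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tokens, label))
            else:
                rest.append(tokens)
        if not confident:  # nothing new to add, so stop early
            break
        labeled.extend(confident)
        pool = rest
    return train_nb(labeled), labeled

# Toy data (hypothetical): two seed reviews and three unlabeled ones.
seed = [(["great", "fun"], "pos"), (["boring", "bad"], "neg")]
unlabeled = [["great", "superb"], ["superb", "acting"], ["bad", "dull"]]
model, final_labeled = self_train(seed, unlabeled)
```

In this toy run, the document `["superb", "acting"]` shares no words with the seed data, so it can only be labeled after `["great", "superb"]` has been added in an earlier round. The same mechanism is what makes a weak initial classifier risky: a wrong confident label in an early round is carried into every later round.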

Keywords: Concatenated vector, Paragraph vector, Self-training, Sentiment classification, Sentiment labeling, Topic model

Review history: Received 15 December 2016, Revised 19 September 2017, Accepted 19 September 2017, Available online 20 September 2017, Version of Record 3 October 2017.

Official paper URL: https://doi.org/10.1016/j.elerap.2017.09.006