Document-level multi-topic sentiment classification of Email data with BiLSTM and data augmentation

Authors:

Highlights:

Abstract

Email data has characteristics that distinguish it from other social media data: multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships. To better model Email documents and capture the complex sentiment structures in their content, we develop a framework for document-level multi-topic sentiment classification of Email data. Because large volumes of labeled Email data are rarely publicly available, we introduce an optional data augmentation process that enlarges datasets with synthetically labeled data, reducing the likelihood of overfitting and underfitting during training. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process Email documents. Empirical results from multiple sets of experiments, including performance comparisons against various state-of-the-art algorithms with and without data augmentation and under diverse parameter settings, demonstrate the effectiveness of the proposed framework.
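The idea of pairing each segment with a topic weighting vector can be illustrated with a minimal sketch. This is not the authors' implementation: the paragraph-based segmentation stands in for semantic text segmentation, the hard-coded `TOPIC_WORD_PROBS` table stands in for a fitted latent Dirichlet allocation model, and the names `split_into_segments` and `topic_weights` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical topic-word probabilities, standing in for a fitted LDA model.
TOPIC_WORD_PROBS = {
    "scheduling": {"meeting": 0.4, "schedule": 0.3, "tomorrow": 0.3},
    "billing": {"invoice": 0.5, "payment": 0.3, "overdue": 0.2},
}


def split_into_segments(document):
    """Naive segmentation (an assumption): one segment per non-empty paragraph."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def topic_weights(segment, topic_word_probs=TOPIC_WORD_PROBS):
    """Return a topic weighting vector (weights sum to 1) for one segment."""
    counts = Counter(re.findall(r"[a-z]+", segment.lower()))
    # Score each topic by the probability mass of the segment's words under it.
    scores = {
        topic: sum(probs.get(word, 0.0) * n for word, n in counts.items())
        for topic, probs in topic_word_probs.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No known topic words: fall back to a uniform vector.
        return {topic: 1.0 / len(scores) for topic in scores}
    return {topic: score / total for topic, score in scores.items()}


email = "Can we schedule a meeting tomorrow?\n\nThe invoice payment is overdue."
for segment in split_into_segments(email):
    print(segment, topic_weights(segment))
```

In a full pipeline, each segment's vector would accompany its embedding as model input; a real system would replace the lookup table with topic distributions inferred by a trained topic model.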

Keywords: Sentiment classification, Email sentiment, Multi-topic sentiment, Bidirectional LSTM, Data augmentation

Review history: Received 13 December 2019, Revised 11 March 2020, Accepted 13 April 2020, Available online 18 April 2020, Version of Record 24 April 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.105918