Towards building a high-quality microblog-specific Chinese sentiment lexicon

Authors:

Highlights:

• An effective and efficient method to detect popular user-invented new words in Chinese microblogs.

• Three kinds of heterogeneous sentiment knowledge are extracted for building the sentiment lexicon.

• A unified framework incorporating various kinds of sentiment knowledge for microblog-specific sentiment lexicon construction.

• Our microblog-specific sentiment lexicon outperforms existing sentiment lexicons.
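As a rough illustration of what a unified framework over several kinds of sentiment knowledge can look like, the sketch below fuses three per-word signals of the kind the abstract lists: an emoticon-derived score e_w, a prior polarity p_w from an existing lexicon (when the word is listed there), and a word-similarity graph with weights W_uv. It minimizes a simple quadratic objective that is an assumption made for this example, closer to standard graph-based label propagation than to the paper's own formulation. The function name `fuse_sentiment`, the weights `alpha`, `beta`, `gamma`, and the toy scores are all hypothetical.

```python
"""Hypothetical sketch (not the paper's formulation): fuse three per-word sentiment signals.

Minimizes the assumed objective
    J(s) = alpha * sum_w (s_w - e_w)^2              # emoticon-derived score e_w (0 if unseen)
         + beta  * sum_w m_w * (s_w - p_w)^2        # prior lexicon polarity p_w, m_w = 1 if listed
         + gamma * sum_(u,v) W_uv * (s_u - s_v)^2   # word-similarity graph, W_uv >= 0
by Jacobi iteration on the linear system dJ/ds = 0.
"""
from typing import Dict, Tuple


def fuse_sentiment(
    emoticon: Dict[str, float],
    prior: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
    iters: int = 200,
) -> Dict[str, float]:
    """Return one fused score per word; alpha must be > 0 for the iteration to converge."""
    words = sorted(set(emoticon) | set(prior) | {w for pair in similarity for w in pair})
    # Symmetric adjacency lists built from the (u, v) -> weight map.
    neighbours: Dict[str, Dict[str, float]] = {w: {} for w in words}
    for (u, v), weight in similarity.items():
        if u == v:
            continue  # a self-loop adds nothing to (s_u - s_v)^2
        neighbours[u][v] = neighbours[u].get(v, 0.0) + weight
        neighbours[v][u] = neighbours[v].get(u, 0.0) + weight

    scores = {w: emoticon.get(w, 0.0) for w in words}
    for _ in range(iters):
        updated = {}
        for w in words:
            in_prior = 1.0 if w in prior else 0.0
            degree = sum(neighbours[w].values())
            # Setting dJ/ds_w = 0 makes s_w a weighted average of the three sources.
            numerator = (
                alpha * emoticon.get(w, 0.0)
                + beta * in_prior * prior.get(w, 0.0)
                + gamma * sum(weight * scores[v] for v, weight in neighbours[w].items())
            )
            updated[w] = numerator / (alpha + beta * in_prior + gamma * degree)
        scores = updated  # Jacobi step: every word used the previous iteration's scores
    return scores


# Toy inputs invented for illustration only.
emoticon_scores = {"给力": 0.8, "坑爹": -0.6, "好": 0.5}
prior_scores = {"好": 1.0, "差": -1.0}
word_similarity = {("给力", "好"): 1.0, ("坑爹", "差"): 1.0}
scores = fuse_sentiment(emoticon_scores, prior_scores, word_similarity)
print({w: round(s, 3) for w, s in scores.items()})
```

Each fused score is a weighted average of a word's own evidence and its neighbours' scores, so a slang word absent from the prior lexicon can still inherit polarity from similar words that are listed there. With `alpha > 0` and non-negative similarity weights the linear system is strictly diagonally dominant, which is why the plain Jacobi iteration converges.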

Abstract

Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by automatically aggregating and summarizing the opinions in massive volumes of microblogs. The most important component of a sentiment analysis system is its sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis systems. The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction: the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations across all messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts but also their distributions over users. The detected new words with strong sentiment are incorporated into our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis.
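The new-word detector is described here only at a high level: it looks at how a word is distributed over texts and over users. A minimal sketch of that idea, under stated assumptions, counts each candidate string's document frequency and the Shannon entropy of its distribution over user IDs, and keeps only candidates that are both frequent and spread across many users, so a string repeated by a single account (for example, spam) is dropped. Candidate generation by raw character n-grams, the thresholds `min_docs`, `min_users`, and `min_user_entropy`, the function name `detect_new_words`, and the toy posts are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def detect_new_words(
    posts: Iterable[Tuple[str, str]],
    vocabulary: Set[str],
    n_range: Tuple[int, int] = (2, 3),
    min_docs: int = 3,
    min_users: int = 3,
    min_user_entropy: float = 1.0,
) -> List[Tuple[str, int, int, float]]:
    """Return (candidate, doc_freq, n_users, user_entropy) tuples, most widespread first.

    Hypothetical sketch: `posts` holds (user_id, text) pairs; all thresholds are
    illustrative defaults, not values from the paper.
    """
    doc_freq: Counter = Counter()
    per_user: Dict[str, Counter] = defaultdict(Counter)
    lo, hi = n_range
    for user, text in posts:
        # Distinct character n-grams of this message that are not already known words;
        # the set counts each candidate at most once per message (document frequency).
        grams = {text[i:i + n] for n in range(lo, hi + 1) for i in range(len(text) - n + 1)}
        for gram in grams - vocabulary:
            doc_freq[gram] += 1
            per_user[gram][user] += 1

    results = []
    for gram, df in doc_freq.items():
        users = per_user[gram]
        # Shannon entropy (bits) of the candidate's message counts over users:
        # 0 when one account posts it every time, log2(k) when k users post it equally.
        entropy = -sum((c / df) * math.log2(c / df) for c in users.values())
        if df >= min_docs and len(users) >= min_users and entropy >= min_user_entropy:
            results.append((gram, df, len(users), entropy))
    # Rank by number of distinct users, then by document frequency.
    results.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return results


# Toy posts invented for illustration: "给力" is used by four users,
# while "买买买" is repeated by a single spam account.
toy_posts = [
    ("u1", "今天真给力"),
    ("u2", "这个活动太给力了"),
    ("u3", "给力！"),
    ("u4", "队友给力"),
] + [("spam", "买买买链接")] * 4
toy_vocabulary = {"今天", "活动", "队友", "链接"}
new_words = detect_new_words(toy_posts, toy_vocabulary)
print(new_words)
```

In the toy data "买买买" is as frequent as "给力" over texts, and only the user-side statistics separate the two. A practical detector would typically also add a cohesion or boundary test (such as mutual information between characters or left/right neighbour entropy) to discard fragments of longer words; this sketch omits that step.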

Keywords: Sentiment lexicon, Sentiment analysis, Microblog

Article history: Received 18 December 2015, Revised 8 April 2016, Accepted 27 April 2016, Available online 4 May 2016, Version of Record 17 June 2016.

Official paper URL: https://doi.org/10.1016/j.dss.2016.04.007