The added value of auxiliary data in sentiment analysis of Facebook posts

Authors:

Highlights:

• We assess the added value of leading and lagging information in sentiment analysis.

• We analyze 17,697 Facebook status updates.

• We use two classification algorithms, five times twofold cross-validation and the Friedman test.

• Including leading and lagging data increases the AUC substantially.

• These findings clearly indicate that including leading and lagging data is a viable strategy.
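
As a rough illustration of how the Friedman test mentioned above could be applied, the sketch below compares three nested feature sets across the ten folds of a five times twofold cross-validation. Everything in it is an assumption made for illustration: the data are synthetic (generated with scikit-learn, not the Facebook posts), the column groups named `post`, `leading`, and `lagging` are hypothetical stand-ins, the folds are stratified, and only a Random Forest is fitted. It is not the authors' pipeline.

```python
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in data: 9 informative columns, kept in order (shuffle=False).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=9, n_redundant=0,
    shuffle=False, random_state=0,
)

# Hypothetical column groups standing in for the three kinds of predictors.
post, leading, lagging = [0, 1, 2], [3, 4, 5], [6, 7, 8]
feature_sets = {
    "post_only": post,
    "post_plus_leading": post + leading,
    "post_plus_leading_lagging": post + leading + lagging,
}

# Five repeats of twofold CV; a fixed seed gives every feature set the same folds.
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)

fold_auc = {}
for name, cols in feature_sets.items():
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    fold_auc[name] = cross_val_score(model, X[:, cols], y, cv=cv, scoring="roc_auc")

# Friedman test: do the feature sets differ in AUC across the matched folds?
statistic, p_value = friedmanchisquare(*fold_auc.values())
print(f"Friedman chi-square = {statistic:.3f}, p = {p_value:.4f}")
```

Because the folds are identical for every feature set, the per-fold AUC values form matched blocks, which is the setting the Friedman test is designed for.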

Abstract

The purpose of this study is to (1) assess the added value of information available before (i.e., leading) and after (i.e., lagging) the focal post's creation time in sentiment analysis of Facebook posts, (2) determine which predictors are most important, and (3) investigate the relationship between top predictors and sentiment. We build a sentiment prediction model, including leading information, lagging information, and traditional post variables. We benchmark Random Forest and Support Vector Machines using five times twofold cross-validation. The results indicate that both leading and lagging information increase the model's predictive performance. The most important predictors include the number of uppercase letters, the number of likes and the number of negative comments. A higher number of uppercase letters and likes increases the likelihood of a positive post, while a higher number of comments increases the likelihood of a negative post. The main contribution of this study is that it is the first to assess the added value of leading and lagging information in the context of sentiment analysis.
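
The benchmarking setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the hyperparameters are arbitrary defaults, the folds are stratified, and the feature scaling step in front of the Support Vector Machine is a choice made here rather than something the abstract specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for positive/negative posts.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)

# Five times twofold cross-validation: 2 folds, repeated 5 times = 10 AUC values.
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scaling before the SVM is an assumption of this sketch.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Score each model by AUC on every fold.
auc_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    for name, model in models.items()
}

for name, scores in auc_scores.items():
    print(f"{name}: median AUC = {np.median(scores):.3f} over {len(scores)} folds")
```

Fixing `random_state` in the splitter gives both classifiers the same ten train/test partitions, so their fold-level AUC values can be compared directly.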

Keywords: Facebook, Text mining, Sentiment analysis, Machine learning, Social media

Review history: Received 10 September 2015, Revised 17 June 2016, Accepted 19 June 2016, Available online 27 June 2016, Version of Record 1 August 2016.

Paper URL: https://doi.org/10.1016/j.dss.2016.06.013