External validity of sentiment mining reports: Can current methods identify demographic biases, event biases, and manipulation of reviews?

Authors:

Highlights:

• Sentiment mining reports are useless when their external validity cannot be assessed.

• Demographics, events, and manipulation are the main threats to the external validity of sentiment mining.

• This article proposes meta-requirements and meta-designs for an external validity identifier.

• Automatic demographic, event and manipulation detection in sentiment reports is feasible.

• Sentiment mining services need to be complemented by external validity reports.

Abstract

Many publications in sentiment mining provide new techniques that improve the accuracy of extracting features and their corresponding sentiments from texts. For the external validity of these sentiment reports, i.e., the applicability of the results to target audiences, it is important to analyze carefully the context of the user-generated content and the sample of its authors. The literature lacks an analysis of the external validity of sentiment mining reports, and the sentiment mining field lacks an operationalization of external validity dimensions into practically useful techniques. Drawing on a kernel theory, we identify multiple threats to the external validity of sentiment mining and study three of them empirically: 1) a mismatch in the demographics of the reviewer sample, 2) bias due to reviewers' incidental experiences, and 3) manipulation of reviews. Next, the value of techniques that identify external validity threats is examined in cases from Goodreads.com. We conclude that demographic biases can be detected well by current techniques, although we have doubts about stylometric techniques for this purpose. We demonstrate the usefulness of event and manipulation bias detection techniques in our cases, but this result needs further replication in more complex and more competitive contexts. Finally, to increase the decisional usefulness of sentiment mining reports, they should be accompanied by external validity reports, and software and service providers in this field should incorporate such reports into their offerings.
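The abstract does not specify which technique detects a demographic mismatch. As one illustration of what a mismatch in the demographics of the reviewer sample could mean operationally, the following minimal sketch applies Pearson's chi-square goodness-of-fit test. It compares the demographic group counts of reviewers with assumed target-audience proportions. This is a swapped-in standard statistical test, not the authors' method. The function names, groups, and numbers are hypothetical.

```python
from collections import Counter
from collections.abc import Mapping

# Hypothetical helper: Pearson chi-square statistic comparing observed
# reviewer counts per demographic group with the counts expected if the
# sample matched the (assumed) target-audience proportions.
def chi_square_statistic(observed: Mapping[str, int],
                         expected_share: Mapping[str, float]) -> float:
    n = sum(observed.values())
    stat = 0.0
    for group, share in expected_share.items():
        expected = n * share
        stat += (observed.get(group, 0) - expected) ** 2 / expected
    return stat

# Chi-square critical values at alpha = 0.05 for 1..5 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# Hypothetical check: flag the reviewer sample as demographically biased
# when the statistic exceeds the critical value (df = groups - 1).
def demographic_mismatch(observed: Mapping[str, int],
                         expected_share: Mapping[str, float]) -> bool:
    df = len(expected_share) - 1
    return chi_square_statistic(observed, expected_share) > CHI2_CRIT_05[df]

if __name__ == "__main__":
    # Hypothetical reviewer labels; in practice these would be inferred
    # from profiles or predicted by a demographic classifier.
    reviewers = ["female"] * 70 + ["male"] * 30
    target = {"female": 0.5, "male": 0.5}
    observed = Counter(reviewers)
    print(chi_square_statistic(observed, target))  # 16.0
    print(demographic_mismatch(observed, target))  # True
```

In this example, 70 female and 30 male reviewers compared against an assumed 50/50 target audience give a statistic of 16.0. That exceeds the 3.841 critical value for one degree of freedom, so the sample would be flagged as a likely demographic bias. A 52/48 split gives 0.16 and would not be flagged.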

Keywords: Sentiment mining, Opinion mining, External validity, Demographic bias, Event bias, Product review manipulation, Design proposition validation

Article history: Received 5 June 2013, Revised 22 November 2013, Accepted 16 December 2013, Available online 24 December 2013.

Official paper URL: https://doi.org/10.1016/j.dss.2013.12.005