Cross-ratio uninorms as an effective aggregation mechanism in sentiment analysis

Authors:

Highlights:

Abstract

Lexicon-based methods for Sentiment Analysis (SA) are sometimes unable to generate a classification output for specific instances of a given dataset. Most often, the reason is that terms required for the classification are absent from the sentiment lexicon. In such cases, only two paths have been available: (1) add terms to the lexicon by human intervention (an off-line process), which guarantees that no noise is introduced into the lexicon but prevents the classification system from providing an immediate answer; or (2) use the services of a word-frequency dictionary (an on-line process), which is computationally costly to build. This paper investigates an alternative approach to compensate for the inability of a lexicon-based method to produce a classification output. The method is based on combining the classification outputs of non-lexicon-based tools. First, the outcome values of applying two or more non-lexicon classification methods are obtained. Second, these non-lexicon outcomes are fused using a uninorm-based approach, which has been shown to have the compensation properties required in the SA context, to generate the classification output that the lexicon-based approach is unable to produce. Experimental results are presented based on the execution of two well-known supervised machine learning algorithms, namely Naïve Bayes and Maximum Entropy, and the application of a cross-ratio uninorm operator. Performance indices associated with options (1) and (2) above are compared against the results obtained using the proposed approach on two different datasets. Additionally, the performance of the proposed cross-ratio uninorm approach is compared against the same approach with the arithmetic mean used as the aggregation operator instead.
It is shown that combining non-lexicon-based classification methods with specific uninorm operators improves the classification performance of lexicon-based methods and provides an alternative solution to the SA classification problem when one is needed. The proposed aggregation method could also replace the ensemble averaging techniques commonly applied when combining the outputs of several machine learning classifiers.
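
The abstract does not state the operator's formula, so the following is only a minimal sketch of the fusion step. It assumes the commonly cited form of the cross-ratio uninorm with neutral element e, U(x, y) = (1-e)xy / ((1-e)xy + e(1-x)(1-y)), applied to classifier outputs expressed as positive-class scores in [0, 1]. The function names, the default e = 0.5, and the choice to return 0 at the undefined points (0, 1) and (1, 0) are illustrative assumptions, not details taken from the paper.

```python
from functools import reduce


def cross_ratio_uninorm(x, y, e=0.5):
    """Cross-ratio uninorm on [0, 1] with neutral element e (assumed form)."""
    if not 0.0 < e < 1.0:
        raise ValueError("neutral element e must lie strictly between 0 and 1")
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("inputs must lie in [0, 1]")
    num = (1.0 - e) * x * y
    den = num + e * (1.0 - x) * (1.0 - y)
    if den == 0.0:
        # Only reached at (0, 1) or (1, 0), where the formula is undefined.
        # Returning 0 is an assumed (conjunctive) convention.
        return 0.0
    return num / den


def fuse(scores, e=0.5):
    """Fold two or more classifier scores with the uninorm (it is associative)."""
    return reduce(lambda a, b: cross_ratio_uninorm(a, b, e), scores)


def arithmetic_mean(scores):
    """Baseline aggregation used for comparison."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical positive-class scores from two classifiers
    # (for example Naive Bayes and Maximum Entropy).
    agreeing_positive = [0.8, 0.7]
    conflicting = [0.8, 0.2]
    print(fuse(agreeing_positive), arithmetic_mean(agreeing_positive))
    print(fuse(conflicting), arithmetic_mean(conflicting))
```

Under these assumptions, two scores above the neutral element reinforce each other (the fused value exceeds both inputs), two scores below it reinforce downward, and scores on opposite sides compensate. The arithmetic mean, by contrast, always stays between the smallest and largest input.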

Keywords: Cross-ratio uninorms, Semantic orientation aggregation, Hybrid sentiment analysis, Supervised machine learning, Naïve Bayes, Maximum entropy

Review history: Received 9 December 2016, Revised 24 February 2017, Accepted 25 February 2017, Available online 28 February 2017, Version of Record 10 April 2017.

Official paper URL: https://doi.org/10.1016/j.knosys.2017.02.028