Sentiment analysis using semantic similarity and Hadoop MapReduce

Authors: Youness Madani, Mohammed Erritali, Jamaa Bengourram

Abstract

Sentiment analysis, or opinion mining, is a domain that analyses people’s opinions, sentiments, evaluations, attitudes, and emotions expressed in written language; it has become a very active area of scientific research in recent years, especially with the development of social networks such as Facebook and Twitter. In this paper we propose two new approaches to classifying tweets (that is, identifying the sentiment expressed in a tweet): the first classifies tweets into three classes (negative, positive, or neutral), and the second into two classes (negative or positive). Our first method consists of calculating the semantic similarity between the tweet to classify and three documents, each of which represents one class (it contains the words that represent that class); the tweet is then assigned the class of the document with which it has the highest semantic similarity. The second method consists of calculating the semantic similarity between each word of the tweet to classify and the words “positive” and “negative”, using a new formula that we propose. We perform the analysis in a parallel and distributed way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to reduce the computation time of the analysis when the tweet dataset is very large. The aim of our work is to combine several domains: information retrieval, semantic similarity, opinion mining or sentiment analysis, and big data.
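The abstract gives the decision rule of the first method (assign the class whose document is most semantically similar to the tweet) and says the work is distributed with MapReduce, but not the actual similarity measure or job design. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: `word_sim` is a hypothetical placeholder for a WordNet-based word similarity (here just exact match), `text_sim` assumes a tweet-to-document score equal to the mean of each tweet word's best match, the class documents are invented toy word lists, and `map_tweet`/`run_local` simulate a mapper and a grouping-and-counting reducer in-process rather than using Hadoop's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical placeholder for a WordNet-based word similarity
# (the paper's actual measure is not given in the abstract): exact match only.
def word_sim(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def text_sim(tweet_words: List[str], class_words: List[str],
             sim: Callable[[str, str], float] = word_sim) -> float:
    """Assumed aggregation: mean of each tweet word's best match in the class document."""
    if not tweet_words or not class_words:
        return 0.0
    total = sum(max(sim(w, c) for c in class_words) for w in tweet_words)
    return total / len(tweet_words)

def classify(tweet: str, class_docs: Dict[str, List[str]]) -> str:
    """Assign the class whose document is most similar to the tweet.
    On ties, max() keeps the class listed first in class_docs."""
    words = tweet.lower().split()
    return max(class_docs, key=lambda label: text_sim(words, class_docs[label]))

def map_tweet(tweet_id: str, tweet: str,
              class_docs: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Mapper-style step (hypothetical): emit (class label, tweet id) for one tweet."""
    yield classify(tweet, class_docs), tweet_id

def run_local(tweets: Dict[str, str],
              class_docs: Dict[str, List[str]]) -> Dict[str, int]:
    """In-process simulation of map, shuffle (group by key) and reduce (count per class)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tweet_id, text in tweets.items():
        for label, value in map_tweet(tweet_id, text, class_docs):
            groups[label].append(value)
    return {label: len(ids) for label, ids in groups.items()}

# Toy class documents, invented for illustration only.
CLASS_DOCS = {
    "positive": ["good", "great", "happy", "love"],
    "negative": ["bad", "sad", "hate", "awful"],
    "neutral": ["today", "news", "report", "said"],
}
```

Because each tweet is classified independently, the per-tweet step parallelizes naturally; on a real cluster the mapper logic would run over tweets stored in HDFS (for example through Hadoop Streaming), and swapping `word_sim` for a real WordNet measure changes only the `sim` argument.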

Keywords: Opinion mining, Sentiment analysis, Semantic similarity, WordNet, Big data, Hadoop

Official paper page: https://doi.org/10.1007/s10115-018-1212-z