Fuzzy C-means for english sentiment classification in a distributed system

作者:Vo Ngoc Phu, Nguyen Duy Dat, Vo Thi Ngoc Tran, Vo Thi Ngoc Chau, Tuan A. Nguyen

摘要

Sentiment classification plays a significant role in everyday life, in political activities, in activities relating to commodity production, and commercial activities. Finding a solution for the accurate and timely classification of emotions is a challenging task. In this research, we propose a new model for big data sentiment classification in the parallel network environment. Our proposed model uses the Fuzzy C-Means (FCM) method for English sentiment classification with Hadoop MAP (M) /REDUCE (R) in Cloudera. Cloudera is a parallel network environment. Our proposed model can classify the sentiments of millions of English documents in the parallel network environment. We tested our model using the testing data set (which comprised 25,000 English reviews, 12,500 being positive and 12,500 negative) and achieved 60.2 % accuracy. Our English training data set has 60,000 English sentences, comprising 30,000 positive English sentences and 30,000 negative English sentences.

论文关键词:Sentiment classification, English sentiment classification, Opinion mining, English document opinion mining, Fuzzy C-Means, FCM, Cloudera, Parallel environment, Parallel network, Parallel network environment, Distributed system

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-016-0858-z