Hybrid sentiment analysis framework for a morphologically rich language

作者:Miljana Mladenović, Jelena Mitrović, Cvetana Krstev, Duško Vitas

摘要

This paper presents a process of building a Sentiment Analysis Framework for Serbian (SAFOS). We created a hybrid method that uses a sentiment lexicon and Serbian WordNet (SWN) synsets assigned with sentiment polarity scores in the process of feature selection. As the use of stemming for morphologically rich languages (MRLs) may result in loss or giving incorrect sentiment meaning to words, we decided to expand the sentiment lexicon, as well as the lexicon generated using SWN, by adding morphological forms of emotional terms and phrases. It was done using Serbian Morphological Electronic Dictionaries. A new feature reduction method for document-level sentiment polarity classification using maximum entropy modeling is proposed. It is based on mapping of a large number of related feature candidates (sentiment words, phrases and their inflectional forms) to a few concepts and using them as features. Testing was performed on a 10-fold cross validation set and on test sets containing news and movie reviews. The results of all experiments show that sentiment feature mapping for feature set reduction achieves better results over the basic set of features. For both test sets, the best classification accuracy scores were achieved for the combination of unigram and bigram features reduced by sentiment feature mapping (accuracy 78.3 % for movie reviews and 79.2 % for news test set). In 10-fold cross-validation, best average accuracy score of 95.6 % was obtained using unigrams as features, reduced by the mapping procedure.

论文关键词:Sentiment analysis, Machine learning, Maximum entropy classifier, Feature selection, Morphological e-dictionaries

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-015-0372-5