Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text

Authors: Afraz Z. Syed, Muhammad Aslam, Ana Maria Martinez-Enriquez

Abstract

This paper presents a grammatically motivated sentiment classification model applied to a morphologically rich language: Urdu. The morphological complexity and flexible grammatical rules of this language require an improved or altogether different approach. We emphasize the identification of SentiUnits rather than subjective words in the given text. SentiUnits are the sentiment-carrying expressions that reveal the inherent sentiment of a sentence toward a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through shallow-parsing-based chunking. A dependency parsing algorithm then creates associations between these extracted expressions. For our system, we develop a sentiment-annotated lexicon of Urdu words. Each lexicon entry is marked with its orientation (positive or negative) and an intensity (force of orientation) score. To evaluate the system, we collect two corpora of reviews from the domains of movies and electronic appliances. The experimental results show that we achieve state-of-the-art performance in sentiment analysis of Urdu text.
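
The pipeline described in the abstract (a lexicon with orientation and intensity, chunked SentiUnits and targets, and an association step) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the names `LexiconEntry`, `LEXICON`, `score_sentiunit`, and `associate` are hypothetical, the transliterated words and their scores are invented toy values, the input is assumed to be already chunked, and a simple "nearest preceding target" rule stands in for the dependency parsing algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of the orientation (hypothetical 0..1 scale)


# Hypothetical toy lexicon: transliterated words and scores are illustrative only.
LEXICON = {
    "achi": LexiconEntry(+1, 0.6),   # roughly "good"
    "buri": LexiconEntry(-1, 0.7),   # roughly "bad"
}


def score_sentiunit(tokens):
    """Sum the signed intensities of the tokens found in the lexicon."""
    return sum(
        LEXICON[t].orientation * LEXICON[t].intensity
        for t in tokens
        if t in LEXICON
    )


def associate(chunks):
    """Attach each SentiUnit to the nearest preceding target.

    `chunks` is a list of (label, tokens) pairs, where label is either
    "TARGET" or "SENTIUNIT". This adjacency rule is a simplifying
    assumption standing in for a real dependency parser.
    """
    scores = {}
    current_target = None
    for label, tokens in chunks:
        if label == "TARGET":
            current_target = " ".join(tokens)
            scores.setdefault(current_target, 0.0)
        elif label == "SENTIUNIT" and current_target is not None:
            scores[current_target] += score_sentiunit(tokens)
    return scores


if __name__ == "__main__":
    # Pre-chunked toy input: two targets, each followed by one SentiUnit.
    chunks = [
        ("TARGET", ["film"]),
        ("SENTIUNIT", ["achi"]),
        ("TARGET", ["awaz"]),
        ("SENTIUNIT", ["buri"]),
    ]
    print(associate(chunks))
```

Keeping one score per target, rather than one per sentence, mirrors the abstract's point that sentiment is expressed toward a specific target, so a single review can be positive about one noun phrase and negative about another.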

Keywords: Natural language processing, Sentiment analysis, Opinion mining, Shallow parsing, Dependency parsing

Paper URL: https://doi.org/10.1007/s10462-012-9322-6