A feature selection method based on improved Fisher’s discriminant ratio for text sentiment classification

Abstract

Owing to its openness, virtualization, and sharing principles, the Internet has rapidly become a platform on which people express their opinions, attitudes, feelings, and emotions. Because subjective texts are often too numerous for people to read through, automatically classifying them into sentiment orientation categories (e.g., positive/negative) has become an important research problem. In this paper, an effective feature selection method based on Fisher’s discriminant ratio is proposed for subjective text sentiment classification. To validate the proposed method, we compared it with a method based on Information Gain (IG), using a Support Vector Machine (SVM) as the classifier. Two experiments were conducted by combining different feature selection methods with two kinds of candidate feature sets. On two corpora, 2739 subjective documents from COAE2008s and 1006 car-related subjective documents, the experimental results indicate that, when the candidate features are the words that appear in both positive and negative texts, the Fisher’s discriminant ratio based on word frequency estimation performs best, with accuracies of 86.61% and 82.80% on the two corpora, respectively.
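To make the scoring idea concrete, here is a minimal sketch of the classic two-class Fisher’s discriminant ratio used as a per-word feature score, F(w) = (μ_pos − μ_neg)² / (σ²_pos + σ²_neg), restricted to candidate words that occur in both positive and negative texts. It is an illustration under stated assumptions, not the paper’s improved formula: it assumes raw per-document term counts as the frequency estimate, population variances as the within-class spread, and a small hypothetical constant `EPS` to avoid division by zero. The function names `fisher_scores` and `select_features` are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, List, Sequence

# Hypothetical smoothing constant so a zero within-class variance does not divide by zero.
EPS = 1e-9


def fisher_scores(pos_docs: Sequence[Sequence[str]],
                  neg_docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Score each candidate word with the classic two-class Fisher's discriminant ratio.

    Assumption: each document is a list of tokens, and a word's value in a document
    is its raw count there (not the paper's improved frequency estimate).
    """
    pos_tf = [Counter(d) for d in pos_docs]
    neg_tf = [Counter(d) for d in neg_docs]

    # Candidate features: words that appear in both positive and negative texts.
    pos_vocab = set().union(*pos_tf)
    neg_vocab = set().union(*neg_tf)
    candidates = pos_vocab & neg_vocab

    scores: Dict[str, float] = {}
    for w in candidates:
        # Counter returns 0 for documents that do not contain w.
        p = [tf[w] for tf in pos_tf]
        n = [tf[w] for tf in neg_tf]
        mp, mn = mean(p), mean(n)
        # Within-class spread: population variance of the counts in each class.
        within = pvariance(p, mp) + pvariance(n, mn)
        scores[w] = (mp - mn) ** 2 / (within + EPS)
    return scores


def select_features(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    pos = [["good", "great", "car"], ["good", "car", "good"]]
    neg = [["bad", "car"], ["bad", "good", "car"]]
    s = fisher_scores(pos, neg)
    print(select_features(s, 1))  # prints ['good']
```

In this toy corpus only "good" and "car" are candidates; "good" separates the classes (mean count 1.5 vs 0.5) and scores about 2.0, while "car" has identical counts in every document and scores 0. In a full pipeline the selected words would then define the feature vectors fed to an SVM classifier.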

Keywords: Fisher’s discriminant ratio, Feature selection, Text sentiment classification, Support vector machine

Publication history: Available online 3 February 2011.

Official paper link: https://doi.org/10.1016/j.eswa.2011.01.077