Sentiment analysis in financial texts

Authors:

Highlights:

• To explain a classifier-based sentiment parser for financial texts

• To demonstrate how to assign the polarity of phrases using an assessment heuristic

• To provide statistical tests on twelve million words that attest to its significance

Abstract

The growth of financial texts in the wake of big data has challenged most organizations and brought escalating demands for analysis tools. In general, text streams are more challenging to handle than numeric data streams. Text streams are unstructured by nature, but they represent collective expressions that are of value in any financial decision. Making sense of unstructured textual data can be both daunting and necessary. In this study, we address two key questions raised by the explosion of interest in unstructured data: how to extract insight from such data, and how to determine whether that insight provides any hints concerning the trends of financial markets. We propose a sentiment analysis engine (SAE) that takes advantage of grammar-based linguistic analyses. This engine performs sentiment analysis not only at the word-token level but also at the phrase level within each sentence. An assessment heuristic is applied to extract the collective expressions shown in the texts. Three evaluations are presented to assess the performance of the engine. First, several standard parsing evaluation metrics are applied to two treebanks. Second, a benchmark evaluation is conducted using a dataset of English movie reviews; the results show that our SAE outperforms the traditional bag-of-words approach. Third, a financial text stream of twelve million words, aligned with a stock market index, is examined. The evaluation results and their statistical significance provide strong evidence of long persistence in the mood time series generated by the engine. In addition, our approach establishes grounds for belief that the sentiments expressed through text streams are helpful for analyzing the trends in a stock market index, although such sentiments and market indices are normally considered to be completely uncorrelated.
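The contrast between word-level and phrase-level scoring can be illustrated with a minimal sketch. It is not the paper's SAE: the toy lexicon, the negator list, the nested-list parse representation, and the negation-flipping rule are all assumptions made for illustration, standing in for the grammar-based parser and assessment heuristic that the abstract mentions but does not specify.

```python
# Toy lexicon and negators: illustrative assumptions, not the paper's resources.
LEXICON = {"gain": 1, "strong": 1, "profit": 1, "loss": -1, "weak": -1, "risk": -1}
NEGATORS = {"not", "no", "never"}


def flatten(tree):
    """Return the tokens of a nested-list parse tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [token for child in tree for token in flatten(child)]


def bag_of_words_polarity(tokens):
    """Sum word-level polarities, ignoring word order and phrase structure."""
    return sum(LEXICON.get(token, 0) for token in tokens)


def phrase_polarity(tree):
    """Score a nested-list parse tree bottom-up.

    A leaf is a token string; a phrase is a list of children. If a phrase has
    a negator as a direct child, the summed polarity of its other children is
    flipped. This rule is a hypothetical stand-in for an assessment heuristic.
    """
    if isinstance(tree, str):
        return LEXICON.get(tree, 0)
    negated = any(isinstance(c, str) and c in NEGATORS for c in tree)
    score = sum(
        phrase_polarity(child)
        for child in tree
        if not (isinstance(child, str) and child in NEGATORS)
    )
    return -score if negated else score


# "earnings were not strong", with the negator attached to the adjective phrase.
sentence = ["earnings", ["were", ["not", "strong"]]]

word_level = bag_of_words_polarity(flatten(sentence))
phrase_level = phrase_polarity(sentence)
print(word_level, phrase_level)
```

In this toy case the word-level sum treats "strong" as positive regardless of context, while the phrase-level pass flips it inside the negated phrase, which is the kind of difference a phrase-level analysis is meant to capture.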

Keywords: Text analysis, Financial time series, Decision support systems

Review history: Received 23 September 2015, Revised 20 September 2016, Accepted 31 October 2016, Available online 5 November 2016, Version of Record 24 January 2017.

Official URL: https://doi.org/10.1016/j.dss.2016.10.006