Image–text sentiment analysis via deep multimodal attentive fusion

Abstract:

Sentiment analysis of social media data is crucial for understanding people's positions, attitudes, and opinions toward a given event, with applications such as election prediction and product evaluation. Although great effort has been devoted to single modalities (image or text), much less attention has been paid to the joint analysis of multimodal data in social media. Most existing methods for multimodal sentiment analysis simply combine the different data modalities, which results in unsatisfactory sentiment classification performance. In this paper, we propose a novel image–text sentiment analysis model, Deep Multimodal Attentive Fusion (DMAF), which exploits the discriminative features and the internal correlation between visual and semantic content with a mixed fusion framework for sentiment analysis. Specifically, to automatically focus on the discriminative regions and the important words most relevant to the sentiment, two separate unimodal attention models are proposed to learn effective emotion classifiers for the visual and textual modalities, respectively. Then, an intermediate-fusion-based multimodal attention model is proposed to exploit the internal correlation between visual and textual features for joint sentiment classification. Finally, a late fusion scheme combines the three attention models for sentiment prediction. Extensive experiments on both weakly labeled and manually labeled datasets demonstrate the effectiveness of our approach.
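The abstract does not give the exact formulation, so the following is only a minimal PyTorch sketch of the pipeline it describes: two unimodal attention branches that pool region and word features, an intermediate-fusion attention over the pooled vectors, and a late fusion of the three classifiers. All module names, feature dimensions, and the logit-averaging late-fusion rule are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnimodalAttention(nn.Module):
    """Attends over a set of unimodal features (image regions or words)
    and pools them into a single sentiment representation."""
    def __init__(self, feat_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                          # feats: (B, N, D)
        alpha = F.softmax(self.score(feats), dim=1)    # attention weights (B, N, 1)
        pooled = (alpha * feats).sum(dim=1)            # weighted pooling -> (B, D)
        return pooled, self.classifier(pooled)

class MultimodalAttention(nn.Module):
    """Intermediate fusion: weighs the pooled visual and textual vectors
    against each other before joint classification."""
    def __init__(self, feat_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, visual, textual):                # each: (B, D)
        stacked = torch.stack([visual, textual], dim=1)  # (B, 2, D)
        beta = F.softmax(self.score(stacked), dim=1)     # modality weights (B, 2, 1)
        fused = (beta * stacked).sum(dim=1)              # fused vector (B, D)
        return self.classifier(fused)

class DMAFSketch(nn.Module):
    """Hypothetical mixed-fusion model: late fusion averages the logits
    of the three attention branches."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=2):
        super().__init__()
        self.visual_att = UnimodalAttention(feat_dim, hidden_dim, num_classes)
        self.textual_att = UnimodalAttention(feat_dim, hidden_dim, num_classes)
        self.joint_att = MultimodalAttention(feat_dim, hidden_dim, num_classes)

    def forward(self, region_feats, word_feats):
        # region_feats: (B, R, D) CNN region features; word_feats: (B, T, D) word embeddings
        v, logits_v = self.visual_att(region_feats)
        t, logits_t = self.textual_att(word_feats)
        logits_m = self.joint_att(v, t)
        return (logits_v + logits_t + logits_m) / 3.0  # assumed late-fusion rule

# Smoke test with random tensors standing in for CNN/RNN outputs.
model = DMAFSketch()
out = model(torch.randn(4, 36, 512), torch.randn(4, 20, 512))
print(out.shape)  # torch.Size([4, 2])
```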

Keywords: Multimodal learning, Sentiment analysis, Attention model, Fusion

Article history: Received 10 May 2018, Revised 4 January 2019, Accepted 12 January 2019, Available online 16 January 2019, Version of Record 4 February 2019.

DOI: https://doi.org/10.1016/j.knosys.2019.01.019