Visual question answering model based on graph neural network and contextual attention

Authors:

Highlights:

• A graph neural network model is used to capture the implicit relationships among semantic objects or salient image regions.

• We use a visual context-aware attention model to select salient visual information for answer prediction.

• Experimental results on the VQA 1.0 and VQA 2.0 datasets demonstrate that our model performs substantially better than state-of-the-art (SOTA) models.
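
The two components named in the highlights can be illustrated with a toy sketch. This is not the authors' architecture: the fully connected region graph, the dot-product edge weights, the residual update, and the question-guided scoring are all assumptions chosen for brevity, and the names `graph_step` and `attend` are hypothetical. It only shows the general shape of the idea: region features first exchange information over a graph, then an attention step picks out the regions most relevant to the question.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_step(regions):
    """One round of message passing over a fully connected region graph.

    Assumption: edge weights are softmax-normalized dot-product
    similarities between region features (not taken from the paper).
    """
    updated = []
    for r_i in regions:
        weights = softmax([dot(r_i, r_j) for r_j in regions])
        # Aggregated message: weighted sum of all region features.
        msg = [
            sum(w * r_j[k] for w, r_j in zip(weights, regions))
            for k in range(len(r_i))
        ]
        # Residual update keeps the original feature and adds context.
        updated.append([a + b for a, b in zip(r_i, msg)])
    return updated


def attend(regions, question):
    """Attention over regions, scored against a question vector (assumed)."""
    weights = softmax([dot(r, question) for r in regions])
    context = [
        sum(w * r[k] for w, r in zip(weights, regions))
        for k in range(len(question))
    ]
    return weights, context


# Toy inputs: three 2-d region features and a 2-d question embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]
weights, context = attend(graph_step(regions), question)
```

In this toy run the region aligned with the question vector receives the largest attention weight and the orthogonal region the smallest. In a real model, `context` would be fused with the question representation and passed to an answer classifier.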


Keywords: Visual question answering, Computer vision, Natural language processing, Attention

Review history: Received 30 August 2020, Revised 10 January 2021, Accepted 26 March 2021, Available online 29 March 2021, Version of Record 5 April 2021.

Official paper URL: https://doi.org/10.1016/j.imavis.2021.104165