Multi-scale relation reasoning for multi-modal Visual Question Answering

Authors:

Highlights:

• A multi-scale design that reflects how VQA inherently involves multiple objects.

• Regional attention that selects informative, question-related regions.

• Three properly designed stages for multimodal fusion between the textual question and the visual image.

Keywords: Multi-modal data, Visual Question Answering, Multi-scale relation reasoning, Attention model

Review history: Received 23 August 2020, Revised 5 May 2021, Accepted 6 May 2021, Available online 14 May 2021, Version of Record 17 May 2021.

Official paper URL: https://doi.org/10.1016/j.image.2021.116319
