Multi-scale relation reasoning for multi-modal Visual Question Answering

Authors:

Highlights:

• Multi-scale design to capture the fact that VQA involves multiple objects.

• Regional Attention to select informative question-related regions.

• Three properly designed stages for multi-modal fusion of the textual question and the visual image.

Keywords: Multi-modal data, Visual Question Answering, Multi-scale relation reasoning, Attention model

Review history: Received 23 August 2020, Revised 5 May 2021, Accepted 6 May 2021, Available online 14 May 2021, Version of Record 17 May 2021.

Official paper URL: https://doi.org/10.1016/j.image.2021.116319