AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering

Authors:

Highlights:

• An attention-based multimodal alignment model is proposed for medical VQA.

• Attention over the question is computed from visual and textual content simultaneously.

• A composite loss aligns text-based and image-based attention to locate question keywords (see the sketch after this list).

• We construct an enhanced dataset based on the VQA-RAD dataset to improve data quality.
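The second and third highlights describe the model's core mechanism: attention over the question words is guided by both the image and the text, and a composite loss pushes the two resulting attention distributions to agree on the keywords. Below is a minimal PyTorch sketch of that idea; the module structure, dimensions, additive attention form, and the symmetric-KL alignment term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAlignmentSketch(nn.Module):
    """Hypothetical sketch: attend over question words twice (image-guided
    and text-guided), then align the two attention distributions."""

    def __init__(self, d_img=512, d_txt=300, d_hid=256):
        super().__init__()
        self.img_proj = nn.Linear(d_img, d_hid)   # image feature -> shared space
        self.txt_proj = nn.Linear(d_txt, d_hid)   # word embeddings -> shared space
        self.img_score = nn.Linear(d_hid, 1)      # image-guided word scores
        self.txt_score = nn.Linear(d_hid, 1)      # text-guided word scores

    def forward(self, img_feat, word_feats):
        # img_feat: (B, d_img) pooled image feature
        # word_feats: (B, T, d_txt) question word embeddings
        words = self.txt_proj(word_feats)                       # (B, T, d_hid)
        img = self.img_proj(img_feat).unsqueeze(1)              # (B, 1, d_hid)

        # Image-guided attention over question words (additive attention).
        a_img = F.softmax(
            self.img_score(torch.tanh(words + img)).squeeze(-1), dim=-1)

        # Text-guided attention, using the question's own summary as the query.
        q_summary = words.mean(dim=1, keepdim=True)             # (B, 1, d_hid)
        a_txt = F.softmax(
            self.txt_score(torch.tanh(words + q_summary)).squeeze(-1), dim=-1)

        # Fused question representation from the image-guided attention.
        fused = torch.bmm(a_img.unsqueeze(1), words).squeeze(1)  # (B, d_hid)
        return fused, a_img, a_txt

def composite_loss(logits, labels, a_img, a_txt, lam=0.1):
    """Task loss plus an alignment term pulling the two attention maps
    together; symmetric KL divergence is one plausible choice."""
    task = F.cross_entropy(logits, labels)
    kl = (F.kl_div(a_img.log(), a_txt, reduction="batchmean")
          + F.kl_div(a_txt.log(), a_img, reduction="batchmean"))
    return task + lam * kl

if __name__ == "__main__":
    # Toy run with random tensors (hypothetical shapes).
    B, T = 4, 12
    model = AttentionAlignmentSketch()
    fused, a_img, a_txt = model(torch.randn(B, 512), torch.randn(B, T, 300))
    logits = nn.Linear(256, 10)(fused)            # toy answer classifier
    labels = torch.randint(0, 10, (B,))
    print(composite_loss(logits, labels, a_img, a_txt))
```

In this sketch the alignment term rewards the image-guided attention `a_img` and the text-guided attention `a_txt` for concentrating on the same words, which is one plausible reading of "aligns text-based and image-based attention to locate the keywords."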

Keywords: Attention mechanism, Deep learning, Medical Visual Question Answering, Multimodal fusion, Medical images

Article history: Received 24 December 2021, Revised 18 August 2022, Accepted 18 August 2022, Available online 27 August 2022, Version of Record 5 September 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.109763