Interpretable visual reasoning: A survey

Authors:

Abstract

Visual reasoning is the process of answering questions about visual information. Most current visual reasoning models are built on deep learning and end-to-end architectures. Although these models perform well, they usually act as black boxes for users, which makes the rationale behind their reasoning difficult to understand. In recent years, the research community has recognized the importance of interpretability in visual reasoning and has developed a series of Interpretable Visual Reasoning (IVR) models. In this paper, we review these models. First, we establish a taxonomy based on the four forms of explanation used in current visual reasoning: vision, text, graph, and symbol. Second, we examine typical IVR models in each category and analyze their strengths and weaknesses. Third, we describe the mainstream datasets for visual reasoning and VQA, and analyze how these datasets promote IVR research from different perspectives. Finally, we summarize the challenges facing IVR and point out potential research directions.

Keywords: Visual question answering, Visual reasoning, Interpretability, Datasets, Survey

Review history: Received 26 April 2021, Accepted 29 April 2021, Available online 12 June 2021, Version of Record 24 June 2021.

Official paper URL: https://doi.org/10.1016/j.imavis.2021.104194