Aligning vision-language for graph inference in visual dialog

Authors:

Highlights:

• Visual dialog needs to construct semantic dependencies between visual and textual contents.

• The gap between modalities should be narrowed by aligning visual and textual knowledge.

• A graph structure is applied to connect isolated visual objects that are enriched with textual semantics.

• External visual relationship information is introduced to make complex relationships easier to comprehend.

Keywords: Visual dialog, Alignment, Graph inference, Scene graph

Review history: Received 31 August 2020, Revised 25 September 2021, Accepted 29 September 2021, Available online 12 October 2021, Version of Record 26 October 2021.

Official paper URL: https://doi.org/10.1016/j.imavis.2021.104316