HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog

Authors:

Highlights:

• A novel deep neural architecture HVLM is proposed for Visual Dialog.

• A dual-perspective encoding mechanism is designed to understand an image comprehensively.

• An iterative learning strategy is designed to capture fine-grained semantic interactions in the dialog history.

• Experimental results demonstrate that our proposed model outperforms other comparable models by a significant margin on benchmark datasets.

Keywords: Visual Dialog, Visual-language understanding, Dual-perspective reasoning, Simple spectral graph convolution network

Review history: Received 18 January 2022, Revised 25 June 2022, Accepted 26 June 2022, Available online 18 July 2022, Version of Record 18 July 2022.

Paper URL: https://doi.org/10.1016/j.ipm.2022.103008