Iterative graph attention memory network for cross-modal retrieval

Authors:

Highlights:

Abstract

Eliminating the semantic gap between multi-modal data and fusing multi-modal data effectively are the key problems in cross-modal retrieval. Because semantics are abstract, the representation of a single sample is often one-sided. To obtain complementary semantic information from samples with the same semantics, we construct a local graph for each instance and use a graph feature extractor (GFE) to reconstruct the sample representation from the adjacency relationships between the sample and its neighbors. Because many cross-modal methods focus only on learning from paired samples and cannot integrate additional information from the other modality, we propose a cross-modal graph attention strategy that generates a graph attention representation for each sample from the local graph of its paired sample. To eliminate the heterogeneity gap between modalities, we fuse the features of the two modalities with a recurrent gated memory network that selects prominent features from the other modality and filters out unimportant information, yielding a more discriminative representation in the common latent space. Experiments on four benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art cross-modal methods.
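The abstract describes three components: a graph feature extractor over each instance's local graph, a cross-modal graph attention over the paired sample's local graph, and an iterative gated memory fusion. The PyTorch sketch below is one plausible reading of these components, not the authors' architecture; the layer sizes, the row-normalized adjacency, the scaled dot-product attention form, and the gating equations are all assumptions for illustration.

```python
# Minimal sketch of the components named in the abstract (all design details assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphFeatureExtractor(nn.Module):
    """Reconstructs a sample's representation from its local graph
    (the sample plus its neighbors) via one round of neighborhood aggregation."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes, adj):
        # nodes: (n, d) features of the sample and its neighbors
        # adj:   (n, n) row-normalized adjacency of the local graph
        return F.relu(self.proj(adj @ nodes))


class CrossModalGraphAttention(nn.Module):
    """Attends from a sample of one modality over the local graph
    of its paired sample in the other modality."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, query, paired_nodes):
        # query:        (d,)   sample from modality A
        # paired_nodes: (n, d) local-graph nodes of its pair in modality B
        scores = paired_nodes @ self.q(query) / paired_nodes.size(-1) ** 0.5
        alpha = torch.softmax(scores, dim=0)        # attention over the paired graph
        return alpha @ self.k(paired_nodes)         # graph attention representation


class GatedMemoryFusion(nn.Module):
    """Iterative gated update that keeps salient cross-modal features
    and filters out unimportant ones."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.cand = nn.Linear(2 * dim, dim)

    def forward(self, own_feat, cross_feat, steps=3):
        memory = own_feat
        for _ in range(steps):                      # recurrent refinement
            joint = torch.cat([memory, cross_feat], dim=-1)
            z = torch.sigmoid(self.gate(joint))     # how much to overwrite the memory
            memory = z * torch.tanh(self.cand(joint)) + (1 - z) * memory
        return memory                               # feature in the common latent space


# Toy usage: a 5-node local graph per modality with 128-d features.
dim = 128
img_nodes, txt_nodes = torch.randn(5, dim), torch.randn(5, dim)
adj = torch.full((5, 5), 1.0 / 5)                   # toy row-normalized adjacency
img_feat = GraphFeatureExtractor(dim)(img_nodes, adj)[0]
cross = CrossModalGraphAttention(dim)(img_feat, txt_nodes)
fused = GatedMemoryFusion(dim)(img_feat, cross)
```

In this reading, retrieval would compare the fused representations of images and texts in the common latent space; how the local graphs are built (e.g., nearest neighbors within a batch) and how many gated iterations are used are left to the paper.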

Keywords: Cross-modal retrieval, Graph convolutional network, Graph attention mechanism

Article history: Received 30 December 2020, Revised 27 March 2021, Accepted 10 May 2021, Available online 13 May 2021, Version of Record 18 May 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107138