Fine-grained bidirectional attentional generation and knowledge-assisted networks for cross-modal retrieval

Authors:

Highlights:

Abstract

Most existing cross-modal retrieval methods consider only global or local semantic embeddings and do not capture fine-grained dependencies between objects. They also tend to overlook the fact that mutual transformation between modalities can itself improve the embedding of each modality. To address these problems, we propose BiKA (Bidirectional Knowledge-assisted embedding and Attention-based generation). The model uses a bidirectional graph convolutional neural network to establish dependencies between objects and a bidirectional attention-based generative network to transform each modality into the other. Specifically, the knowledge graph is used for local matching to constrain the local expression of the modalities, while the generative network performs the mutual transformation that constrains their global expression. We also propose a new position relation embedding network that embeds the positional relations between objects. Experiments on two public datasets show that our method substantially outperforms many state-of-the-art models.
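The abstract does not specify BiKA's layers, so the following is only a rough illustration under two stated assumptions that do not come from the source: (1) a "bidirectional" graph convolution over a directed object graph aggregates separately along outgoing and incoming edges, with a separate weight matrix per direction (here a mean-aggregation variant with self-loops); and (2) position relations between objects start from log-ratio bounding-box geometry that a learned position relation embedding network could take as input. The names `bidirectional_gcn_layer` and `relative_position` are hypothetical; this is not the authors' implementation.

```python
import math


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    # Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def row_normalize(adj: list[list[float]]) -> list[list[float]]:
    # Add self-loops, then divide each row by its degree (mean aggregation)
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        deg = sum(row)
        out.append([v / deg for v in row])
    return out


def bidirectional_gcn_layer(h, adj, w_fwd, w_bwd):
    """Hypothetical bidirectional graph-convolution step.

    h:      n x d node features, one row per detected object
    adj:    n x n directed adjacency, adj[i][j] = 1 for an edge i -> j
    w_fwd:  d x d' weights for messages from successors (outgoing edges)
    w_bwd:  d x d' weights for messages from predecessors (incoming edges)
    """
    fwd = matmul(matmul(row_normalize(adj), h), w_fwd)
    bwd = matmul(matmul(row_normalize(transpose(adj)), h), w_bwd)
    # Sum both directions, then apply ReLU
    return [[max(0.0, a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(fwd, bwd)]


def relative_position(box_i, box_j):
    """Scale-invariant geometry of box_j relative to box_i (assumed encoding).

    Boxes are (cx, cy, w, h). Center offsets are clamped at 1e-3
    to avoid log(0) when two centers are aligned.
    """
    cx_i, cy_i, w_i, h_i = box_i
    cx_j, cy_j, w_j, h_j = box_j
    return [
        math.log(max(abs(cx_j - cx_i), 1e-3) / w_i),
        math.log(max(abs(cy_j - cy_i), 1e-3) / h_i),
        math.log(w_j / w_i),
        math.log(h_j / h_i),
    ]


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three objects, 2-d features
    adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # edges 0 -> 1 -> 2
    eye = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for clarity
    print(bidirectional_gcn_layer(h, adj, eye, eye))
    # prints [[1.5, 0.5], [1.0, 1.5], [1.5, 2.0]]
```

Keeping separate weights per direction lets a node treat "objects it points to" differently from "objects that point to it", which an undirected GCN with a symmetric adjacency cannot do.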

Keywords: Cross-modal retrieval, Graph convolutional network, Knowledge embedding, Cross-attention, Attentional generative network

Article history: Received 31 March 2022, Accepted 6 June 2022, Available online 13 June 2022, Version of Record 20 June 2022.

Official paper link: https://doi.org/10.1016/j.imavis.2022.104507