Unsupervised hash retrieval based on multiple similarity matrices and text self-attention mechanism

Abstract

Cross-modal retrieval aims to measure the similarity between data of different modalities, and hashing-based methods make this retrieval more efficient. This paper proposes an unsupervised cross-modal hash retrieval method based on multiple similarity matrices. Fusion features are first constructed as a weighted combination of the hash features and the original features. An auxiliary similarity matrix is then established for each of the three feature types (original, hash, and fusion). Finally, a fusion matrix is constructed as a weighted combination of the similarity matrices of the original features and the hash features. These four matrices differ both in the form of the underlying features and in how they are constructed, and together they concentrate the similarity information of the different modalities. Loss terms between the different similarity matrices and between the different modalities are computed from these four matrices. Because most existing models extract text features with a single method, this paper also applies text self-attention to strengthen text feature extraction, which further improves the final performance. The method is evaluated on the Wikipedia, MIRFlickr, and NUS-WIDE datasets, and the results show that it has certain advantages over the latest methods.
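The construction of the four similarity matrices described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes cosine similarity as the similarity measure and assumes that the fusion features are a weighted concatenation of L2-normalized original and hash features. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between all rows of `features`."""
    unit = [l2_normalize(v) for v in features]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def fuse_features(original, hashed, alpha):
    """Fusion features; ASSUMPTION: weighted concatenation of unit vectors."""
    return [
        [alpha * x for x in l2_normalize(o)]
        + [(1.0 - alpha) * x for x in l2_normalize(h)]
        for o, h in zip(original, hashed)
    ]


def weighted_sum(m1, m2, beta):
    """Element-wise weighted combination of two equally sized matrices."""
    return [
        [beta * a + (1.0 - beta) * b for a, b in zip(r1, r2)]
        for r1, r2 in zip(m1, m2)
    ]


def build_similarity_matrices(original, hashed, alpha=0.7, beta=0.5):
    """Return the four matrices: original, hash, fusion-feature, fusion-matrix."""
    fused = fuse_features(original, hashed, alpha)
    s_orig = cosine_similarity_matrix(original)   # from original features
    s_hash = cosine_similarity_matrix(hashed)     # from (relaxed) hash features
    s_feat = cosine_similarity_matrix(fused)      # from fusion features
    s_fusion = weighted_sum(s_orig, s_hash, beta)  # combines two matrices
    return s_orig, s_hash, s_feat, s_fusion


if __name__ == "__main__":
    # Toy data: three samples, 3-d original features, 2-d relaxed hash codes.
    original = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [1.0, 0.1, 1.9]]
    hashed = [[0.9, -0.8], [-0.7, 0.6], [0.8, -0.9]]
    for name, matrix in zip(
        ("original", "hash", "fusion-feature", "fusion-matrix"),
        build_similarity_matrices(original, hashed),
    ):
        print(name, [[round(v, 3) for v in row] for row in matrix])
```

The sketch separates the two fusion steps: `s_feat` fuses at the feature level before computing similarity, whereas `s_fusion` fuses at the matrix level after similarity has been computed. The loss terms between matrices and between modalities are not shown, since the abstract does not specify their form.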