Scalable semantic-enhanced supervised hashing for cross-modal retrieval

Authors:

Highlights:

Abstract

Cross-modal retrieval is a classic scenario in which the same semantics are described from multiple perspectives, providing richer and more diverse information; it therefore has wide applications in information retrieval, data mining, and machine learning. Generally, supervised information enhances hash feature learning and yields better retrieval accuracy than unsupervised methods. However, effectively handling the discrete binary constraint and retaining more semantic information during hash learning, both important parts of cross-modal retrieval, remain challenging. To mitigate these problems, we propose a novel cross-modal hashing approach called scalable pairwise embedding constraint hashing (SPECH). SPECH employs a likelihood similarity loss over pairwise sample data to measure the semantic similarity of heterogeneous modal samples, allowing it to make maximal use of the available heterogeneous data for model training. In addition, we further enhance the discriminative capability of the hash codes and reduce intraclass differences among heterogeneous modal samples by embedding regression to a more discriminative semantic label attribute space into the hash-code learning process. Comprehensive experiments on three benchmark datasets show that the proposed SPECH approach achieves significantly better retrieval accuracy than several state-of-the-art approaches for cross-modal retrieval.
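As a rough illustration of the pairwise likelihood similarity loss the abstract refers to, the sketch below implements the negative log-likelihood formulation commonly used in supervised cross-modal hashing, where real-valued relaxations of the hash codes from two modalities are compared against a binary semantic similarity matrix. The function name, the NumPy formulation, and the toy data are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def pairwise_likelihood_loss(F, G, S):
    """Negative log-likelihood of pairwise semantic similarity (illustrative).

    F: (n, k) real-valued hash representations of modality X (e.g., images)
    G: (m, k) real-valued hash representations of modality Y (e.g., texts)
    S: (n, m) binary similarity matrix; S[i, j] = 1 if samples i and j
       share at least one semantic label, else 0.
    """
    theta = 0.5 * F @ G.T                      # pairwise inner products
    # sum_ij [ log(1 + exp(theta_ij)) - S_ij * theta_ij ],
    # with log(1 + exp(.)) computed stably via logaddexp
    loss = np.logaddexp(0.0, theta) - S * theta
    return loss.sum()

# Toy usage: 4 image codes, 3 text codes, 16-bit real-valued relaxations.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 16))
G = rng.standard_normal((3, 16))
S = rng.integers(0, 2, size=(4, 3)).astype(float)
print(pairwise_likelihood_loss(F, G, S))
```

Minimizing this loss pushes the inner product of codes from semantically similar cross-modal pairs up and that of dissimilar pairs down, which is the intuition behind using pairwise supervision to align heterogeneous modalities in a common Hamming space.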

Keywords: Cross-modal hashing, Supervised, Similarity search, Multimedia

Article history: Received 10 July 2021, Revised 21 March 2022, Accepted 30 May 2022, Available online 6 June 2022, Version of Record 17 June 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.109176