Fixed-Size Objects Encoding for Visual Relationship Detection

Authors: Hengyue Pan, Xin Niu, Siqi Shen, Yixin Chen, Peng Qiao, Zhen Huang, Dongsheng Li

Abstract

In this paper, we propose a fixed-size object encoding method, called FOE-VRD, to improve the performance of visual relationship detection. For each relationship triplet in a given image, FOE-VRD not only considers the subject and object but also uses one fixed-size vector to encode all background objects in the image. In this way, we introduce more background knowledge to help the relationship detector perform better. We first use a regular convolutional neural network as a feature extractor to generate high-level features of the input image. Then, for each relationship triplet, we apply ROI pooling, as the feature generator, to the bounding boxes of the subject and object to obtain two corresponding feature vectors. In addition, we propose a novel method that encodes all background objects in each image with one fixed-size vector (the FBE vector). By concatenating these three feature vectors, we encode the relationship as a single fixed-size vector, which is then fed into a fully connected neural network to obtain the predicate classification result. Experimental results on the VRD and Visual Genome datasets show that the proposed method works well on both predicate classification and relationship detection, especially in the zero-shot detection setting.
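To make the data flow concrete, here is a minimal pure-Python sketch of the pipeline described above, under several loud assumptions. The CNN backbone is replaced by a random H x W x C feature map, boxes are integer feature-map coordinates, and a single linear layer plus softmax stands in for the fully connected network. Most importantly, the background vector below is a hypothetical stand-in (a normalized category histogram of every object other than the subject and object), not the FBE construction, which the abstract does not specify. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) on the feature map; x1, y1 exclusive
FeatureMap = List[List[List[float]]]  # indexed as fmap[y][x][channel]


def roi_max_pool(fmap: FeatureMap, box: Box, out: int = 2) -> List[float]:
    """Max-pool the region `box` into an out x out grid, flattened to out*out*C values."""
    x0, y0, x1, y1 = box
    channels = len(fmap[0][0])
    pooled: List[float] = []
    for i in range(out):
        # Row range of bin i; max() keeps every bin at least one cell tall.
        ys = y0 + (y1 - y0) * i // out
        ye = max(ys + 1, y0 + (y1 - y0) * (i + 1) // out)
        for j in range(out):
            xs = x0 + (x1 - x0) * j // out
            xe = max(xs + 1, x0 + (x1 - x0) * (j + 1) // out)
            for c in range(channels):
                pooled.append(max(fmap[y][x][c] for y in range(ys, ye) for x in range(xs, xe)))
    return pooled


def background_encoding(labels: Sequence[int], subj: int, obj: int, num_classes: int) -> List[float]:
    """HYPOTHETICAL fixed-size background vector (not the paper's FBE): a normalized
    histogram over object categories of all objects except the subject and object."""
    hist = [0.0] * num_classes
    for k, label in enumerate(labels):
        if k not in (subj, obj):
            hist[label] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def encode_relationship(fmap: FeatureMap, boxes: Sequence[Box], labels: Sequence[int],
                        subj: int, obj: int, num_classes: int, out: int = 2) -> List[float]:
    """Concatenate subject ROI features, object ROI features, and the background vector."""
    subj_vec = roi_max_pool(fmap, boxes[subj], out)
    obj_vec = roi_max_pool(fmap, boxes[obj], out)
    bg_vec = background_encoding(labels, subj, obj, num_classes)
    return subj_vec + obj_vec + bg_vec


def predicate_scores(feature: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One linear layer + softmax, standing in for the fully connected predicate classifier."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


if __name__ == "__main__":
    random.seed(0)
    H, W, C, NUM_CLASSES, NUM_PREDICATES = 8, 8, 3, 5, 4
    fmap = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    boxes = [(0, 0, 4, 4), (4, 4, 8, 8), (2, 5, 3, 8), (6, 0, 8, 2)]
    labels = [0, 1, 3, 3]
    feat = encode_relationship(fmap, boxes, labels, subj=0, obj=1, num_classes=NUM_CLASSES)
    # Length is 2 * (2*2*C) + NUM_CLASSES = 29, independent of how many objects the image has.
    weights = [[random.uniform(-0.1, 0.1) for _ in feat] for _ in range(NUM_PREDICATES)]
    probs = predicate_scores(feat, weights, [0.0] * NUM_PREDICATES)
    print(len(feat), round(sum(probs), 6))
```

The point the sketch illustrates is the fixed size: adding more background objects changes only the histogram's values, never the length of the concatenated vector, so a fixed-width classifier can consume every triplet.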

Keywords: Objects encoding, Visual relationship detection, Deep learning

Paper URL: https://doi.org/10.1007/s11063-022-10766-0