Multi-stream neural network fused with local information and global information for HOI detection

Authors: Limin Xia, Rui Li

Abstract

Human-Object Interaction (HOI) detection is a recent human-centric visual relationship detection task that is important for deep understanding of visual scenes. Because visual scenes in images are complex, HOI detection remains challenging, and its most critical part is feature extraction and representation. Some existing approaches rely solely on local region information and ignore global contextual information, even though global context contributes to detecting some HOI categories. Other approaches incorporate global contextual information but lose local region information. In this work, we propose a multi-stream neural network architecture composed of three special modules that employs both local region information and global contextual information for HOI detection. The model can detect HOI categories that depend on local region information as well as those that depend on global contextual information, and therefore accounts more fully for all HOI categories in the dataset. Compared with existing approaches, the proposed model shows improved performance on the V-COCO and HICO-DET benchmark datasets, especially when predicting rare HOI categories.

Keywords: Human-object interactions, Global contextual information, Local region information, Information fusion

Paper URL: https://doi.org/10.1007/s10489-020-01794-1