Saliency prediction on omnidirectional images with attention-aware feature fusion network

Authors: Dandan Zhu, Yongqing Chen, Defang Zhao, Qiangqiang Zhou, Xiaokang Yang

Abstract

Recent years have witnessed the rapid development of deep learning technology and its successful application to saliency prediction on traditional 2D images. However, when using deep neural network (DNN) models to perform saliency prediction on omnidirectional images (ODIs), two critical issues arise: (1) the datasets for ODIs are small-scale and cannot support the training of DNN-based models; (2) saliency prediction is challenging because some ODIs contain complex background clutter. To solve these two problems, we propose a novel Attention-Aware Features Fusion Network (AAFFN) model, which is first trained on traditional 2D images and then transferred to ODIs for saliency prediction. Specifically, our proposed AAFFN model consists of three modules: a Part-guided Attention (PA) module, a Visibility Score (VS) module, and an Attention-Aware Features Fusion (AAFF) module. The PA module extracts precise features to estimate the attention of finer parts in ODIs and eliminates the influence of cluttered backgrounds. Meanwhile, the VS module measures the proportions of the foreground and background parts and generates visibility scores during feature learning. Finally, the AAFF module applies a weighted fusion of the attention maps and visibility scores to generate the final saliency map. Extensive experiments and ablation analysis demonstrate that the proposed model achieves superior performance and outperforms other state-of-the-art methods on public benchmark datasets.
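To illustrate the fusion step described above, here is a minimal sketch in PyTorch of how per-part attention maps might be weighted by visibility scores and fused into a single saliency map. The module name `AAFFSketch`, the tensor shapes, and the exact fusion rule (visibility-weighted maps followed by a 1x1 convolution and a sigmoid) are assumptions for illustration only; the paper's actual AAFF module may differ.

```python
# Hypothetical sketch of attention-aware feature fusion; not the paper's code.
import torch
import torch.nn as nn

class AAFFSketch(nn.Module):
    """Fuses part-level attention maps with per-part visibility scores."""

    def __init__(self, num_parts: int):
        super().__init__()
        # 1x1 conv collapses the visibility-weighted part maps into one saliency map.
        self.fuse = nn.Conv2d(num_parts, 1, kernel_size=1)

    def forward(self, attention_maps: torch.Tensor, visibility: torch.Tensor):
        # attention_maps: (B, P, H, W) -- one spatial attention map per part
        #                 (e.g., the output of a PA-like module).
        # visibility:     (B, P) -- per-part visibility scores in [0, 1]
        #                 (e.g., the output of a VS-like module).
        weights = visibility.unsqueeze(-1).unsqueeze(-1)  # (B, P, 1, 1)
        weighted = attention_maps * weights               # down-weight background-dominated parts
        saliency = torch.sigmoid(self.fuse(weighted))     # (B, 1, H, W)
        return saliency

# Example usage with random tensors:
model = AAFFSketch(num_parts=8)
att = torch.rand(2, 8, 64, 128)   # attention maps for 8 assumed parts
vis = torch.rand(2, 8)            # visibility scores per part
out = model(att, vis)             # (2, 1, 64, 128) saliency map
```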

Keywords: Attention-aware features, Visibility score, Omnidirectional image, Saliency prediction

Paper URL: https://doi.org/10.1007/s10489-020-01857-3