STA-Net: spatial-temporal attention network for video salient object detection

Authors: Hong-Bo Bi, Di Lu, Hui-Hui Zhu, Li-Na Yang, Hua-Ping Guan

Abstract

This paper presents a systematic study of the role of spatial and temporal attention mechanisms in the video salient object detection (VSOD) task. We present a two-stage spatial-temporal attention network, named STA-Net, which makes two major contributions. In the first stage, we devise a Multi-Scale-Spatial-Attention (MSSA) module that reduces the computational cost spent on non-salient regions while exploiting multi-scale saliency information. This sliced attention method offers a distinct way to efficiently exploit the high-level features of the network with an enlarged receptive field. In the second stage, we propose a Pyramid-Saliency-Shift-Aware (PSSA) module, which emphasizes dynamic object information, since it offers a valid shift cue for confirming the salient object and capturing temporal information. This temporal detection module encourages precise salient region detection. Extensive experiments show that the proposed STA-Net is effective for the video salient object detection task and achieves compelling performance in comparison with state-of-the-art methods.

Keywords: Multi-scale, Video salient object detection, Attention, Pyramid

Paper URL: https://doi.org/10.1007/s10489-020-01961-4