A novel spatiotemporal attention enhanced discriminative network for video salient object detection

Authors: Bing Liu, Kezhou Mu, Mingzhu Xu, Fangyuan Wang, Lei Feng

Abstract

In contrast to image salient object detection, where much progress has been made, video salient object detection remains a considerable challenge. Not all features are useful for salient object detection, and some even cause interference. In this paper, we propose a novel multiscale spatiotemporal ConvLSTM model based on an attention mechanism. The model introduces spatial and channel-wise attention mechanisms and improves the network’s capability to extract high-level semantic information and low-level spatial structural features. First, to obtain more effective spatiotemporal information, a ConvLSTM module with an embedded attention mechanism (CSAtt-ConvLSTM) is placed at the higher layers of the network to weight the salient features within the extracted spatiotemporally consistent representation. Second, a multiscale attention (MSA) module for discriminating features is designed, which introduces two kinds of attention units: channel-wise attention (CA) units and spatial-wise attention (SA) units. The CA units are applied to the high-level feature maps produced by the CSAtt-ConvLSTM module and the SA units to the shallow feature maps, and their outputs are then fused into the final output feature maps. Extensive experiments on multiple datasets verify the effectiveness of the proposed model, which runs in real time at 20 fps on a single GPU.
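
To make the CA/SA split in the abstract concrete, the following is a minimal sketch of channel-wise gating, spatial gating, and fusion on small feature maps. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the attention weights are computed or how the outputs are fused, so the sketch uses parameter-free gates (a sigmoid of the pooled activation) and element-wise addition, whereas a real network would learn these weights. The function names `channel_attention`, `spatial_attention`, and `fuse` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used here as a simple gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Reweight each channel by one scalar gate.

    feat is a list of C channels, each an H x W list of lists.
    Assumption: the gate is sigmoid(global average of the channel);
    a trained CA unit would compute it with learned layers instead.
    """
    out = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count  # global average pooling
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for v_row in [None] for row in channel])
    return out


def spatial_attention(feat):
    """Reweight each spatial location by one scalar gate shared across channels.

    Assumption: the gate is sigmoid(mean over channels at that location);
    a trained SA unit would typically use a learned convolution instead.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def fuse(high, low):
    """Fuse two same-shaped feature maps by element-wise addition (an assumption)."""
    return [
        [[a + b for a, b in zip(row_h, row_l)] for row_h, row_l in zip(ch_h, ch_l)]
        for ch_h, ch_l in zip(high, low)
    ]
```

In this sketch, `fuse(channel_attention(high_level), spatial_attention(shallow))` mirrors the ordering described in the abstract: channel gating on the high-level maps, spatial gating on the shallow maps, then fusion. Both inputs are assumed to have already been brought to the same shape.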

Keywords: Video salient object detection, Attention mechanism, Multiscale, CSAtt-ConvLSTM

Paper URL: https://doi.org/10.1007/s10489-021-02649-z