SiamAtt: Siamese attention network for visual tracking

Authors:

Highlights:

Abstract:

Visual attention has recently achieved great success and found wide application in deep neural networks. Existing Siamese-network-based methods achieve a good accuracy–efficiency trade-off in visual tracking. However, the training time of Siamese trackers grows as networks become deeper and training data becomes larger. Moreover, Siamese trackers struggle to predict the target location under fast motion, full occlusion, camera motion, and in the presence of similar objects. To address these difficulties, we develop an end-to-end Siamese attention network for visual tracking. Our approach introduces an attention branch into the region proposal network, which already contains a classification branch and a regression branch. Foreground–background classification is performed by combining the scores of the classification branch and the attention branch, and the regression branch predicts the bounding boxes of the candidate regions based on the classification results. The proposed tracker achieves experimental results comparable to state-of-the-art trackers on six tracking benchmarks. In particular, it achieves an AUC score of 0.503 on LaSOT while running at 40 frames per second (FPS).
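To illustrate the idea described above (an attention branch added alongside the classification and regression branches of the region proposal network, with classification and attention scores fused for the foreground–background decision), the following is a minimal PyTorch sketch. The module name `SiamAttHead`, the channel/anchor sizes, and the additive fusion rule are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SiamAttHead(nn.Module):
    """Sketch of a region proposal head with classification, regression,
    and attention branches, as described in the abstract. Layer shapes
    and the score-fusion rule are assumptions."""

    def __init__(self, in_channels=256, num_anchors=5):
        super().__init__()
        # Classification branch: 2 scores (background/foreground) per anchor.
        self.cls_branch = nn.Conv2d(in_channels, 2 * num_anchors, kernel_size=1)
        # Regression branch: 4 box offsets (dx, dy, dw, dh) per anchor.
        self.reg_branch = nn.Conv2d(in_channels, 4 * num_anchors, kernel_size=1)
        # Attention branch: one foreground-likelihood map per anchor.
        self.att_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_anchors, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, correlation_features):
        # correlation_features: fused template/search features from the Siamese backbone.
        cls_scores = self.cls_branch(correlation_features)   # (B, 2*A, H, W)
        reg_deltas = self.reg_branch(correlation_features)   # (B, 4*A, H, W)
        attention = self.att_branch(correlation_features)    # (B, A, H, W)

        # Combine classification and attention scores for the final
        # foreground-background decision (additive fusion is an assumption).
        b, _, h, w = cls_scores.shape
        cls_scores = cls_scores.view(b, 2, -1, h, w)
        fused_fg = cls_scores[:, 1] + attention              # boost foreground scores
        fused = torch.stack([cls_scores[:, 0], fused_fg], dim=1)
        return fused.view(b, -1, h, w), reg_deltas


if __name__ == "__main__":
    head = SiamAttHead()
    feats = torch.randn(1, 256, 25, 25)   # dummy correlation map
    cls_out, reg_out = head(feats)
    print(cls_out.shape, reg_out.shape)   # (1, 10, 25, 25) and (1, 20, 25, 25)
```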

Keywords: Visual tracking, Siamese network, Attention network

Article history: Received 17 November 2019, Revised 21 February 2020, Accepted 23 May 2020, Available online 27 May 2020, Version of Record 8 June 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.106079