Attention-shift based deep neural network for fine-grained visual categorization

Authors:

Highlights:

• We re-examine the fine-grained visual categorization (FGVC) pipeline from the perspective of the human visual recognition system and propose a novel Attention-Shift based Deep Neural Network (AS-DNN) for automatic part localization and semantic correlation learning.

• We propose an end-to-end trainable sub-network, Csft, that simulates the attention-shift process. Csft automatically locates discriminative regions and iteratively encodes and decodes the semantic relations among the discriminative parts.

• Comprehensive experiments show that AS-DNN achieves state-of-the-art performance on three widely used, challenging datasets. Moreover, visualizations of the located discriminative parts demonstrate the robustness of AS-DNN to complex backgrounds and postures.

Keywords: Fine-grained visual categorization, Deep neural network, Human perception mechanism, Attention-shift, Encoder-decoder

Review history: Received 4 August 2019, Revised 24 October 2019, Accepted 11 March 2021, Available online 18 March 2021, Version of Record 26 March 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.107947