Self-attention neural architecture search for semantic image segmentation

Authors:

Highlights:

Abstract

Self-attention can capture long-range dependencies and is widely used in semantic segmentation. Existing methods mainly use two kinds of self-attention, i.e., spatial attention and channel attention, which capture relations along the HW dimension (the image plane, i.e., height and width) and the C dimension (channels), respectively. Very little research has investigated self-attention along other dimensions, which could potentially improve segmentation performance. In this work, we investigate self-attention along all possible dimensions {H, W, C, HW, HC, CW, HWC}, and then explore the aggregation of all these self-attentions. We apply neural architecture search (NAS) to find the optimal aggregation. Specifically, we carefully design (1) the search space and (2) the optimization method. For (1), we introduce a building block, the basic self-attention search unit (BSU), which can model self-attention along all of these dimensions; the search space contains both within-BSU and cross-BSU operations. In addition, we propose an attention-map splitting method that reduces computation by 1/3. For (2), we apply an efficient differentiable optimization method to search for the optimal aggregation. We conduct extensive experiments on the Cityscapes and ADE20K datasets. The results demonstrate the effectiveness of the proposed method, which achieves highly competitive performance against state-of-the-art methods.
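To make the dimension-wise attention concrete, the following is a minimal sketch, assuming PyTorch. The function dim_attention, its dims argument, and the AttentionSearchCell class are illustrative inventions, not the paper's BSU implementation or its exact search procedure. The sketch shows how reshaping a (B, C, H, W) feature map selects which axes the attention operates over, and how a DARTS-style softmax over architecture parameters can softly aggregate the candidate attentions during a differentiable search.

```python
# Hypothetical sketch of dimension-wise self-attention and its soft
# aggregation; not the paper's BSU or search procedure.
import torch
import torch.nn.functional as F


def dim_attention(x: torch.Tensor, dims: str) -> torch.Tensor:
    """Attention over the chosen dimension(s) of a (B, C, H, W) tensor.

    dims selects which axes index the attention "tokens":
      "HW" -> spatial attention (tokens are pixels, features are channels)
      "C"  -> channel attention (tokens are channels, features are pixels)
      "H"  -> attention along image height only, one column at a time
    """
    B, C, H, W = x.shape
    if dims == "HW":
        tokens = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
    elif dims == "C":
        tokens = x.flatten(2)                                # (B, C, H*W)
    elif dims == "H":
        # fold W into the batch so each column attends along H
        tokens = x.permute(0, 3, 2, 1).reshape(B * W, H, C)  # (B*W, H, C)
    else:
        raise ValueError(f"unsupported dims: {dims}")

    # plain scaled dot-product attention with Q = K = V = tokens;
    # a real unit would add learned query/key/value projections
    d = tokens.shape[-1]
    attn = F.softmax(tokens @ tokens.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = attn @ tokens

    # restore the (B, C, H, W) layout
    if dims == "HW":
        return out.transpose(1, 2).reshape(B, C, H, W)
    if dims == "C":
        return out.reshape(B, C, H, W)
    return out.reshape(B, W, H, C).permute(0, 3, 2, 1)


class AttentionSearchCell(torch.nn.Module):
    """DARTS-style soft aggregation over candidate dimension-wise attentions."""

    def __init__(self, candidates=("HW", "C", "H")):
        super().__init__()
        self.candidates = candidates
        # architecture parameters: one logit per candidate attention
        self.alpha = torch.nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * dim_attention(x, d)
                   for wi, d in zip(w, self.candidates))
```

In a differentiable search of this kind, the alpha logits would be optimized on a validation set (bi-level optimization) and the strongest candidates retained in the final architecture; the abstract's attention-map splitting and cross-BSU operations are not modeled in this sketch.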

Keywords: Self-attention, Neural architecture search, Semantic segmentation

Article history: Received 10 September 2021, Revised 8 December 2021, Accepted 11 December 2021, Available online 18 December 2021, Version of Record 1 January 2022.

DOI: https://doi.org/10.1016/j.knosys.2021.107968