Semantic Image Segmentation with Improved Position Attention and Feature Fusion

Authors: Hegui Zhu, Yan Miao, Xiangde Zhang

Abstract

The encoder–decoder structure is a universal approach to semantic image segmentation. However, as the depth of a convolutional neural network (CNN) increases, some important image information is lost and the correlation between arbitrary pixels weakens. This paper designs a novel image segmentation model to obtain dense feature maps and improve segmentation performance. In the encoder stage, we employ ResNet-50 to extract features and then add a spatial pooling pyramid (SPP) to achieve multi-scale feature fusion. In the decoder stage, we provide an improved position attention module that integrates contextual information effectively and removes trivial information by changing the way the attention matrix is constructed. Furthermore, we also propose a feature fusion structure that generates dense feature maps by performing an element-wise sum of the upsampled features and the corresponding encoder features. Simulation results show that the average accuracy and mIoU on the CamVid dataset reach 90.7% and 63.1%, respectively, which verifies the effectiveness and reliability of the proposed method.
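As a rough illustration of the decoder described above, the following is a minimal PyTorch sketch of a DANet-style position attention module followed by element-wise-sum fusion of upsampled decoder features with the corresponding encoder features. The class names (PositionAttention, FusionBlock), the channel-reduction factor, and the 1x1 projection convolution are assumptions for illustration only; the paper's specific "improved" construction of the attention matrix is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAttention(nn.Module):
    """Standard position attention over spatial locations.
    Note: this is the common baseline formulation, not the paper's
    modified attention-matrix construction."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                      # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)             # B x HW x HW pixel affinities
        v = self.value(x).flatten(2)                     # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                      # residual connection


class FusionBlock(nn.Module):
    """Upsample decoder features to the encoder resolution and fuse
    them with the encoder features by element-wise sum."""

    def __init__(self, dec_channels: int, enc_channels: int):
        super().__init__()
        # project encoder channels to match the decoder channel count
        self.proj = nn.Conv2d(enc_channels, dec_channels, kernel_size=1)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        dec_up = F.interpolate(dec_feat, size=enc_feat.shape[2:],
                               mode="bilinear", align_corners=False)
        return dec_up + self.proj(enc_feat)


if __name__ == "__main__":
    # toy shapes: a coarse decoder map fused with a finer encoder map
    dec = torch.randn(1, 256, 16, 16)
    enc = torch.randn(1, 512, 32, 32)
    fused = FusionBlock(256, 512)(PositionAttention(256)(dec), enc)
    print(fused.shape)  # torch.Size([1, 256, 32, 32])
```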

Keywords: Semantic image segmentation, Spatial pooling pyramid, Improved position attention, Feature fusion, Dense feature map

Paper URL: https://doi.org/10.1007/s11063-020-10240-9