Contextual ensemble network for semantic segmentation

Highlights：

• Instead of using switch variables to record pooling indices, the feature maps of encoder are concatenated to the decoder, avoiding the extra noise introduced through padding, and without extra memory space to store pooling indices at the same time.

• The CENet is trained end-to-end and easy to execute without any post-processing, which facilitates well for semantic segmentation. The experimental results show the superior performance of CENet on CityScapes, PASCAL VOC 2012, MS COCO, and ISBI 2012 datasets.

摘要

•Contextual ensemble network (CENet) introduces a novel encoder-decoder architecture to capture multi-scale context via ensemble deconvolution. The stacked feature maps are complemented each other, allowing us to fully explore multiple scale contextual cues embedded in images.•Instead of using switch variables to record pooling indices, the feature maps of encoder are concatenated to the decoder, avoiding the extra noise introduced through padding, and without extra memory space to store pooling indices at the same time.•The CENet is trained end-to-end and easy to execute without any post-processing, which facilitates well for semantic segmentation. The experimental results show the superior performance of CENet on CityScapes, PASCAL VOC 2012, MS COCO, and ISBI 2012 datasets.

论文评审过程：Received 6 November 2020, Revised 9 March 2021, Accepted 5 April 2021, Available online 20 September 2021, Version of Record 1 October 2021.