Mask encoding: A general instance mask representation for object segmentation

Authors:

Highlights:

• We propose to encode a two-dimensional binary instance mask into a compact representation vector. The compressed vector takes advantage of the redundancy in the original mask and proves to be effective and efficient for reconstruction.

• Encoding can be done with several dictionary learning methods, including principal component analysis (PCA), sparse coding, and auto-encoders. We integrate this mask representation into the Mask R-CNN framework with slight modifications to the model architecture. Our method consistently improves mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset.

• With this mask representation, we propose a new framework for single-shot instance segmentation that extends FCOS with a mask branch for mask coefficient regression. Our mask encoding is completely independent of the detector's mechanism and can easily be incorporated into other object detectors. Our method achieves significantly higher accuracy than other explicit contour-based one-stage frameworks.

• The proposed method extends seamlessly to video instance segmentation by adding a vanilla track branch, achieving favourable performance on the YouTube-VIS dataset.
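The PCA variant of the mask encoding described in the highlights can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 28×28 mask size, the toy disc-shaped training masks, the 0.5 binarization threshold, and every function name here (`make_disc_masks`, `fit_mask_pca`, `encode`, `decode`, `recon_mse`) are hypothetical choices. A basis is learned from flattened training masks; each mask then becomes a K-dimensional coefficient vector (the kind of target a mask branch could regress), and a binary mask is recovered by projecting back and thresholding.

```python
import numpy as np


def make_disc_masks(n, size=28, seed=0):
    """Hypothetical toy data: n binary (size, size) masks, each a filled disc."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    masks = np.zeros((n, size, size), dtype=np.uint8)
    for i in range(n):
        r = rng.uniform(4, 10)
        cy, cx = rng.uniform(r, size - r, size=2)
        masks[i] = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    return masks


def fit_mask_pca(masks, n_components):
    """Learn a mean mask and K orthonormal basis vectors from (N, S, S) masks."""
    x = masks.reshape(len(masks), -1).astype(np.float64)  # (N, S*S)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]  # shapes (S*S,), (K, S*S)


def encode(mask, mean, basis):
    """Compress one (S, S) binary mask into a K-dim coefficient vector."""
    return basis @ (mask.reshape(-1) - mean)


def decode(code, mean, basis, size, threshold=0.5):
    """Reconstruct a binary (S, S) mask from its K coefficients."""
    soft = mean + code @ basis
    return (soft.reshape(size, size) >= threshold).astype(np.uint8)


def recon_mse(masks, mean, basis):
    """Mean squared error of the unthresholded reconstruction over a mask set."""
    x = masks.reshape(len(masks), -1).astype(np.float64)
    codes = (x - mean) @ basis.T  # (N, K)
    return float(np.mean((mean + codes @ basis - x) ** 2))


if __name__ == "__main__":
    masks = make_disc_masks(200)
    for k in (4, 32, 200):
        mean, basis = fit_mask_pca(masks, k)
        print(f"K={k:3d}  vector length={encode(masks[0], mean, basis).size}  "
              f"train MSE={recon_mse(masks, mean, basis):.5f}")
```

With the full basis (K = 200 here, equal to the number of training masks) the training masks are reconstructed exactly; a smaller K trades reconstruction error for a shorter vector. Sparse coding or an auto-encoder, also named in the highlights, would replace `fit_mask_pca` with a different learned dictionary or encoder/decoder pair.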

Keywords: Mask encoding, Instance segmentation, Video instance segmentation

Review history: Received 30 July 2021, Revised 25 October 2021, Accepted 20 December 2021, Available online 28 December 2021, Version of Record 3 January 2022.

Paper URL: https://doi.org/10.1016/j.patcog.2021.108505