Hierarchically Supervised Deconvolutional Network for Semantic Video Segmentation

Highlights：

• A unified deep learning framework is proposed to employ the spatio-temporal information for semantic video segmentation.

• A hierarchically supervised deconvolutional network is proposed to conduct semantic segmentation for single video frames.

• A coarse-to-fine training strategy is adopted to improve the foreground object segmentation.

• Transition Layers are introduced to make the label prediction consist with adjacent pixels across space and time domains.

• The state-of-the-art performance is achieved on the two datasets, CamVid and GATECH.

摘要

•A unified deep learning framework is proposed to employ the spatio-temporal information for semantic video segmentation.•A hierarchically supervised deconvolutional network is proposed to conduct semantic segmentation for single video frames.•A coarse-to-fine training strategy is adopted to improve the foreground object segmentation.•Transition Layers are introduced to make the label prediction consist with adjacent pixels across space and time domains.•The state-of-the-art performance is achieved on the two datasets, CamVid and GATECH.

论文关键词：Semantic video segmentation,Deconvolutional neural network,Coarse-to-fine training,Spatio-temporal consistence

论文评审过程：Received 16 March 2016, Revised 26 September 2016, Accepted 28 September 2016, Available online 29 September 2016, Version of Record 24 December 2016.