Joint temporal context exploitation and active learning for video segmentation

Authors:

Highlights:

• In this work, we introduce multitask learning with a recurrent network to improve the accuracy of motion inference in multi-scale analysis (a hypothetical sketch of such a module follows this list).

• We present a novel way to exploit temporal context to infer the mask in the next frame within a probabilistic framework, and compare this result with the per-frame spatial prediction according to its confidence (see the fusion sketch after this list).

• We use the uncertainty of the transferred probability mask to improve sample selection for unsupervised active learning (see the uncertainty-ranking sketch after this list).

• We also collect and annotate oral video sequences, creating a new dataset, the Shining3D dental dataset, so that the approach proposed in this paper can be compared with other state-of-the-art video segmentation approaches.
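
The full architecture is behind the paper link below and is not reproduced on this page; as a rough illustration of the first highlight only, the PyTorch sketch below shows one common way to drive two task heads (segmentation mask and dense motion) from a shared recurrent state over per-frame features. The ConvGRU cell, class names, and channel sizes are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: a common recurrent building block for
    per-pixel temporal modelling (an assumed stand-in here)."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_cand = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_cand

class RecurrentMultitaskHead(nn.Module):
    """One recurrent state shared by two heads: a foreground mask and a
    2-channel motion field. In a multi-scale setting, one such head could
    be attached to each pyramid level of the backbone."""
    def __init__(self, feat_ch: int = 64, hid_ch: int = 64):
        super().__init__()
        self.cell = ConvGRUCell(feat_ch, hid_ch)
        self.mask_head = nn.Conv2d(hid_ch, 1, 1)    # foreground logits
        self.motion_head = nn.Conv2d(hid_ch, 2, 1)  # dense (dx, dy) motion

    def forward(self, feats):  # feats: list of (B, C, H, W) tensors, one per frame
        h, masks, motions = None, [], []
        for x in feats:
            h = self.cell(x, h)
            masks.append(torch.sigmoid(self.mask_head(h)))
            motions.append(self.motion_head(h))
        return masks, motions
```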
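The second and third highlights describe confidence-gated fusion of temporal and spatial predictions, and uncertainty-driven sample selection. The NumPy sketch below is a minimal, assumed reading of that pipeline: the confidence measure (distance from 0.5), the 0.8 threshold, and per-pixel Bernoulli entropy as the acquisition score are illustrative choices, not details taken from the paper.

```python
import numpy as np

def bernoulli_entropy(p, eps=1e-8):
    """Per-pixel Shannon entropy of a foreground-probability map."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def fuse_masks(spatial, temporal, conf_thresh=0.8):
    """Keep the temporally propagated probability where it is confident;
    fall back to the per-frame spatial prediction elsewhere."""
    conf = 2.0 * np.abs(temporal - 0.5)   # Bernoulli confidence in [0, 1]
    return np.where(conf >= conf_thresh, temporal, spatial)

def select_for_annotation(prob_masks, budget=5):
    """Rank frames by mean entropy of their fused masks; the most
    uncertain frames are proposed to the annotator first."""
    scores = np.array([bernoulli_entropy(m).mean() for m in prob_masks])
    return np.argsort(scores)[::-1][:budget]

# Toy usage: random maps stand in for the two networks' outputs.
rng = np.random.default_rng(0)
spatial = [rng.random((240, 320)) for _ in range(10)]
temporal = [rng.random((240, 320)) for _ in range(10)]
fused = [fuse_masks(s, t) for s, t in zip(spatial, temporal)]
print(select_for_annotation(fused, budget=3))  # indices of frames to label
```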

Keywords: Video segmentation, Deep learning, Computer vision

Article history: Received 29 May 2019; Revised 29 October 2019; Accepted 11 December 2019; Available online 14 December 2019; Version of Record 13 May 2020.

Paper link: https://doi.org/10.1016/j.patcog.2019.107158