Efficient semantic image segmentation with multi-class ranking prior

Abstract

Semantic image segmentation, which aims to decompose an image into semantically consistent regions, is of fundamental importance in a wide variety of computer vision tasks, such as scene understanding, robot navigation, and image retrieval. Most existing works address it as a structured prediction problem, combining contextual information with low-level cues in conditional random fields (CRFs), which are often learned by heuristic search based on maximum likelihood estimation. In this paper, we use a maximum-margin structural support vector machine (S-SVM) model to combine multiple levels of cues, attenuating the ambiguity caused by appearance similarity, and propose a novel multi-class ranking based global constraint that confines the object classes considered when labeling regions within an image. Compared with existing global cues, our method strikes a better balance between expressive power for heterogeneous regions and the efficiency of searching the exponential space of possible label combinations. We then introduce inter-class co-occurrence statistics as pairwise constraints and combine them with the predictions from local and global cues within the S-SVM framework. This enables joint inference of the labels within an image for better consistency. We evaluate our algorithm on two challenging datasets that are widely used for semantic segmentation evaluation, the MSRC-21 dataset and the Stanford Background dataset. Experimental results show that we obtain highly competitive performance compared with state-of-the-art methods, even though our model is much simpler and more efficient.
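
To make the role of the ranking-based global constraint concrete, the following is a minimal illustrative sketch, not the model described in this paper. It assumes that per-region class scores (local cues), an image-level class ranking score (global cue), and a symmetric co-occurrence table (pairwise cue) are already available, and it replaces the learned S-SVM weights and inference with hand-set numbers and exhaustive search. The names `top_k_classes`, `label_regions`, `pair_weight`, and all numeric values are hypothetical.

```python
from itertools import product


def top_k_classes(image_scores, k):
    """Return the k class indices with the highest image-level ranking score."""
    order = sorted(range(len(image_scores)),
                   key=lambda c: image_scores[c], reverse=True)
    return order[:k]


def label_regions(unary, cooccurrence, image_scores, k, pair_weight=1.0):
    """Jointly label regions, considering only the top-k ranked classes.

    unary[r][c]        : assumed local score of class c for region r
    cooccurrence[a][b] : assumed compatibility of classes a and b
    The search is exhaustive over k ** len(unary) labelings, which is
    only feasible because the ranking prior keeps k small.
    """
    candidates = top_k_classes(image_scores, k)
    best_score, best_labels = float("-inf"), None
    for labels in product(candidates, repeat=len(unary)):
        # Local term: sum of per-region scores for the chosen classes.
        score = sum(unary[r][c] for r, c in enumerate(labels))
        # Pairwise term: co-occurrence score for every pair of regions.
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                score += pair_weight * cooccurrence[labels[i]][labels[j]]
        if score > best_score:
            best_score, best_labels = score, labels
    return list(best_labels), candidates


# Toy data (hypothetical): 3 regions, 4 classes.
unary = [
    [2.0, 0.5, 0.1, 1.9],
    [0.3, 1.5, 0.2, 1.6],  # locally, class 3 narrowly beats class 1
    [0.1, 0.4, 0.3, 0.2],
]
image_scores = [0.9, 0.8, 0.1, 0.2]  # image-level ranking favors classes 0 and 1
cooccurrence = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

labels, candidates = label_regions(unary, cooccurrence, image_scores, k=2)
print(candidates, labels)
```

In this toy setting the second region would pick class 3 from local evidence alone, but the ranking prior removes class 3 from consideration, so the joint labeling uses only the two top-ranked classes. Restricting the candidate set also shrinks the search space from 4^3 to 2^3 labelings, which is the efficiency trade-off the abstract refers to.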