Robust model-based scene interpretation by multilayered context information

Authors:

Highlights:

Abstract

In this paper, we present a new graph-based framework for collaborative place, object, and part recognition in indoor environments. We model a scene as an undirected graphical model composed of a place node, object nodes, and part nodes connected by undirected links. Our key contribution is collaborative place and object recognition (which we call the hierarchical context in this paper), in place of object-only recognition or a causal relation from place to objects. We unify the hierarchical context and the well-known spatial context into a complete hierarchical graphical model (HGM). In the HGM, object and part nodes carry labels together with related pose information, rather than labels alone, for robust inference of objects. The most difficult problems posed by the HGM are learning and inference over variable graph structures. For tractability, we learn the HGM in a piecewise manner rather than by joint graph learning. Since inference includes variable structure estimation with the marginal distribution of each node, we approximate the pseudo-likelihood of the marginal distribution using multimodal sequential Monte Carlo with weights updated by belief propagation. Data-driven multimodal hypotheses and context-based pruning yield correct inference. For successful recognition, issues related to 3D object recognition are also considered, and several state-of-the-art methods are incorporated. The proposed system greatly reduces false alarms by exploiting the spatial and hierarchical contexts. We demonstrate the feasibility of HGM-based collaborative place, object, and part recognition in actual large-scale environments for guidance applications (12 places, 112 3D objects).
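The core inference idea, reweighting object hypotheses by an observation likelihood combined with a place-object (hierarchical) context and then pruning weak hypotheses, can be illustrated with a minimal sketch. This is not the authors' implementation: the place-object compatibility table, the object labels, and the uniform observation model below are all invented for illustration.

```python
# Hypothetical P(object | place) compatibility table -- illustrative values
# only, standing in for the learned hierarchical context of the HGM.
PLACE_OBJECT_COMPAT = {
    ("office", "monitor"): 0.9,
    ("office", "kettle"): 0.1,
    ("kitchen", "monitor"): 0.1,
    ("kitchen", "kettle"): 0.9,
}

def update_weights(particles, place, obs_likelihood):
    """Reweight object-pose particles by the observation likelihood times
    the place-object context, then normalize (a simplified stand-in for a
    belief-propagation weight update)."""
    weighted = []
    for label, pose, w in particles:
        w_new = w * obs_likelihood(label, pose) * PLACE_OBJECT_COMPAT[(place, label)]
        weighted.append((label, pose, w_new))
    total = sum(w for _, _, w in weighted) or 1.0
    return [(l, p, w / total) for l, p, w in weighted]

def prune(particles, threshold=0.05):
    """Context-based pruning: drop hypotheses whose normalized weight falls
    below a threshold, then renormalize the survivors."""
    kept = [p for p in particles if p[2] >= threshold]
    total = sum(w for _, _, w in kept) or 1.0
    return [(l, p, w / total) for l, p, w in kept]

# Usage: two equally likely observations; the office context promotes
# the monitor hypothesis and demotes the kettle hypothesis.
particles = [("monitor", (0.0, 0.0), 0.5), ("kettle", (1.0, 0.0), 0.5)]
uniform_obs = lambda label, pose: 0.5
posterior = prune(update_weights(particles, "office", uniform_obs))
```

In the full HGM, the reweighting term would come from belief-propagation messages between the place, object, and part nodes rather than a fixed lookup table, but the multiply-normalize-prune cycle is the same.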

Keywords:

Article history: Received 26 December 2005, Accepted 25 September 2006, Available online 6 December 2006.

DOI: https://doi.org/10.1016/j.cviu.2006.09.004