Weak supervision for detecting object classes from activities

Abstract:

Weakly supervised learning for object detection has been gaining significant attention recently. Visually similar objects are extracted automatically from weakly labeled videos, thereby bypassing the tedious process of manually annotating training data. However, the problem as applied to small- or medium-sized objects remains largely unexplored. Our observation is that weakly labeled information can be derived from videos involving human-object interactions. Since the object is characterized neither by its appearance nor by its motion in such videos, we propose a robust framework that taps valuable human context and models the similarity of objects based on appearance and functionality. Furthermore, the framework is designed to maximize the utility of the data by detecting possibly multiple instances of an object in each video. We show that object models trained in this fashion achieve between 86% and 92% of the performance of their fully supervised counterparts on three challenging RGB and RGB-D datasets.
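The abstract's idea of scoring object candidates by both appearance and functionality (human-context) cues could be sketched as a weighted similarity blend. This is purely an illustrative assumption, not the paper's formulation: the function names, the use of cosine similarity, and the linear mixing weight `alpha` are all hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_similarity(app_a, app_b, fn_a, fn_b, alpha=0.5):
    # Hypothetical blend: appearance similarity weighted by alpha,
    # functionality (human-context) similarity weighted by 1 - alpha.
    return alpha * cosine(app_a, app_b) + (1 - alpha) * cosine(fn_a, fn_b)

# Two candidates with identical appearance and functionality features
# score maximal similarity; orthogonal features score lower.
print(combined_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]))
```

Candidates scoring highly under such a combined measure across videos could then be grouped into a single object class, which is the kind of grouping the framework's weak supervision relies on.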

Article history: Received 31 December 2015, Revised 7 June 2016, Accepted 12 September 2016, Available online 13 September 2016, Version of Record 14 February 2017.

DOI: https://doi.org/10.1016/j.cviu.2016.09.006