A progressive learning framework based on single-instance annotation for weakly supervised object detection

Abstract

Fully-supervised object detection (FSOD) and weakly-supervised object detection (WSOD) are two extremes in the field of object detection. The former relies entirely on detailed bounding-box annotations, while the latter discards them completely. To balance these two extremes, we propose to use single-instance annotations: every image that contains only a single object is labeled with the corresponding bounding box. Using these instance annotations of the simplest images, we propose a progressive learning framework that integrates image-level learning, single-instance learning, and multi-instance learning into an end-to-end network. Specifically, our framework is composed of three parallel streams that share a proposal feature extractor. The first stream is supervised by image-level annotations, providing global information about all training data to the shared feature extractor. The second stream is supervised by single-instance annotations to bridge the feature-learning gap between the image level and the instance level. To further learn from complex images, we propose an overlap-based instance mining algorithm that mines pseudo multi-instance annotations from the detection results of the second stream, and we use them to supervise the third stream. Our method achieves a trade-off between detection accuracy and annotation cost. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of the proposed method, implying that a few single-instance annotations can improve the detection performance of WSOD significantly (by more than 10%) and reduce the average annotation cost of FSOD greatly (by more than 5 times).
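To make the idea of overlap-based instance mining concrete, the following is a minimal sketch of one plausible greedy variant. The abstract does not specify the actual selection rule, so this should not be read as the proposed algorithm. The function names `iou` and `mine_instances` and the thresholds `score_thresh` and `max_overlap` are hypothetical. The sketch assumes that, for a class known to be present from the image-level label, the top-scoring detection is always kept, and that further detections are kept only if they are confident and overlap little with those already selected.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_instances(
    boxes: List[Box],
    scores: List[float],
    score_thresh: float = 0.5,  # assumed value, not from the paper
    max_overlap: float = 0.3,   # assumed value, not from the paper
) -> List[int]:
    """Return indices of detections kept as pseudo instance annotations.

    Operates on the detections of a single class in a single image.
    """
    # Visit detections from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # The first (top-scoring) box is always kept; later boxes must
        # clear the confidence threshold.
        if kept and scores[i] < score_thresh:
            break
        # Keep a box only if it overlaps little with every kept box,
        # so that it likely covers a different object instance.
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
    demo_scores = [0.9, 0.8, 0.7, 0.2]
    print(mine_instances(demo_boxes, demo_scores))  # [0, 2]
```

In this example the second box is dropped because it overlaps heavily with the top-scoring one, and the fourth is dropped for low confidence. The two remaining boxes would serve as pseudo multi-instance annotations for the third stream.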