Context-aware co-supervision for accurate object detection

Highlights:

• Our research shows that combining top-down and bottom-up signals is useful in object detection, and we believe the idea can generalize to other tasks. The simplicity of our approach leaves much room for future research, in which we plan to add more powerful modules that enhance context and other cues for visual recognition.

Abstract

• We advocate equipping two-stage detectors with top-down signals, which provide high-level contextual cues that complement low-level features. In practice, this is implemented by adding a side path to the detection head that predicts all object classes present in the image; the side path is co-supervised by image-level semantics and requires little extra overhead.
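The side-path idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the image-level target is a multi-hot vector built from the ground-truth box classes, that the side path average-pools RoI features and applies a single linear layer, and that its loss is a binary cross-entropy added to the detection loss with a weight. The function names, the pooling choice, and the `weight` parameter are all hypothetical.

```python
import math


def image_level_labels(box_classes, num_classes):
    """Multi-hot target: 1.0 for every class that appears among the image's boxes."""
    labels = [0.0] * num_classes
    for c in box_classes:
        labels[c] = 1.0
    return labels


def side_path_logits(roi_features, weights, bias):
    """Hypothetical side path: average-pool the RoI features, then one linear layer."""
    n = len(roi_features)
    dim = len(roi_features[0])
    pooled = [sum(f[d] for f in roi_features) / n for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits, in a numerically stable form."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def co_supervised_loss(detection_loss, context_loss, weight=1.0):
    """Assumed combination: detection loss plus a weighted image-level loss."""
    return detection_loss + weight * context_loss
```

In a real detector the pooled features would come from the detection head and the linear layer would be trained jointly with it; the sketch only shows how an image-level signal could be derived from box annotations at no extra labeling cost.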

Review history: Received 27 November 2020, Revised 3 June 2021, Accepted 20 July 2021, Available online 27 July 2021, Version of Record 10 August 2021.