Caption generation on scenes with seen and unseen object categories

Authors:

Highlights:

• The problem of true zero-shot image captioning (ZSC) is proposed.

• ZSC involves captioning over classes with no visual or textual training examples.

• A zero-shot object detection-driven approach is proposed to detect unseen objects.

• A template-based model is used to transform detections into sentences.

• A new evaluation metric (V-METEOR) is proposed for evaluating ZSC.

Keywords: Zero-shot learning, Zero-shot image captioning

Review history: Received 30 March 2022, Revised 17 June 2022, Accepted 22 June 2022, Available online 27 June 2022, Version of Record 6 July 2022.

Paper URL: https://doi.org/10.1016/j.imavis.2022.104515