Repurposing existing deep networks for caption and aesthetic-guided image cropping

Authors:

Highlights:

• The core research question of this paper is how to find the image region described by a user so that the output crop both preserves the information in the caption and is aesthetically pleasing.

• We propose a caption- and aesthetics-guided framework for cropping images according to the user's intention. Our framework is the first to infer the user's intention directly from the provided image caption.

• We argue that currently available image cropping and caption grounding datasets are not suitable for our description-based image cropping task. We therefore propose a novel dataset with multiple ground-truth bounding box annotations for each caption.

• The experiments in Section 4.2 show that, by repurposing existing deep networks, we achieve better performance than the baseline methods for caption-based image cropping.
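The highlights describe choosing a crop that satisfies both the caption and aesthetics, evaluated against a dataset with several ground-truth boxes per caption. The sketch below illustrates one plausible way to express those two ideas. It is not the paper's method: the weighted sum with a hypothetical weight `alpha`, the scorer callables `caption_score` and `aesthetic_score`, and the "best IoU over all ground-truth boxes" evaluation rule are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_iou(pred: Box, ground_truths: Sequence[Box]) -> float:
    """Assumed evaluation rule: with several valid ground-truth boxes per
    caption, a prediction is credited with its best match."""
    return max((iou(pred, gt) for gt in ground_truths), default=0.0)


def select_crop(
    candidates: Sequence[Box],
    caption_score: Callable[[Box], float],    # hypothetical caption-grounding scorer
    aesthetic_score: Callable[[Box], float],  # hypothetical aesthetics scorer
    alpha: float = 0.5,                        # hypothetical trade-off weight
) -> Box:
    """Pick the candidate with the highest weighted caption + aesthetics score."""
    return max(
        candidates,
        key=lambda c: alpha * caption_score(c) + (1 - alpha) * aesthetic_score(c),
    )


if __name__ == "__main__":
    gts = [(1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 2.0)]
    print(best_iou((0.0, 0.0, 2.0, 2.0), gts))  # matches the second box exactly
```

In practice the two scorers would be backed by existing networks (for example, a caption-grounding model and an aesthetics model); here they are plain callables so the selection and evaluation logic can be read on its own.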


Keywords: Image cropping, Aesthetics, Deep network re-purposing, Image captioning

Article history: Received 25 March 2021, Revised 25 November 2021, Accepted 3 December 2021, Available online 5 January 2022, Version of Record 14 February 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108485