Remote sensing image captioning via Variational Autoencoder and Reinforcement Learning

Authors:

Highlights:

• Introducing a variational autoencoder (VAE) that reconstructs the input images, which regularizes the shared encoder and helps it extract image features more effectively.

• Significantly improving image captioning performance by using low-level and high-level image features simultaneously.

• Enhancing the quality of the final text descriptions by applying self-attention to spatial features.

• The proposed model outperforms state-of-the-art models for remote sensing image captioning.
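The first highlight describes regularizing the shared encoder with a VAE that reconstructs the input image. As a minimal sketch of the standard VAE objective (reconstruction error plus the KL divergence between the encoder's Gaussian posterior and a unit Gaussian prior), the NumPy code below shows how such a regularizer is typically computed. The use of mean-squared error, the weight `beta`, the tensor shapes, and all function names are assumptions for illustration; this is not the paper's exact loss or architecture.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(kl))

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Assumed objective: per-image squared reconstruction error plus beta * KL."""
    per_image = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(per_image)) + beta * kl_divergence(mu, logvar)

# Hypothetical usage: a batch of 4 tiny RGB images and a 16-d latent space.
rng = np.random.default_rng(0)
x = rng.random((4, 3, 8, 8))
mu, logvar = np.zeros((4, 16)), np.zeros((4, 16))  # stand-ins for encoder outputs
z = reparameterize(mu, logvar, rng)                # latent codes a decoder would map back to images
print(vae_loss(x, x, mu, logvar))                  # perfect reconstruction, posterior equal to prior
```

In a shared-encoder setup, a term like this would be added to the captioning loss so that the encoder's features must also retain enough information to rebuild the image.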
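The second and third highlights mention combining low-level and high-level image features and applying self-attention to spatial features. The sketch below illustrates one common way to do this, under stated assumptions: each CNN feature map of shape (C, H, W) is flattened into H*W spatial tokens, the two levels are projected to a shared width and concatenated (the hypothetical `fuse_levels` helper), and single-head scaled dot-product self-attention with a residual connection is applied. The fusion scheme, projection matrices, and feature sizes are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_tokens(feature_map):
    """(C, H, W) CNN feature map -> (H*W, C) sequence of spatial tokens."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

def fuse_levels(low, high, p_low, p_high):
    """Hypothetical fusion: project both levels to width d, then concatenate the tokens."""
    return np.concatenate([spatial_tokens(low) @ p_low, spatial_tokens(high) @ p_high], axis=0)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row is a distribution over all spatial tokens
    return tokens + weights @ v, weights

# Hypothetical sizes: an earlier conv stage (64 x 14 x 14) and the last stage (512 x 7 x 7).
rng = np.random.default_rng(0)
d = 32
low = rng.standard_normal((64, 14, 14))
high = rng.standard_normal((512, 7, 7))
tokens = fuse_levels(low, high, rng.standard_normal((64, d)) * 0.1, rng.standard_normal((512, d)) * 0.1)
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (245, 32) (245, 245)
```

Concatenating tokens from both levels lets every spatial position attend to fine-grained and semantic features alike; a caption decoder would then attend over `out`.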


Keywords: Transformer, Variational Autoencoder, Transfer learning, Remote sensing image captioning, Self-attention mechanisms, Convolutional neural network, Reinforcement learning

Article history: Received 15 January 2020, Revised 9 April 2020, Accepted 13 April 2020, Available online 23 April 2020, Version of Record 3 June 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2020.105920