Geometry Attention Transformer with position-aware LSTMs for image captioning

Authors:

Highlights:

• An improved image captioning model, GAT, is proposed based on the Transformer framework.

• We design an encoder that works together with a gate-controlled GSR.

• We rebuild the decoder, enhanced by position-LSTM groups.

• Ablation experiments and comparisons are performed on COCO and Flickr30K.

Keywords: Image captioning, Transformer framework, Gate-controlled geometry attention, Position-aware LSTM

Review history: Received 22 October 2021, Revised 31 March 2022, Accepted 1 April 2022, Available online 9 April 2022, Version of Record 19 April 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117174