End-to-End Supermask Pruning: Learning to Prune Image Captioning Models

Authors:

Highlights:

• This is the first extensive attempt at exploring model pruning for the image captioning task. Empirically, we show that deep captioning networks at 80% to 95% sparsity can match or even slightly outperform their dense counterparts. In addition, we propose a pruning method - Supermask Pruning (SMP) - that performs continuous and gradual sparsification during the training stage, based on parameter sensitivity, in an end-to-end fashion.

• We investigate the ideal way to combine pruning with fine-tuning of a pre-trained CNN, and show that both decoder pruning and decoder training should be done before pruning the encoder.

• We release pre-trained sparse models for UD and ORT that achieve CIDEr scores >120 on the MS-COCO dataset, yet are only 8.7 MB (a 96% reduction compared to dense UD) and 14.5 MB (a 94% reduction compared to dense ORT) in model size. Our code and pre-trained models are publicly available at https://github.com/jiahuei/sparse-image-captioning
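The SMP method itself is detailed in the paper; as a rough illustration of supermask-style end-to-end pruning, here is a minimal NumPy sketch in which each weight carries a learnable mask score and gradients reach the scores through a straight-through estimator. The class name, the sigmoid-threshold mask, and the straight-through approximation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SupermaskLayer:
    """Toy dense layer whose weights are gated by a learnable binary mask.

    Hypothetical sketch: each weight w_ij has a real-valued score s_ij; the
    mask is 1 where sigmoid(s_ij) > 0.5. Gradients flow to the scores via a
    straight-through estimator, so mask and weights are trained jointly
    (end-to-end), in the spirit of supermask-style pruning.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, (in_dim, out_dim))
        self.s = rng.normal(0.0, 0.1, (in_dim, out_dim))  # mask scores

    def mask(self):
        # Hard binary mask derived from the continuous scores.
        return (sigmoid(self.s) > 0.5).astype(self.w.dtype)

    def forward(self, x):
        self.x = x
        return x @ (self.w * self.mask())

    def backward(self, grad_out, lr=0.1):
        m = self.mask()
        grad_in = grad_out @ (self.w * m).T          # gradient w.r.t. input
        grad_w = (self.x.T @ grad_out) * m           # only unmasked weights update
        # Straight-through estimator: pretend d(mask)/d(s) = sigmoid'(s),
        # so pruning decisions receive a training signal too.
        sig = sigmoid(self.s)
        grad_s = (self.x.T @ grad_out) * self.w * sig * (1.0 - sig)
        self.w -= lr * grad_w
        self.s -= lr * grad_s
        return grad_in

def sparsity(layer):
    """Fraction of weights currently masked out (0 = dense, 1 = all pruned)."""
    return 1.0 - layer.mask().mean()
```

In a full training setup, the score updates would be combined with a gradual sparsity target so the network moves smoothly from dense to 80-95% sparse over the course of training, rather than being pruned in one shot.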


Keywords: Image captioning, Deep network compression, Deep learning

Article history: Received 13 April 2021, Revised 23 August 2021, Accepted 4 October 2021, Available online 5 October 2021, Version of Record 11 October 2021.

DOI: https://doi.org/10.1016/j.patcog.2021.108366