Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition

Authors:

Highlights:

• We investigate teacher-student learning and Tucker decomposition to compress and accelerate the convolutional layers of CTC-trained CNN-DBLSTM models for OCR. To the best of our knowledge, we are the first to address this problem.

• Based on the architecture of the CNN-DBLSTM model, we propose an objective function for teacher-student learning that directly matches the feature sequences extracted by the CNNs of the teacher and student models, under the guidance of the succeeding LSTM layers (a sketch of one possible reading appears after this list). Experimental results on large-scale handwritten and printed OCR tasks show that a student model trained with the proposed criterion outperforms one trained with a standard KL-divergence criterion.

• We explore the effectiveness of combining teacher-student learning with Tucker decomposition: teacher-student learning transfers the knowledge of a large teacher model to a compact student model, and Tucker decomposition then compresses and accelerates the student further (a Tucker-2 sketch follows below). Our results show that this method yields a very compact CNN-DBLSTM model, significantly reducing both the footprint and the computational cost with little or no degradation in recognition accuracy.
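
The abstract does not spell out the proposed loss, so the following is a minimal PyTorch sketch of one plausible reading of the guided feature-matching criterion: a per-frame L2 loss between the teacher and student CNN feature sequences, plus the same loss measured after the frozen teacher LSTM, so the match is also evaluated in the representation the recognizer actually consumes. The names `student_cnn`, `teacher_cnn`, and `teacher_lstm` are hypothetical stand-ins for the corresponding sub-networks.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(student_cnn, teacher_cnn, teacher_lstm, images):
    """Match student CNN features to teacher CNN features.

    Both CNNs are assumed to map an image batch to a feature sequence of
    shape (T, batch, feat_dim); teacher_lstm is the teacher's (DB)LSTM
    stack with all parameters frozen (requires_grad=False), so only the
    student CNN receives gradient updates.
    """
    with torch.no_grad():
        t_feats = teacher_cnn(images)        # fixed target sequence
        t_out, _ = teacher_lstm(t_feats)     # teacher LSTM's view of it

    s_feats = student_cnn(images)            # trainable student features
    s_out, _ = teacher_lstm(s_feats)         # same frozen LSTM on top

    # Direct feature matching plus LSTM-guided matching.
    return F.mse_loss(s_feats, t_feats) + F.mse_loss(s_out, t_out)
```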

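For the Tucker step, a standard scheme factorizes each k×k convolution kernel along its input- and output-channel modes (Tucker-2) into a 1×1 convolution, a smaller k×k core convolution, and another 1×1 convolution. Below is a minimal sketch that initializes such a factorization by truncated HOSVD; the ranks `r_in` and `r_out` are hypothetical hyperparameters, the code assumes `groups=1`, and the paper's exact decomposition procedure may differ.

```python
import torch
import torch.nn as nn

def tucker2_decompose_conv(conv: nn.Conv2d, r_in: int, r_out: int) -> nn.Sequential:
    """Replace one k x k conv with 1x1 -> k x k (core) -> 1x1 convs."""
    W = conv.weight.data                          # (Cout, Cin, kh, kw)
    c_out, c_in, kh, kw = W.shape

    # Channel-mode factors from truncated SVDs of the mode unfoldings.
    U0 = torch.linalg.svd(W.reshape(c_out, -1),
                          full_matrices=False)[0][:, :r_out]
    U1 = torch.linalg.svd(W.permute(1, 0, 2, 3).reshape(c_in, -1),
                          full_matrices=False)[0][:, :r_in]

    # Core tensor G = W x_0 U0^T x_1 U1^T, shape (r_out, r_in, kh, kw).
    G = torch.einsum('oikl,or,is->rskl', W, U0, U1)

    first = nn.Conv2d(c_in, r_in, 1, bias=False)
    core = nn.Conv2d(r_in, r_out, (kh, kw), stride=conv.stride,
                     padding=conv.padding, dilation=conv.dilation,
                     bias=False)
    last = nn.Conv2d(r_out, c_out, 1, bias=conv.bias is not None)

    first.weight.data = U1.t().reshape(r_in, c_in, 1, 1)
    core.weight.data = G
    last.weight.data = U0.reshape(c_out, r_out, 1, 1)
    if conv.bias is not None:
        last.bias.data = conv.bias.data
    return nn.Sequential(first, core, last)
```

This cuts the layer's parameter count from Cout·Cin·kh·kw to Cin·r_in + r_in·r_out·kh·kw + r_out·Cout; the factorized network is then typically fine-tuned to recover any lost accuracy.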

Keywords: Optical character recognition, CNN-DBLSTM character model, Model compression, Teacher-student learning, Tucker decomposition

Article history: Received 10 December 2018, Revised 27 June 2019, Accepted 7 July 2019, Available online 12 July 2019, Version of Record 17 July 2019.

DOI: https://doi.org/10.1016/j.patcog.2019.07.002