Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition

Authors:

Highlights:

• We investigate teacher-student learning and Tucker decomposition to compress and accelerate the convolutional layers of CTC-trained CNN-DBLSTM models for OCR. To the best of our knowledge, we are the first to address this problem.

• Based on the architecture of the CNN-DBLSTM model, we propose an objective function for teacher-student learning that directly matches the feature sequences extracted by the CNNs of the teacher and student models, under the guidance of the succeeding LSTM layers (a sketch of one possible reading appears after this list). Experimental results on large-scale handwritten and printed OCR tasks show that a student model trained with the proposed criterion outperforms one trained with a standard KL-divergence criterion.

• We explore the effectiveness of combining teacher-student learning with Tucker decomposition: teacher-student learning transfers the knowledge of a large teacher model to a compact student model, and Tucker decomposition then compresses and accelerates the student further (a Tucker-2 sketch follows below). Our results show that this method yields a very compact CNN-DBLSTM model, significantly reducing both the footprint and the computational cost with little or no degradation in recognition accuracy.
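
The abstract does not spell out the proposed loss, so the following is a minimal PyTorch sketch of one plausible reading of the guided feature-matching criterion: a per-frame L2 loss between the teacher and student CNN feature sequences, plus the same loss measured after the frozen teacher LSTM, so the match is also evaluated in the representation the recognizer actually consumes. The names `student_cnn`, `teacher_cnn`, and `teacher_lstm` are hypothetical stand-ins for the corresponding sub-networks.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(student_cnn, teacher_cnn, teacher_lstm, images):
    """Match student CNN features to teacher CNN features.

    Both CNNs are assumed to map an image batch to a feature sequence of
    shape (T, batch, feat_dim); teacher_lstm is the teacher's (DB)LSTM
    stack with all parameters frozen (requires_grad=False), so only the
    student CNN receives gradient updates.
    """
    with torch.no_grad():
        t_feats = teacher_cnn(images)        # fixed target sequence
        t_out, _ = teacher_lstm(t_feats)     # teacher LSTM's view of it

    s_feats = student_cnn(images)            # trainable student features
    s_out, _ = teacher_lstm(s_feats)         # same frozen LSTM on top

    # Direct feature matching plus LSTM-guided matching.
    return F.mse_loss(s_feats, t_feats) + F.mse_loss(s_out, t_out)
```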

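For the Tucker step, a standard scheme factorizes each k×k convolution kernel along its input- and output-channel modes (Tucker-2) into a 1×1 convolution, a smaller k×k core convolution, and another 1×1 convolution. Below is a minimal sketch that initializes such a factorization by truncated HOSVD; the ranks `r_in` and `r_out` are hypothetical hyperparameters, the code assumes `groups=1`, and the paper's exact decomposition procedure may differ.

```python
import torch
import torch.nn as nn

def tucker2_decompose_conv(conv: nn.Conv2d, r_in: int, r_out: int) -> nn.Sequential:
    """Replace one k x k conv with 1x1 -> k x k (core) -> 1x1 convs."""
    W = conv.weight.data                          # (Cout, Cin, kh, kw)
    c_out, c_in, kh, kw = W.shape

    # Channel-mode factors from truncated SVDs of the mode unfoldings.
    U0 = torch.linalg.svd(W.reshape(c_out, -1),
                          full_matrices=False)[0][:, :r_out]
    U1 = torch.linalg.svd(W.permute(1, 0, 2, 3).reshape(c_in, -1),
                          full_matrices=False)[0][:, :r_in]

    # Core tensor G = W x_0 U0^T x_1 U1^T, shape (r_out, r_in, kh, kw).
    G = torch.einsum('oikl,or,is->rskl', W, U0, U1)

    first = nn.Conv2d(c_in, r_in, 1, bias=False)
    core = nn.Conv2d(r_in, r_out, (kh, kw), stride=conv.stride,
                     padding=conv.padding, dilation=conv.dilation,
                     bias=False)
    last = nn.Conv2d(r_out, c_out, 1, bias=conv.bias is not None)

    first.weight.data = U1.t().reshape(r_in, c_in, 1, 1)
    core.weight.data = G
    last.weight.data = U0.reshape(c_out, r_out, 1, 1)
    if conv.bias is not None:
        last.bias.data = conv.bias.data
    return nn.Sequential(first, core, last)
```

This cuts the layer's parameter count from Cout·Cin·kh·kw to Cin·r_in + r_in·r_out·kh·kw + r_out·Cout; the factorized network is then typically fine-tuned to recover any lost accuracy.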

Keywords: Optical character recognition, CNN-DBLSTM character model, Model compression, Teacher-student learning, Tucker decomposition

Article history: Received 10 December 2018, Revised 27 June 2019, Accepted 7 July 2019, Available online 12 July 2019, Version of Record 17 July 2019.

DOI: https://doi.org/10.1016/j.patcog.2019.07.002