Knowledge distillation via instance-level sequence learning


Abstract:

Recently, distillation approaches that extract general knowledge from a teacher network to guide a student network have been proposed. Most existing methods transfer knowledge from the teacher to the student network by feeding a sequence of random mini-batches sampled uniformly from the data. We argue that, instead, a compact student network should be guided gradually using samples ordered in a meaningful sequence, so that the gap in feature representation between the teacher and student networks can be bridged step by step. In this paper, we propose a curriculum learning knowledge distillation framework based on instance-level sequence learning. It uses a snapshot of the student network from an early training epoch to create a curriculum for the student network's next training phase. We performed extensive experiments on the CIFAR-10, CIFAR-100, SVHN, and CINIC-10 datasets. Compared with several state-of-the-art methods, our framework achieves the best performance with fewer training iterations.
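The abstract only outlines the idea, so the following is a minimal, hypothetical sketch of how such a pipeline could look: it assumes instance difficulty is scored by the student snapshot's per-instance loss, that instances are then presented easy-to-hard, and that distillation uses the standard softened-logit KL objective. All function names and hyperparameters below are illustrative, not the paper's implementation.

```python
# Minimal sketch of instance-level sequence (curriculum) knowledge distillation.
# Assumptions (not from the paper): difficulty is the student snapshot's
# per-instance cross-entropy, and the KD objective is softened-logit KL
# divergence mixed with hard-label cross-entropy.
import copy
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def rank_instances(snapshot, dataset, device="cpu"):
    """Order dataset indices from easy to hard using the snapshot's loss."""
    snapshot.eval()
    losses = []
    loader = DataLoader(dataset, batch_size=256, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            logits = snapshot(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.argsort(torch.cat(losses)).tolist()  # easy -> hard

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Softened-logit distillation loss plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * soft + (1.0 - alpha) * F.cross_entropy(student_logits, targets)

def train_phase(student, teacher, dataset, order, optimizer, device="cpu"):
    """One training phase over instances presented in curriculum order."""
    teacher.eval()
    student.train()
    loader = DataLoader(Subset(dataset, order), batch_size=128, shuffle=False)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Usage: after a warm-up phase, take a snapshot of the student, rank the
# training instances with it, then run the next phase in that order.
# snapshot = copy.deepcopy(student)
# order = rank_instances(snapshot, train_set)
# train_phase(student, teacher, train_set, order, optimizer)
```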

Keywords: Neural network compression, Knowledge distillation, Computer vision, Deep learning

Article history: Received 27 May 2020, Revised 17 September 2021, Accepted 18 September 2021, Available online 21 September 2021, Version of Record 27 September 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107519