Knowledge distillation via instance-level sequence learning


Abstract:

Recently, distillation approaches that extract general knowledge from a teacher network to guide a student network have been proposed. Most existing methods transfer knowledge from the teacher to the student network by feeding a sequence of random mini-batches sampled uniformly from the data. We argue that, instead, a compact student network should be guided gradually using samples ordered in a meaningful sequence, so that the gap in feature representation between the teacher and student networks can be bridged step by step. In this paper, we propose a curriculum learning knowledge distillation framework based on instance-level sequence learning. It uses a snapshot of the student network from an early training epoch to create a curriculum for the student network's next training phase. We performed extensive experiments on the CIFAR-10, CIFAR-100, SVHN, and CINIC-10 datasets. Compared with several state-of-the-art methods, our framework achieves the best performance with fewer training iterations.
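The abstract only outlines the idea, so the following is a minimal, hypothetical sketch of how such a pipeline could look: it assumes instance difficulty is scored by the student snapshot's per-instance loss, that instances are then presented easy-to-hard, and that distillation uses the standard softened-logit KL objective. All function names and hyperparameters below are illustrative, not the paper's implementation.

```python
# Minimal sketch of instance-level sequence (curriculum) knowledge distillation.
# Assumptions (not from the paper): difficulty is the student snapshot's
# per-instance cross-entropy, and the KD objective is softened-logit KL
# divergence mixed with hard-label cross-entropy.
import copy
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def rank_instances(snapshot, dataset, device="cpu"):
    """Order dataset indices from easy to hard using the snapshot's loss."""
    snapshot.eval()
    losses = []
    loader = DataLoader(dataset, batch_size=256, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            logits = snapshot(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.argsort(torch.cat(losses)).tolist()  # easy -> hard

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Softened-logit distillation loss plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * soft + (1.0 - alpha) * F.cross_entropy(student_logits, targets)

def train_phase(student, teacher, dataset, order, optimizer, device="cpu"):
    """One training phase over instances presented in curriculum order."""
    teacher.eval()
    student.train()
    loader = DataLoader(Subset(dataset, order), batch_size=128, shuffle=False)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Usage: after a warm-up phase, take a snapshot of the student, rank the
# training instances with it, then run the next phase in that order.
# snapshot = copy.deepcopy(student)
# order = rank_instances(snapshot, train_set)
# train_phase(student, teacher, train_set, order, optimizer)
```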

Keywords: Neural network compression, Knowledge distillation, Computer vision, Deep learning

Article history: Received 27 May 2020, Revised 17 September 2021, Accepted 18 September 2021, Available online 21 September 2021, Version of Record 27 September 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107519