Transferring Inter-Class Correlation for Teacher–Student frameworks with flexible models

Highlights:

Abstract

The Teacher–Student (T–S) framework is widely used in classification tasks: the performance of one neural network (the student) is improved by transferring knowledge from another, already trained network (the teacher). Because the transferred knowledge depends on the capacities and structures of both the teacher and the student, how to define this knowledge effectively remains an open question. To address this issue, we design a novel and flexible form of transferred knowledge, the Self-Attention based Inter-Class Correlation (ICC) map, which reveals the correlation between every pair of classes in a mini-batch. Based on the ICC map, we propose a T–S framework, Inter-Class Correlation Transfer (ICCT), in which knowledge from a teacher with a higher, equal, or lower capacity than the student can benefit the student's training. ICCT can be applied flexibly to heterogeneous T–S network structures and is highly compatible with existing frameworks that transfer hidden-layer knowledge. Notably, our analysis of ICCT shows that the student learns the teacher's knowledge in conjunction with its own understanding, rather than mimicking the teacher entirely. Extensive experiments are conducted on the CIFAR-10, CIFAR-100, and ILSVRC2012 image classification datasets in different T–S application scenarios with different network structures. The results demonstrate that ICCT improves the student's performance and outperforms other state-of-the-art T–S frameworks.
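The abstract does not specify how the ICC map is computed; the sketch below illustrates one plausible construction consistent with the description, treating each class's mini-batch responses as a feature vector and applying scaled dot-product self-attention over classes to obtain a class-by-class correlation map. All function names and design details here are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def icc_map(logits):
    """Hypothetical Self-Attention based Inter-Class Correlation (ICC) map.

    logits: (B, C) mini-batch outputs.
    Each class is represented by its column of batch responses; scaled
    dot-product self-attention over these class features yields a
    row-normalized (C, C) correlation map.
    """
    feats = softmax(logits, axis=1).T                    # (C, B) class features
    scores = feats @ feats.T / np.sqrt(feats.shape[1])   # (C, C) similarities
    return softmax(scores, axis=1)                       # row-normalized map

def icc_transfer_loss(student_logits, teacher_logits):
    """Sketch of a transfer loss: KL divergence between the teacher's
    and the student's ICC maps, summed over all class pairs."""
    s, t = icc_map(student_logits), icc_map(teacher_logits)
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12))))
```

Because the map is a C x C matrix over classes rather than over network activations, a loss of this kind can be applied to any T–S pair regardless of their architectures, which matches the flexibility claimed for ICCT.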

Keywords: Teacher–Student framework, Model distillation, Transferring knowledge, Self-Attention mechanism

Article history: Received 13 April 2021, Revised 23 January 2022, Accepted 25 January 2022, Available online 15 February 2022, Version of Record 26 February 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.108316