Towards robust explanations for deep neural networks

Authors:

Highlights:

• We investigate how to enhance the resilience of explanations against manipulation.

• Explanations visualize the relevance of each input feature for the network’s prediction.

• We develop a theoretical framework and derive bounds on the maximal change of an explanation (a sketch of the bound follows this list).

• Based on these insights we present three different techniques to increase robustness (see the code sketch after this list):

  • training with weight decay,
  • smoothing activation functions,
  • minimizing the Hessian of the network.

• Application of our methods shows significantly improved resilience of explanations.
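The type of bound referred to in the highlights can be sketched as follows; this is a standard mean-value-type estimate for gradient-based explanations, written in our own notation, and the paper's exact statement and constants may differ. For the gradient explanation h(x) = ∇f(x) of a twice-differentiable network output f, the change under a perturbation δ is controlled by the input Hessian H:

```latex
\[
  \| h(x+\delta) - h(x) \|
  = \Big\| \int_0^1 H(x + t\,\delta)\,\delta \,\mathrm{d}t \Big\|
  \;\le\; \|\delta\| \cdot \sup_{t \in [0,1]} \| H(x + t\,\delta) \|
\]
```

Anything that shrinks the Hessian norm therefore tightens the bound, which is exactly what the three techniques listed above aim at: weight decay and smooth activations reduce curvature implicitly, while the Hessian penalty reduces it explicitly.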
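For concreteness, here is a minimal PyTorch sketch of a training step that combines all three techniques. Everything in it is an illustrative assumption on our part: the architecture, the optimizer settings, the Softplus beta, the penalty weight lambda_h, and the one-sample Hutchinson-style estimate of the squared Frobenius norm of the input Hessian; it is not the authors' reference implementation.

```python
import torch
import torch.nn as nn

# (1) Smooth activations: Softplus is a twice-differentiable stand-in for
#     ReLU; a larger beta brings it closer to ReLU while keeping the
#     curvature finite.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.Softplus(beta=10.0),   # smoothed replacement for nn.ReLU()
    nn.Linear(256, 10),
)

# (2) Weight decay: an L2 penalty on the weights, handled by the optimizer.
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-4)

loss_fn = nn.CrossEntropyLoss()
lambda_h = 0.1  # hypothetical strength of the curvature penalty


def train_step(x, y):
    """One training step combining weight decay, smooth activations,
    and an explicit penalty on the input Hessian."""
    x = x.detach().requires_grad_(True)
    logits = net(x)

    # Gradient of the true-class score w.r.t. the input, kept in the
    # graph (create_graph=True) so we can differentiate through it again.
    score = logits.gather(1, y.unsqueeze(1)).sum()
    grad = torch.autograd.grad(score, x, create_graph=True)[0]

    # (3) Hessian penalty: one Hutchinson sample v estimates the squared
    #     Frobenius norm of the input Hessian, E_v ||H v||^2 = ||H||_F^2,
    #     via a Hessian-vector product (double backpropagation).
    v = torch.randn_like(x)
    hvp = torch.autograd.grad((grad * v).sum(), x, create_graph=True)[0]
    curvature = hvp.pow(2).sum() / x.shape[0]

    loss = loss_fn(logits, y) + lambda_h * curvature
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with a random batch of 28x28 inputs and 10 classes:
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
print(train_step(x, y))
```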

Keywords: Explanation method, Saliency map, Adversarial attacks, Manipulation, Neural networks

Article history: Received 18 December 2020; Revised 10 June 2021; Accepted 20 July 2021; Available online 29 July 2021; Version of Record 15 August 2021.

DOI: https://doi.org/10.1016/j.patcog.2021.108194