Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies

Authors:

Highlights:

• Series of controlled user studies examining post-hoc example-based explanations for black-box deep learners doing classification (XAI).

• Black box AI models can be explained by “twinning” them with white-box models.

• Explanations were only found to impact people’s perception of errors.

• Explanations led people to view errors as being “less incorrect”, but they did not improve trust.

• Trust in an AI model is undermined by increases in error-rates (from error-rates of 3% onwards).

Abstract

In this paper, we describe a post-hoc explanation-by-example approach to eXplainable AI (XAI), in which a black-box, deep learning system is explained by reference to a more transparent proxy model (here, a case-based reasoner), based on a feature-weighting analysis of the former that is used to find explanatory cases from the latter (one instance of the so-called Twin Systems approach). A novel method (COLE-HP) for extracting feature-weights from black-box models is demonstrated for a convolutional neural network (CNN) applied to the MNIST dataset; the extracted feature-weights are then used to find explanatory nearest neighbours for test instances. Three user studies are reported examining people's judgements of right and wrong classifications made by this XAI twin-system, in the presence/absence of explanations-by-example and at different error-rates (from 3% to 60%). The judgements gathered include item-level evaluations of both correctness and reasonableness, and system-level evaluations of trust, satisfaction, correctness, and reasonableness. Several proposals are made about the user's mental model in these tasks and how it is impacted by explanations at the item- and system-level. The wider lessons of this work for XAI and its user studies are reviewed.
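The twin-system retrieval step described above can be sketched as a feature-weighted nearest-neighbour search: weights derived from an importance analysis of the black-box model rescale the distance metric so that the retrieved training cases reflect the features the model actually relied on. The sketch below is a minimal illustration of this general idea, not the paper's COLE-HP weighting scheme; the function name, the toy data, and the uniform weights are all hypothetical.

```python
import math

def weighted_nn_explanations(query, train_cases, weights, k=3):
    """Retrieve the k training cases nearest to `query` under a
    feature-weighted Euclidean distance. `weights` stands in for a
    feature-importance analysis of the black-box model (the actual
    COLE-HP weighting is not reproduced here)."""
    def dist(features):
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(weights, features, query)))
    # Sort all training cases by weighted distance and return the closest k
    # as the explanatory examples for this query.
    ranked = sorted(train_cases, key=lambda case: dist(case[0]))
    return ranked[:k]

# Toy two-feature dataset: (feature_vector, label) pairs.
cases = [([0.0, 0.0], "zero"), ([1.0, 1.0], "one"), ([0.9, 0.1], "zero")]
nearest = weighted_nn_explanations([1.0, 0.9], cases, weights=[1.0, 1.0], k=1)
print(nearest)  # the single most similar training case, shown as the explanation
```

In the full twin-system setting, the feature vectors would be (activations over) MNIST images, the weights would come from the contributions analysis of the CNN, and the retrieved neighbours would be shown to users as explanations-by-example.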

Keywords: Explainable AI, Factual explanation, Trust, User testing, Convolutional neural network, Case-based reasoning, Deep learning, k-nearest neighbours

Article history: Received 27 February 2020, Revised 22 December 2020, Accepted 21 January 2021, Available online 26 January 2021, Version of Record 29 January 2021.

DOI: https://doi.org/10.1016/j.artint.2021.103459