A Novel Framework for Explaining Machine Learning Using Shapley Values


[Submitted on 17 Sep 2019 (v1), last revised 25 Jun 2020 (this version, v3)]

Abstract: A number of techniques have been proposed to explain a machine learning
model’s prediction by attributing it to the corresponding input features.
Popular among these are techniques that apply the Shapley value method from
cooperative game theory. While existing papers focus on the axiomatic
motivation of Shapley values, and efficient techniques for computing them, they
offer little justification for the game formulations used, and do not address
the uncertainty implicit in their methods’ outputs. For instance, the popular
SHAP algorithm’s formulation may give substantial attributions to features that
play no role in the model. In this work, we illustrate how subtle differences
in the underlying game formulations of existing methods can cause large
differences in the attributions for a prediction. We then present a general
game formulation that unifies existing methods, and enables straightforward
confidence intervals on their attributions. Furthermore, it allows us to
interpret the attributions as contrastive explanations of an input relative to
a distribution of reference inputs. We tie this idea to classic research in
cognitive psychology on contrastive explanations, and propose a conceptual
framework for generating and interpreting explanations for ML models, called
formulate, approximate, explain (FAE). We apply this framework to explain
black-box models trained on two UCI datasets and a Lending Club dataset.
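The idea of attributing a prediction relative to a distribution of reference inputs, with confidence intervals, can be illustrated with a small sketch. This is not the authors' implementation. The toy `model`, the Gaussian reference sample, the helper names, and the normal-approximation interval are all assumptions made for illustration. For each reference `r`, the sketch computes exact Shapley values of the game v(S) = f(x on S, r elsewhere), then averages across references and reports a standard-error-based interval.

```python
import itertools
import math
import random
import statistics


def single_reference_shapley(f, x, r):
    """Exact Shapley values for the game v(S) = f(z), where z copies x on S and r elsewhere."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                with_i = [x[j] if (j in S or j == i) else r[j] for j in range(n)]
                without_i = [x[j] if j in S else r[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def attributions_with_ci(f, x, references, z=1.96):
    """Average single-reference attributions; return means and CI half-widths.

    The interval uses a normal approximation (assumed here), i.e.
    z * sample standard deviation / sqrt(number of references).
    """
    per_ref = [single_reference_shapley(f, x, r) for r in references]
    means, half_widths = [], []
    for i in range(len(x)):
        vals = [p[i] for p in per_ref]
        means.append(statistics.fmean(vals))
        half_widths.append(z * statistics.stdev(vals) / math.sqrt(len(vals)))
    return means, half_widths


def model(v):
    """Hypothetical toy model; the third feature plays no role."""
    return 2.0 * v[0] + v[0] * v[1]


rng = random.Random(0)
# Assumed reference distribution: 200 independent standard-normal inputs.
references = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
x = [1.0, 2.0, 3.0]

means, half_widths = attributions_with_ci(model, x, references)
for i, (m, h) in enumerate(zip(means, half_widths)):
    print(f"feature {i}: {m:.3f} +/- {h:.3f}")
```

Under this particular formulation, the unused third feature receives an attribution of exactly zero for every reference, and the mean attributions sum to the prediction for `x` minus the average prediction over the references, which is what makes the result readable as a contrast between the input and the reference distribution.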

Submission history

From: Luke Merrick

[v1] Tue, 17 Sep 2019 22:15:09 UTC (6,543 KB)

[v2] Wed, 20 Nov 2019 20:09:55 UTC (3,282 KB)

[v3] Thu, 25 Jun 2020 23:18:21 UTC (382 KB)

