[This article was first published on R – Michael's and Christian's Blog, and kindly contributed to R-bloggers].

SHAP is the predominant way to interpret black-box ML models, especially for tree-based models with the blazingly fast TreeSHAP algorithm.

For general models, two slower SHAP algorithms exist:

1. Permutation SHAP (Štrumbelj and Kononenko, 2010)
2. Kernel SHAP (Lundberg and Lee, 2017)

Kernel SHAP was introduced as an approximation to permutation SHAP.

The 0.4.0 CRAN release of our {kernelshap} package now contains an exact permutation SHAP algorithm for up to 14 features, which makes it easy to run experiments comparing the two approaches.

### Some initial statements about permutation SHAP and Kernel SHAP

1. Exact permutation SHAP and exact Kernel SHAP have the same computational complexity.
2. Technically, exact Kernel SHAP is still an approximation of exact permutation SHAP, so you should prefer the latter.
3. Kernel SHAP assumes feature independence. Since features are rarely independent in practice, does this mean we should never use Kernel SHAP?
4. Kernel SHAP can be calculated almost exactly for any number of features, while permutation SHAP approximations become increasingly imprecise as the number of features grows.
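To make statement 2 concrete, here is a minimal base-R sketch of exact permutation SHAP for a toy two-feature model. The model `f`, background data `bg`, and observation `x` are invented for illustration; with two features, each of the two permutations simply gets weight 1/2.

```
# Toy model with an interaction: f(x) = x1 * x2
f <- function(X) X[, "x1"] * X[, "x2"]

bg <- data.frame(x1 = c(0, 1, 2), x2 = c(1, 0, 2))  # background data
x <- c(x1 = 2, x2 = 3)                              # observation to explain

# v(S): average prediction with the features in S fixed at x,
# the remaining features taken from the background data
v <- function(S) {
  Xb <- bg
  for (j in S) Xb[[j]] <- x[[j]]
  mean(f(as.matrix(Xb)))
}

# Exact permutation SHAP for p = 2: average over the two orderings
phi_x1 <- 0.5 * (v("x1") - v(character(0))) + 0.5 * (v(c("x1", "x2")) - v("x2"))
phi_x2 <- 0.5 * (v("x2") - v(character(0))) + 0.5 * (v(c("x1", "x2")) - v("x1"))

c(phi_x1, phi_x2)  # 1.833333 2.833333
phi_x1 + phi_x2    # equals v(full) - v(empty) = 14/3 (efficiency property)
```

For p features, exact permutation SHAP averages over all p! orderings (equivalently, it weights subsets by |S|!(p-|S|-1)!/p!), which is what `permshap()` computes exactly for up to 14 features.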

### Simulation 1

We will work with the iris data because it has extremely strong correlations between features. Since Kernel SHAP assumes feature independence, we expect to see differences between the two methods. To also study the impact of interactions, we fit random forests of increasing tree depth: depth 1 means no interactions, depth 2 means pairwise interactions, and so on.
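Those correlations are easy to verify in base R before fitting anything:

```
# Pairwise Pearson correlations of the numeric iris features
round(cor(iris[1:4]), 2)
#              Sepal.Length Sepal.Width Petal.Length Petal.Width
# Sepal.Length         1.00       -0.12         0.87        0.82
# Sepal.Width         -0.12        1.00        -0.43       -0.37
# Petal.Length         0.87       -0.43         1.00        0.96
# Petal.Width          0.82       -0.37         0.96        1.00
```

The petal measurements are correlated at 0.96, about as far from independence as real data gets.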

```
library(kernelshap)
library(ranger)

differences <- numeric(4)

for (depth in 1:4) {
  fit <- ranger(
    Sepal.Length ~ Petal.Width + Petal.Length + Species,
    mtry = 3,
    data = iris,
    max.depth = depth,
    seed = 1
  )
  ps <- permshap(fit, iris[3:5], bg_X = iris)
  ks <- kernelshap(fit, iris[3:5], bg_X = iris)
  differences[depth] <- mean(abs(ks$S - ps$S))
}
differences
# Essentially 0
# 3.192200e-15 3.785476e-15 3.765154e-15 4.172719e-15

ps
# SHAP values of first observation:
#      Petal.Length Petal.Width     Species
# [1,]    -0.818954 -0.08683982 0.003784976

ks
# SHAP values of first observation:
#      Petal.Length Petal.Width     Species
# [1,]    -0.818954 -0.08683982 0.003784976
```

The mean absolute difference between the two (150 x 3) SHAP matrices is essentially 0, even in models with high-order interactions and strong correlation between features. Oh wow, this is unexpected!

### Simulation 2

I am aware of one situation where exact permutation SHAP and exact Kernel SHAP do not agree: when a correlated feature has no effect and there are interactions of order three or higher. We mimic this behavior by also evaluating SHAP values for a non-modeled feature (Sepal.Width). The rest of the simulation is identical:

```
differences <- numeric(4)

for (depth in 1:4) {
  fit <- ranger(
    Sepal.Length ~ Petal.Width + Petal.Length + Species,
    mtry = 3,
    data = iris,
    max.depth = depth,
    seed = 1
  )
  ps <- permshap(fit, iris[2:5], bg_X = iris)
  ks <- kernelshap(fit, iris[2:5], bg_X = iris)
  differences[depth] <- mean(abs(ks$S - ps$S))
}
differences
# There are small differences starting with interactions of order 3
# 2.373287e-15 2.842817e-15 4.822595e-06 2.676598e-05

ps
# SHAP values of first observation:
#      Sepal.Width Petal.Length Petal.Width     Species
# [1,]           0    -0.818954 -0.08683982 0.003784976

ks
# SHAP values of first observation:
#       Sepal.Width Petal.Length Petal.Width     Species
# [1,] 4.073678e-05   -0.8189676  -0.0868534 0.003771397
```

Now, through the regression trick behind Kernel SHAP, even the feature without effect, "Sepal.Width", picks up some effect from the correlated features. But only in the models with interactions of order three or higher!

## Wrap-Up

1. Use `kernelshap::permshap()` to crunch exact permutation SHAP values for models with not too many features.
2. Exact Kernel SHAP almost always gives the same results as exact permutation SHAP. The one situation I have identified involves models with high-order interactions and features that have no effect but are strongly correlated with modeled features. This means that Kernel SHAP is much better than people think it is.
3. Since Kernel SHAP can be calculated almost exactly even for many features, it remains an excellent way to crunch SHAP values for arbitrary models.

The R code is here.