Visualize SHAP Values without Tears

Posted on June 10, 2022 by Michael Mayer in R bloggers | 0 Comments

[This article was first published on R – Michael's and Christian's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

SHAP (SHapley Additive exPlanations, Lundberg and Lee, 2017) is an ingenious way to study black box models. SHAP values decompose – as fair as possible – predictions into additive feature contributions.

When it comes to SHAP, the Python implementation is the de-facto standard. It not only offers many SHAP algorithms, but also provides beautiful plots. In R, the situation is a bit more confusing. Different packages contain implementations of SHAP algorithms, e.g.,

some of which with great visualizations. Plus there is SHAPforxgboost (see my recent post), originally designed to visualize the results of SHAP values calculated from XGBoost, but it can also be used more generally by now.

The shapviz package

In order to entangle calculation from visualization, the shapviz package was designed. It solely focuses on visualization of SHAP values. Closely following its README, it currently provides these plots:

sv_waterfall(): Waterfall plots to study single predictions.
sv_force(): Force plots as an alternative to waterfall plots.
sv_importance(): Importance plots (bar and/or beeswarm plots) to study variable importance.
sv_dependence(): Dependence plots to study feature effects (optionally colored by heuristically strongest interacting feature).

They require a “shapviz” object, which is built from two things only:

S: Matrix of SHAP values
X: Dataset with corresponding feature values

Furthermore, a “baseline” can be passed to represent an average prediction on the scale of the SHAP values.

A key feature of the “shapviz” package is that X is used for visualization only. Thus it is perfectly fine to use factor variables, even if the underlying model would not accept these.

To further simplify the use of shapviz, direct connectors to the packages

are available.

Installation

The package shapviz can be installed from CRAN or Github:

devtools::install_github("shapviz")

devtools::install_github("mayer79/shapviz")

Example

Shiny diamonds… let’s model their prices by four “c” variables with XGBoost, and create an explanation dataset with 2000 randomly picked diamonds.

library(shapviz)
library(ggplot2)
library(xgboost)

set.seed(3653)

X <- diamonds[c("carat", "cut", "color", "clarity")]
dtrain <- xgb.DMatrix(data.matrix(X), label = diamonds$price)

fit <- xgb.train(
  params = list(learning_rate = 0.1, objective = "reg:squarederror"), 
  data = dtrain,
  nrounds = 65L
)

# Explanation dataset
X_small <- X[sample(nrow(X), 2000L), ]

Create "shapviz" object

One line of code creates a shapviz object. It contains SHAP values and feature values for the set of observations we are interested in. Note again that X is solely used as explanation dataset, not for calculating SHAP values.

In this example we construct the shapviz object directly from the fitted XGBoost model. Thus we also need to pass a corresponding prediction dataset X_pred used for calculating SHAP values by XGBoost.

shp <- shapviz(fit, X_pred = data.matrix(X_small), X = X_small)

Explaining one single prediction

Let's start by explaining a single prediction by a waterfall plot or, alternatively, a force plot.

# Two types of visualizations
sv_waterfall(shp, row_id = 1)
sv_force(shp, row_id = 1

Factor/character variables are kept as they are, even if the underlying XGBoost model required them to be integer encoded.

Explaining the model as a whole

We have decomposed 2000 predictions, not just one. This allows us to study variable importance at a global model level by studying average absolute SHAP values as a bar plot or by looking at beeswarm plots of SHAP values.

# Three types of variable importance plots
sv_importance(shp)
sv_importance(shp, kind = "bar")
sv_importance(shp, kind = "both", alpha = 0.2, width = 0.2)

A scatterplot of SHAP values of a feature like color against its observed values gives a great impression on the feature effect on the response. Vertical scatter gives additional info on interaction effects. shapviz offers a heuristic to pick another feature on the color scale with potential strongest interaction.

sv_dependence(shp, v = "color", "auto")

Dependence plot with automatic interaction colorization

Summary

The "shapviz" has a single purpose: making SHAP plots.
Its interface is optimized for existing SHAP crunching packages and can easily be used in future packages as well.
All plots are highly customizable. Furthermore, they are all written with ggplot and allow corresponding modifications.

The complete R script can be found here.

References

Scott M. Lundberg and Su-In Lee. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30 (2017).

To leave a comment for the author, please follow the link and comment on their blog: R – Michael's and Christian's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Visualize SHAP Values without Tears

The shapviz package

Installation

Example

Create "shapviz" object

Explaining one single prediction

Explaining the model as a whole

Summary

References

Related

The shapviz package

Installation

Example

Create "shapviz" object

Explaining one single prediction

Explaining the model as a whole

Summary

References

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)