# SHAP Values of Additive Models

This article was first published on **R – Michael's and Christian's Blog** and kindly contributed to R-bloggers.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Within only a few years, SHAP (Shapley additive explanations) has emerged as the number 1 way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying the decompositions of many predictions allows us to derive global properties of the model.
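To make "additive contributions in a fair way" concrete: with a background dataset, the SHAP value of feature j is its Shapley value in a game whose value function v(S) averages the model prediction over the background rows, with the features in S fixed at the row being explained. The following brute-force sketch computes this for a single row and checks the efficiency property (baseline plus SHAP values equals the prediction). It assumes this interventional value function; the helpers `value()` and `exact_shap()` and the toy model `f` are hypothetical illustrations, not part of kernelshap, shapviz or shap.

```python
from itertools import combinations
from math import factorial

import numpy as np


def value(f, x, background, coalition):
    """Interventional value v(S): mean prediction over the background rows
    after fixing the features in `coalition` at their values in x."""
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return f(z).mean()


def exact_shap(f, x, background):
    """Exact Shapley values of one row x via the coalition formula."""
    p = x.shape[0]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = value(f, x, background, S + (j,)) - value(f, x, background, S)
                phi[j] += w * gain
    return phi


def f(X):
    """Hypothetical black-box model with an interaction between features 1 and 2."""
    return X[:, 0] + 2 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = background[0]

baseline = f(background).mean()  # v(empty set)
prediction = f(x[None, :])[0]
phi = exact_shap(f, x, background)

# Efficiency: baseline + sum of SHAP values reproduces the prediction
print(baseline + phi.sum(), prediction)
```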

**What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?**

In the spirit of our “Lost In Translation” series, we provide both high-quality Python and R code.

## The models

Let’s build the models using a dataset with three highly correlated covariates and a (deterministic) response.

```r
library(lightgbm)
library(kernelshap)
library(shapviz)

#===================================================================
# Make small data
#===================================================================

make_data <- function(n = 100) {
  x1 <- seq(0.01, 1, length = n)
  data.frame(
    x1 = x1,
    x2 = log(x1),
    x3 = x1 > 0.7
  ) |>
    transform(y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + 10 * sin(2 * pi * x1))
}
df <- make_data()
head(df)
cor(df) |>
  round(2)

#      x1   x2   x3     y
# x1 1.00 0.90 0.80 -0.72
# x2 0.90 1.00 0.58 -0.53
# x3 0.80 0.58 1.00 -0.59
# y -0.72 -0.53 -0.59 1.00

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear regression
fit_lm <- lm(y ~ poly(x1, 3) + poly(x2, 3) + x3, data = df)
summary(fit_lm)

# Boosted trees
xvars <- setdiff(colnames(df), "y")
X <- data.matrix(df[xvars])

params <- list(
  learning_rate = 0.05,
  objective = "mse",
  max_depth = 1,
  colsample_bynode = 0.7
)

fit_lgb <- lgb.train(
  params = params,
  data = lgb.Dataset(X, label = df$y),
  nrounds = 300
)
```

```python
import numpy as np
import lightgbm as lgb
import shap
from sklearn.preprocessing import PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

#===================================================================
# Make small data
#===================================================================

def make_data(n=100):
    x1 = np.linspace(0.01, 1, n)
    x2 = np.log(x1)
    x3 = x1 > 0.7
    X = np.column_stack((x1, x2, x3))
    y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + np.sin(2 * np.pi * x1)
    return X, y

X, y = make_data()

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear model with polynomial terms
poly = PolynomialFeatures(degree=3, include_bias=False)

preprocessor = ColumnTransformer(
    transformers=[
        ("poly0", poly, [0]),
        ("poly1", poly, [1]),
        ("other", "passthrough", [2]),
    ]
)

model_lm = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("lm", LinearRegression()),
    ]
)
_ = model_lm.fit(X, y)

# Boosted trees with single-split trees
params = dict(
    learning_rate=0.05,
    objective="mse",
    max_depth=1,
    colsample_bynode=0.7,
)

model_lgb = lgb.train(
    params=params,
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=300,
)
```

## SHAP

For both models, we use exact permutation SHAP and exact Kernel SHAP. Furthermore, the linear model is analyzed with “additive SHAP”, and the tree-based model with TreeSHAP.
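Exact permutation SHAP averages each feature's marginal contribution over all p! feature orderings, while exact Kernel SHAP solves a weighted least-squares problem over all 2^p coalitions with the Shapley kernel weights (p - 1) / (C(p, |S|) |S| (p - |S|)). The sketch below implements both by brute force for three features, assuming the interventional value function (averaging over background rows), and shows that they agree even for a model with an interaction. The function names are hypothetical, and the efficiency constraint is imposed by eliminating the last coefficient rather than by giving the empty and full coalitions huge weights; this is a didactic sketch, not the implementation used in kernelshap or shap.

```python
from itertools import combinations, permutations
from math import comb

import numpy as np


def value(f, x, background, coalition):
    """Interventional value v(S) with the features in `coalition` fixed at x."""
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return f(z).mean()


def exact_permutation_shap(f, x, background):
    """Average marginal contributions over all p! feature orderings."""
    p = x.shape[0]
    phi = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        coalition = ()
        previous = value(f, x, background, coalition)
        for j in order:
            coalition += (j,)
            current = value(f, x, background, coalition)
            phi[j] += current - previous
            previous = current
    return phi / len(orderings)


def exact_kernel_shap(f, x, background):
    """Weighted least squares over all 2^p - 2 non-trivial coalitions with
    Shapley kernel weights. Efficiency is imposed by substituting
    phi_p = delta - (phi_1 + ... + phi_{p-1})."""
    p = x.shape[0]
    v_empty = value(f, x, background, ())
    delta = value(f, x, background, tuple(range(p))) - v_empty
    rows, targets, weights = [], [], []
    for size in range(1, p):
        w = (p - 1) / (comb(p, size) * size * (p - size))
        for S in combinations(range(p), size):
            z = np.zeros(p)
            z[list(S)] = 1.0
            rows.append(z[:-1] - z[-1])
            targets.append(value(f, x, background, S) - v_empty - z[-1] * delta)
            weights.append(w)
    A, y, w = np.array(rows), np.array(targets), np.array(weights)
    beta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return np.append(beta, delta - beta.sum())


def f(X):
    """Hypothetical model with an interaction (not additive)."""
    return X[:, 0] + 2 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = background[1]

phi_perm = exact_permutation_shap(f, x, background)
phi_kernel = exact_kernel_shap(f, x, background)
print(np.abs(phi_perm - phi_kernel).max())  # should be floating-point noise only
```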

Do the algorithms provide the same SHAP values?

```r
system.time({  # 1s
  shap_lm <- list(
    add = shapviz(additive_shap(fit_lm, df)),
    kern = kernelshap(fit_lm, X = df[xvars], bg_X = df),
    perm = permshap(fit_lm, X = df[xvars], bg_X = df)
  )

  shap_lgb <- list(
    tree = shapviz(fit_lgb, X),
    kern = kernelshap(fit_lgb, X = X, bg_X = X),
    perm = permshap(fit_lgb, X = X, bg_X = X)
  )
})

# Consistent SHAP values for linear regression
all.equal(shap_lm$add$S, shap_lm$perm$S)
all.equal(shap_lm$kern$S, shap_lm$perm$S)

# Consistent SHAP values for boosted trees
all.equal(shap_lgb$tree$S, shap_lgb$perm$S)
all.equal(shap_lgb$kern$S, shap_lgb$perm$S)

# Linear coefficient of x3 equals slope of SHAP values
tail(coef(fit_lm), 1)                # 0.682815
diff(range(shap_lm$kern$S[, "x3"]))  # 0.682815

# Dependence plots, with and without automatic color variable
sv_dependence(shap_lm$add, xvars)
sv_dependence(shap_lm$add, xvars, color_var = NULL)
```

```python
shap_lm = {
    "add": shap.Explainer(model_lm.predict, masker=X, algorithm="additive")(X),
    "perm": shap.Explainer(model_lm.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lm.predict, data=X).shap_values(X),
}

shap_lgb = {
    "tree": shap.Explainer(model_lgb)(X),
    "perm": shap.Explainer(model_lgb.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lgb.predict, data=X).shap_values(X),
}

# Consistency for additive linear regression
eps = 1e-12
assert np.abs(shap_lm["add"].values - shap_lm["perm"].values).max() < eps
assert np.abs(shap_lm["perm"].values - shap_lm["kern"]).max() < eps

# Consistency for additive boosted trees
assert np.abs(shap_lgb["tree"].values - shap_lgb["perm"].values).max() < eps
assert np.abs(shap_lgb["perm"].values - shap_lgb["kern"]).max() < eps

# Linear effect of last feature in the fitted model
model_lm.named_steps["lm"].coef_[-1]  # 1.112096

# Linear effect of last feature derived from SHAP values (ignore the sign);
# np.ptp() is used because the ndarray.ptp() method was removed in NumPy 2.0
np.ptp(shap_lm["perm"][:, 2].values)  # 1.112096

shap.plots.scatter(shap_lm["add"])
```

Yes – within each model, the three algorithms provide the same SHAP values. Furthermore, the SHAP values reconstruct the additive components of the features.
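Why do the SHAP values reconstruct the additive components? If f(x) = g_1(x_1) + ... + g_p(x_p), the marginal contribution of feature j to any coalition is g_j(x_j) minus the background average of g_j, no matter which other features are in the coalition. Hence every exact algorithm returns φ_j(x) = g_j(x_j) - mean(g_j), however strongly the features are correlated. For the binary x3, the SHAP values take only two values whose difference is the coefficient, which is why the coefficient of x3 equals the range of its SHAP values. The brute-force sketch below checks this on data built like `make_data()`; the component functions are hypothetical, chosen to mimic the data-generating process rather than the fitted models, and the interventional value function is assumed.

```python
from itertools import permutations

import numpy as np


def value(f, x, background, coalition):
    """Interventional value v(S) with the features in `coalition` fixed at x."""
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return f(z).mean()


def exact_permutation_shap(f, x, background):
    """Average marginal contributions over all p! feature orderings."""
    p = x.shape[0]
    phi = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        coalition = ()
        previous = value(f, x, background, coalition)
        for j in order:
            coalition += (j,)
            current = value(f, x, background, coalition)
            phi[j] += current - previous
            previous = current
    return phi / len(orderings)


# Hypothetical additive components, modelled on make_data()
components = [
    lambda v: 0.2 * v + 10 * np.sin(2 * np.pi * v),  # g1(x1)
    lambda v: 0.5 * v,                               # g2(x2)
    lambda v: 1.0 * v,                               # g3(x3), coefficient 1
]


def f(X):
    """Additive model: sum of the univariate components."""
    return sum(g(X[:, j]) for j, g in enumerate(components))


# Three highly correlated covariates, built as in make_data()
x1 = np.linspace(0.01, 1, 100)
X = np.column_stack((x1, np.log(x1), x1 > 0.7)).astype(float)

# SHAP values of all rows, using the data itself as background
S = np.array([exact_permutation_shap(f, x, X) for x in X])

# Centered additive components g_j(x_j) - mean(g_j)
centered = np.column_stack(
    [g(X[:, j]) - g(X[:, j]).mean() for j, g in enumerate(components)]
)

print(np.abs(S - centered).max())  # floating-point noise only
print(np.ptp(S[:, 2]))             # approximately 1, the coefficient of x3
```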

Didactically, this is very helpful when introducing SHAP as a method: Pick a white-box and a black-box model and compare their SHAP dependence plots. For the white-box model, you simply see the additive components, while the dependence plots of the black-box model show scatter due to interactions.
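The vertical scatter in black-box dependence plots can be reproduced in a toy setting: with an interaction, two rows sharing the same value of x1 can receive different SHAP values for x1, whereas under an additive model they cannot. The sketch below assumes a hypothetical two-feature model with an interaction, f(x) = x_1 + x_1 x_2, an additive counterpart x_1 + x_2^2, and brute-force exact permutation SHAP with the interventional value function.

```python
from itertools import permutations

import numpy as np


def value(f, x, background, coalition):
    """Interventional value v(S) with the features in `coalition` fixed at x."""
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return f(z).mean()


def exact_permutation_shap(f, x, background):
    """Average marginal contributions over all p! feature orderings."""
    p = x.shape[0]
    phi = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        coalition = ()
        previous = value(f, x, background, coalition)
        for j in order:
            coalition += (j,)
            current = value(f, x, background, coalition)
            phi[j] += current - previous
            previous = current
    return phi / len(orderings)


def f_interaction(X):
    """Hypothetical black box: x1 + x1 * x2 (interaction)."""
    return X[:, 0] + X[:, 0] * X[:, 1]


def f_additive(X):
    """Hypothetical white box: x1 + x2^2 (no interaction)."""
    return X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 2))

# Two rows with the same x1 but different x2
x_a = np.array([1.0, 0.0])
x_b = np.array([1.0, 2.0])

phi_a = exact_permutation_shap(f_interaction, x_a, background)
phi_b = exact_permutation_shap(f_interaction, x_b, background)
psi_a = exact_permutation_shap(f_additive, x_a, background)
psi_b = exact_permutation_shap(f_additive, x_b, background)

print(phi_a[0], phi_b[0])  # differ: scatter in the dependence plot of x1
print(psi_a[0], psi_b[0])  # equal: the dependence plot traces a curve
```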

**Remark: The exact equivalence between algorithms is lost when**

- there are too many features for exact procedures (~10+ features), and/or when
- the background data of Kernel/Permutation SHAP does not agree with the training data. This leads to slightly different estimates of the baseline value, which itself influences the calculation of SHAP values.
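The effect of the background data is easiest to see for a linear model f(x) = β'x. With the interventional value function, φ_j(x) = β_j (x_j - background mean of x_j), so switching the background shifts every SHAP value of feature j by the same constant β_j (new mean - old mean), and the baseline moves accordingly. (Exact computation also involves all 2^p feature coalitions, which is why sampling versions take over for many features.) The brute-force sketch below verifies the shift; the coefficient vector and the simulated data are hypothetical.

```python
from itertools import permutations

import numpy as np


def value(f, x, background, coalition):
    """Interventional value v(S) with the features in `coalition` fixed at x."""
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return f(z).mean()


def exact_permutation_shap(f, x, background):
    """Average marginal contributions over all p! feature orderings."""
    p = x.shape[0]
    phi = np.zeros(p)
    orderings = list(permutations(range(p)))
    for order in orderings:
        coalition = ()
        previous = value(f, x, background, coalition)
        for j in order:
            coalition += (j,)
            current = value(f, x, background, coalition)
            phi[j] += current - previous
            previous = current
    return phi / len(orderings)


beta = np.array([0.5, -1.0, 2.0])  # hypothetical coefficients


def f(X):
    """Hypothetical linear (hence additive) model."""
    return X @ beta


rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 3))
X_bg = X_train[:20]  # a background sample that differs from the training data
X_explain = X_train[:10]

S_train = np.array([exact_permutation_shap(f, x, X_train) for x in X_explain])
S_bg = np.array([exact_permutation_shap(f, x, X_bg) for x in X_explain])

# Baselines v(empty set) under both backgrounds
print(f(X_train).mean(), f(X_bg).mean())

# SHAP values of each feature shift by the same constant in every row
shift = S_train - S_bg
print(shift.std(axis=0))  # close to 0 for each feature
print(shift[0], beta * (X_bg.mean(axis=0) - X_train.mean(axis=0)))
```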

## Final words

- SHAP algorithms applied to additive models typically give identical results. Slight differences might occur because sampling versions of the algorithms are used, or because a different baseline value is estimated.
- The resulting SHAP values describe the additive components.
- Didactically, it helps to see SHAP analyses of white-box and black-box models side by side.
