DALEX: how would you explain this prediction?

[This article was first published on SmarterPoland.pl » English, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week I wrote about single variable explainers implemented in the DALEX package. They are useful to plot relation between a model output and a single variable.

But sometimes we are more focused on a single model prediction. If our model predicts possible drug response for a patient, we really need to know which factors drive the model prediction for a particular patient. For linear models it is relatively easy as the structure of the model is additive. In 2017 we have developed breakDown package for lm/glm models.

But how to explain/decompose/approximate predictions for any black box model?
There are several approaches. The (probably) most known is LIME with great examples for image and text data. There is an R port lime developed by Thomas Pedersen. In collaboration with Mateusz Staniak we developed live package, similar approach, easy to use with models created by mlr package.
The other technique that can be used here are Shapley values which use attribution theory/game theory to attribute effects of different variables for a single prediction.

Recently we have developed a yet another approach (paper under preparation, implemented in the breakDown version 0.4) that works in a model agnostic way (here you can check how to use it for caret models). You can play with it via the single_prediction() function in the DALEX package.
Such decomposition is useful for many reasons mentioned in papers listed above (deeper understanding, validation, trust, etc).
And, what is really extra about the DALEX package, you can compare contributions of different models on the same scale.

Let’s train three models (glm / gradient boosting model and random forest model) to predict quality of wine. These models are very different in structure. In the figure below, for a single wine, we compare predictions calculated by these models. For this single wine, for all models the most influential variable is the alcohol concentration as the wine has much higher concentration than average. Then pH and sulphates take second and third positions in all three models. Looks like models have some agreement even if they structure is very different.


If you want to learn more about DALEX package and decompositions for model predictions please consult following cheatsheet or the DALEX website.

If you want to learn more about explainers in general, join our DALEX Invasion!
Find our DALEX workshops at SER (Warsaw, April 2018), ERUM (Budapest, May 2018), WhyR (Wroclaw, June 2018) or UseR (Brisbane, July 2018).


To leave a comment for the author, please follow the link and comment on their blog: SmarterPoland.pl » English.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)