Explanatory Model Analysis with modelStudio

This article was first published on Stories by Przemyslaw Biecek on Medium, and kindly contributed to R-bloggers.

Version 0.2.1 of the modelStudio package reached CRAN this week. Let me briefly go over the coolest new features (list of all changes).

A short reminder: the modelStudio package creates an interactive, serverless D3.js dashboard for exploring predictive models, based on the principles of Explanatory Model Analysis. With the modelStudio interface, one can juxtapose instance-level model explanations (Break Down, Shapley values, and Ceteris Paribus profiles), dataset-level model explanations (Partial Dependence plots, Feature Importance, and Accumulated Local Effects plots), and data exploration plots (histograms and scatterplots). The examples below use a GBM model trained on the Kaggle FIFA 19 dataset to predict a player's value based on 40 selected player characteristics. Play with the live demo here.

New plots for EDA

There are two new plots in modelStudio for exploratory data analysis: Target vs Feature and Average Target vs Feature (useful for classification problems). They are especially helpful when examining Partial Dependence profiles. Both kinds of plot show the relation between the target and a selected feature, but the EDA plots show the raw relation in the data, while the Partial Dependence profile shows the relationship learned by the model.
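The difference between the raw and the learned relation can be sketched in base R. This is only an illustration, not modelStudio's implementation: the built-in `mtcars` data, the linear model, and the names `raw_relation`, `wt_grid`, and `pd_profile` are assumptions made for the example.

```r
# Illustrative only: base R on the built-in mtcars data, not modelStudio internals.
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Raw relation (in the spirit of Target vs Feature): observed target against the feature.
raw_relation <- mtcars[, c("wt", "mpg")]

# Learned relation (Partial Dependence): set the feature to each grid value
# for every row, predict, and average the predictions.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
pd_profile <- sapply(wt_grid, function(value) {
  modified <- mtcars
  modified$wt <- value
  mean(predict(model, newdata = modified))
})

# Overlay both: points are the data, the line is what the model learned.
plot(raw_relation$wt, raw_relation$mpg, xlab = "wt", ylab = "mpg")
lines(wt_grid, pd_profile)
```

Where the line departs from the cloud of points, the model has learned something different from what the raw data suggest, which is the kind of comparison these plots make easy.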

Better defaults

Some defaults changed in version 0.2.1 to improve general usability.

If no new instances are provided for local explanations, then by default, a small sample is taken at random from the training data.
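To control which instances appear in the dashboard instead of relying on the random sample, observations can be passed explicitly. A minimal sketch, assuming the `new_observation` argument of `modelStudio()` and the apartments setup used later in this article; the row indices and the `new_apartments` name are arbitrary choices for illustration.

```r
library(DALEX)
library(randomForest)
library(modelStudio)

model <- randomForest(m2.price ~ ., data = apartments)
explainer <- explain(model, data = apartments[, -1], y = apartments[, 1])

# Hand-picked instances (arbitrary rows) instead of the default random sample.
new_apartments <- apartments[c(1, 100, 500), ]
rownames(new_apartments) <- c("ap1", "ap100", "ap500")

# Local explanations are computed for exactly these three instances.
modelStudio(explainer, new_observation = new_apartments)
```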

When modelStudio is displayed, by default the first panel is set to a Break Down plot and the second panel is already clicked. This saves three clicks!

All plots for categorical variables now have the same order of levels.

Verbose and stable calculations

modelStudio is serverless, so all computations need to be done in advance, which may take some time. By default, the whole process is now more verbose: it shows a progress bar and information about the current calculations.

The try-catch blocks ensure that even if some parts fail, the rest will finish and the plots will show up in the dashboard.
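The general pattern can be sketched with base R's `tryCatch()`. This is not the package's actual code: the `calculations` list and its three functions are hypothetical stand-ins for the per-plot computations, and one of them fails on purpose.

```r
# Hypothetical stand-ins for the per-plot computations; names are illustrative.
calculations <- list(
  break_down = function() "break down result",
  shapley = function() stop("simulated failure"),
  ceteris_paribus = function() "ceteris paribus result"
)

results <- lapply(names(calculations), function(name) {
  tryCatch(
    calculations[[name]](),
    error = function(e) {
      # Report the failure and return NULL so the remaining parts still run.
      message("Calculation '", name, "' failed: ", conditionMessage(e))
      NULL
    }
  )
})
names(results) <- names(calculations)

# Keep only the parts that succeeded.
completed <- Filter(Negate(is.null), results)
```

Here the failing part is reported and skipped, while the other two results remain available.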

Feature importance plots now have boxplots that show how stable the calculations are for individual variables.
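One way to see where such a spread comes from is to repeat a permutation-based importance calculation several times and draw one box per variable. The sketch below uses base R only and assumes that the boxes summarize repeated permutation rounds; the `mtcars` data, the linear model, the `rmse` helper, and `B = 10` are choices made for illustration, not the package's internals.

```r
set.seed(1)
data(mtcars)
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
features <- c("wt", "hp", "qsec")

# Loss function and the loss of the unmodified data.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
baseline <- rmse(mtcars$mpg, predict(model, newdata = mtcars))

# Repeat the permutation B times per feature; each run gives one loss increase.
B <- 10
importance <- sapply(features, function(feature) {
  replicate(B, {
    permuted <- mtcars
    permuted[[feature]] <- sample(permuted[[feature]])
    rmse(mtcars$mpg, predict(model, newdata = permuted)) - baseline
  })
})

# One box per feature: a narrow box means a stable importance estimate.
boxplot(importance, horizontal = TRUE, xlab = "RMSE increase after permutation")
```

`importance` is a matrix with one row per permutation round and one column per feature, so wide boxes point to variables whose importance depends heavily on the particular permutation.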

Easy to create

The code chunk below creates a random forest model for the apartments dataset and then creates a modelStudio dashboard.

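# Load DALEX (explainers and the apartments data), the model package, and modelStudio
library(DALEX)
library(randomForest)
library(modelStudio)
# Fit a random forest that predicts the price per square meter
model <- randomForest(m2.price ~ ., data = apartments)
# Wrap the model in an explainer: features go in `data`, the target in `y`
explainer <- explain(model, data = apartments[, -1], y = apartments[, 1])
# Compute the explanations and open the dashboard
modelStudio(explainer)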

Learn more

Find more information in the modelStudio GitHub repo.
