Short reminder: the modelStudio package creates an interactive, serverless D3.js dashboard for exploring predictive models, based on the principles of Explanatory Model Analysis. With the modelStudio interface, one can juxtapose instance-level model explanations (Break Down, Shapley values and Ceteris Paribus profiles), dataset-level model explanations (Partial Dependence Plots, Feature Importance, Accumulated Local Effects Plots) and data exploration plots (histograms and scatterplots). The examples below use a GBM model trained on the Kaggle FIFA 19 dataset to predict a player’s value based on 40 selected player characteristics. Play with the live demo here.
New plots for EDA
There are two new plots in modelStudio for exploratory data analysis: Target vs Feature and Average Target vs Feature (useful for classification problems). They are especially helpful when examining Partial Dependence profiles. Both the new plots and the Partial Dependence profiles show the relation between the target and a selected feature, but the former show the raw relation in the data, while the latter show the relation learned by the model.
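To make the "raw relation" concrete, here is a minimal base-R sketch of the kind of quantities such plots can display, using simulated data. The variable names (`age`, `survived`) and the binning are hypothetical and are not taken from modelStudio's implementation.

```r
# Simulated data (hypothetical): a numeric feature and a binary target
set.seed(42)
age <- runif(500, min = 18, max = 80)
survived <- rbinom(500, size = 1, prob = plogis(2 - 0.05 * age))

# "Target vs Feature": the raw (feature, target) pairs as they appear in the data
raw_relation <- data.frame(age = age, survived = survived)

# "Average Target vs Feature": mean target within bins of the feature;
# for a 0/1 target this is the observed share of the positive class
age_bin <- cut(age, breaks = seq(18, 80, length.out = 5), include.lowest = TRUE)
avg_target <- tapply(survived, age_bin, mean)

print(round(avg_target, 2))
```

For a binary target the raw scatter is hard to read, which is why the averaged view tends to be the more useful one for classification.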
Some defaults changed in version 0.2.1 to improve general usability.
If no new instances are provided for local explanations, then by default, a small sample is taken at random from the training data.
When modelStudio is plotted, by default the first panel is set to a Break Down plot and the second panel is already selected. This saves 3 clicks!
All plots for categorical variables now have the same order of levels.
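The random-sample default can be overridden by passing instances explicitly. The sketch below assumes the `new_observation` argument of `modelStudio()`; `sample_instances()` and `dashboard_for()` are hypothetical helpers written only for this illustration, and the sample size of 3 is an assumption, not a value stated in this post.

```r
# Hypothetical helper: draw a small random sample of rows, mimicking
# the fallback behaviour described above (n = 3 is an assumed size)
sample_instances <- function(data, n = 3, seed = 1) {
  set.seed(seed)
  data[sample(nrow(data), n), , drop = FALSE]
}

# Hypothetical wrapper: explain chosen instances instead of a random sample.
# It needs the modelStudio package only when it is actually called.
dashboard_for <- function(explainer, instances) {
  modelStudio::modelStudio(explainer, new_observation = instances)
}

# Base-R demonstration of the sampling step on a built-in dataset
picked <- sample_instances(mtcars, n = 3)
print(rownames(picked))
```

Given an explainer for the apartments data, a call such as `dashboard_for(explainer, apartments[1:2, -1])` would request local explanations for the first two apartments.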
Verbose and stable calculations
modelStudio is serverless, so all computations need to be done in advance, which may take some time. The whole process is now more verbose by default: it shows a progress bar and information about the current calculations.
The try-catch blocks ensure that even if some parts fail, the rest will finish and the plots will show up in the dashboard.
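The general pattern can be sketched in base R as follows. This is not modelStudio's internal code; the step names and the simulated failure are hypothetical and only illustrate how a progress bar and `tryCatch()` let the remaining steps finish after an error.

```r
# Hypothetical precomputation steps; the second one fails on purpose
steps <- list(
  break_down = function() "break down ready",
  shapley    = function() stop("simulated failure"),
  profiles   = function() "profiles ready"
)

results <- list()
pb <- utils::txtProgressBar(min = 0, max = length(steps), style = 3)
for (i in seq_along(steps)) {
  name <- names(steps)[i]
  # On error, report it and store NULL so the remaining steps still run
  results[name] <- list(tryCatch(
    steps[[i]](),
    error = function(e) {
      message("Step '", name, "' failed: ", conditionMessage(e))
      NULL
    }
  ))
  utils::setTxtProgressBar(pb, i)
}
close(pb)

# Names of the steps that completed successfully
print(names(Filter(Negate(is.null), results)))
```

Storing `NULL` for a failed step keeps its slot in the result list, so downstream code can tell "failed" apart from "never attempted".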
Feature importance plots now have boxplots that show how stable the calculations are for individual variables.
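Where that spread comes from can be illustrated with a small base-R sketch of permutation importance repeated over several rounds. The linear model on `mtcars`, the RMSE loss and `B = 10` rounds are stand-in assumptions for illustration, not the settings used by modelStudio.

```r
set.seed(123)
# Stand-in model on a built-in dataset (not the FIFA or apartments model)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
rmse <- function(model, data) sqrt(mean((data$mpg - predict(model, data))^2))
baseline <- rmse(fit, mtcars)

features <- c("wt", "hp", "qsec")
B <- 10  # number of permutation rounds (assumed value)

# One row per round, one column per feature:
# increase in RMSE after shuffling that feature
importance <- sapply(features, function(f) {
  replicate(B, {
    shuffled <- mtcars
    shuffled[[f]] <- sample(shuffled[[f]])
    rmse(fit, shuffled) - baseline
  })
})

# The spread across rounds is what a boxplot per variable displays
print(apply(importance, 2, quantile, probs = c(0.25, 0.5, 0.75)))
```

Calling `boxplot(importance)` on this matrix draws one box per variable; a narrow box suggests the importance estimate is stable across permutations.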
Easy to create
The code chunk below creates a random forest model for the apartments dataset and then creates a modelStudio dashboard.
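# Packages are referenced with `::`, so no library() calls are needed
model <- randomForest::randomForest(m2.price ~ ., data = DALEX::apartments)
explainer <- DALEX::explain(model, data = DALEX::apartments[, -1], y = DALEX::apartments[, 1])
modelStudio::modelStudio(explainer)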
Find more information in the modelStudio GitHub repo.