Introduction to model exploration with code examples for R and Python
Welcome to “BASIC XAI with DALEX” series.
In this post, we will take a closer look at some algorithms used in explainable artificial intelligence. You will find here an introduction to methods of global and local model evaluation. Each description will include a technical introduction, example analysis, and code in R and Python.
So, shall we start?
First — why should I use XAI?
Nowadays, the quick and dirty approach to develop a predictive model is to try a large number of different ML algorithms and choose the single result that maximizes some validation criteria. This often results in complex models called black boxes. Why? Sometimes these elastic algorithms find models with greater predictive power, sometimes they can detect tricky relationships between variables, and sometimes all models are of similar performance but there are more complex ones so they are more often selected.
But there is a price to pay in this quick and dirty scheme. When we choose complex yet elastic models, we often lose the interpretability of them. To understand what decisions are made by the trained model, algorithms and tools are being developed to help human experts to understand how models are working. There is plenty of methods developed under the explainable artificial intelligence (XAI) umbrella that can be used to explain or explore complex models.
Second — which to choose: global vs local?
A growing number of tools for explanation are emerging because different stakeholders have different needs.
Global explanations are those that describe model behavior on the whole data set. This allows us to deduce how the model behaves generally/ usually/ on average.
Local explanations, on the other hand, refer to a single prediction, to a specific client/property/patient on which model operates. Usually, local explanations show which and how different variables contribute to the model prediction.
These differences are shown in the XAI pyramid below. The left part of the pyramid corresponds to the assessment of a single observation and the right part to the whole model. We can ask various questions about the model. On the left are questions related to a specific prediction. On the right are questions about the model in general.
From the top, we start with more general questions that can be answered with a single number or few numbers, like what is the predictive performance of the model (this can be summarised with a single number like AUC or RMSE), or a prediction value for a single observation (a single number). The following levels refer to the more and more specific methods, which we will discuss in this basic XAI series.
Third — let’s get a model in R and Python
In this example, we will use the apartments dataset (collected in Warsaw, available in DALEX package in R and Python). The data set describes 1000 apartments with six variables such as surface, floor, no.rooms, construction.year, m2.price, and district. We will create a model that predicts the price of an apartment, so let’s start with a black box regression model — random forest. The package that we will use in these examples is DALEX.
Below we have the code in Python and R, which allows us to transform the data, build a model, and explainer. The explainer is an object/adapter that wraps the model and creates a uniform structure and interface for operations.
If you want it, you can use ready-made objects prepared by us, you can find here.
In the next part, we will learn about a method for global variable importance — Permutational Variable Importance.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
In order to see more R related content visit https://www.r-bloggers.com