Rashomon effect and the probability of severe condition after Covid-19 infection
TL;DR: If you want to better understand the relationship between an explanatory variable and the target, build several different models (glm, boosting, rf) and compare their PD profiles (e.g. with DALEX).
The CRS-19 (Covid-19 Risk Score) model
Recently, the MOCOS group (MOdeling COronavirus Spread) developed the second version of its risk model for severe condition after Covid-19 infection. It was built on a sample of over 52 thousand cases in Poland with a positive PCR test for Covid-19 (more about the data later). You can play with the model at https://crs19.pl/.
The main goal of this app is to show how particular features affect the risk. The effect of age is especially interesting, and below I discuss some aspects of it.
The Rashomon effect describes a situation in which an event is given contradictory interpretations or descriptions by the individuals involved. The name comes from Akira Kurosawa’s 1950 film Rashomon [wikipedia].
In predictive modelling, the term was popularized by Leo Breiman in his paper Statistical Modeling: The Two Cultures. It refers to situations where many different models have similar predictive performance although they describe reality in different ways (the so-called multiplicity of good models).
Such a situation is a challenge if we want to explain the effect of a variable in a predictive model, because we may have several alternative explanations and no way of knowing which is better.
Partial Dependence Profile
If you want to see the relationship between an explanatory variable and the expected model response, you can use PD profiles (proposed by Friedman in 2000). A PD profile shows the average model prediction as a function of one variable, with the remaining variables kept at their observed values. These profiles are implemented in a number of packages (DALEX, pdp, iml, PDPbox, scikit-learn) and described in many places (see for example the Explanatory Model Analysis online book).
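To make the idea concrete, here is a minimal sketch of how a PD profile can be computed by hand. It is not the DALEX implementation: `toy_model`, its coefficients, and the four-row data set are made up purely for illustration.

```python
import math

def toy_model(row):
    """Hypothetical risk model (illustrative coefficients, not CRS-19)."""
    logit = -6.0 + 0.07 * row["age"] + 0.5 * row["male"]
    return 1.0 / (1.0 + math.exp(-logit))

def partial_dependence(predict, data, feature, grid):
    """For each grid value, fix `feature` to that value in every row
    and average the model predictions over the whole data set."""
    profile = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in data]
        profile.append(sum(preds) / len(preds))
    return profile

# Made-up observations; only the non-age columns matter for the averaging.
data = [
    {"age": 25, "male": 0},
    {"age": 47, "male": 1},
    {"age": 63, "male": 0},
    {"age": 88, "male": 1},
]
age_grid = [20, 40, 60, 80, 100]
pd_age = partial_dependence(toy_model, data, "age", age_grid)

for age, risk in zip(age_grid, pd_age):
    print(f"age={age:3d}  average predicted risk={risk:.3f}")
```

Any function that maps a row to a prediction can be passed as `predict`, which is what makes it possible to draw profiles of very different models on the same plot.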
Below we will use PD profiles to explain how different predictive models see the age effect on severe illness.
Perspective of random forest, gradient boosting and logistic regression
The figure on the left shows PD profiles for the three models: a random forest model (trained with randomForest), a gradient boosting model (trained with xgboost), and logistic regression.
All models have been built for the same classification task: prediction of severe condition after Covid-19 infection. The predictive performance of all these models is similar (AUC around 0.9). In each case we see that the risk of severe disease increases with age (for the boosting model this monotonicity was enforced).
But despite generally similar behaviour we see large differences for the oldest patients. The random forest model reduces variance at the cost of bias: for the oldest patients its predictions are much lower than those of logistic regression, which is quite a rigid model.
Perspective of different gradient boosting models
Let’s look at several boosting models with different numbers of trees. The more trees, the more variance and flexibility.
In the example in the left panel we can see that whether the model has 25 trees or 450, the dependence that the model learned is quite similar.
In this case, this is due to the enforced monotonicity, which prevents the model from fluctuating too much.
But what would it look like if the effect of age weren’t forced to be monotonic? On the left we see three boosting models with different numbers of trees.
As expected, the more trees, the greater the variance.
We even see random fluctuations around ages 18 and 38 and a large variance among the oldest patients.
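The link between the number of trees and the wiggliness of the learned age effect can be sketched with a toy booster. This is plain gradient boosting of regression stumps with squared loss on simulated data, not xgboost and not the CRS-19 data; the risk curve, seed, learning rate, and tree counts are all assumptions chosen for illustration.

```python
import math
import random

def fit_stump(x, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]
    total, n = sum(rs), len(rs)
    best = None  # (gain, threshold, left_mean, right_mean)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += rs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between equal values
        right_sum = total - left_sum
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            best = (gain, (xs[k - 1] + xs[k]) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]

def boost(x, y, n_trees, learning_rate=0.3):
    """Fit stumps sequentially to the residuals; no monotonicity constraint."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump(x, residuals)
        stumps.append((thr, left, right))
        pred = [p + learning_rate * (left if xi <= thr else right)
                for xi, p in zip(x, pred)]

    def predict(age):
        return base + learning_rate * sum(l if age <= t else r for t, l, r in stumps)
    return predict

def direction_changes(curve):
    """How many times the curve switches between rising and falling."""
    diffs = [b - a for a, b in zip(curve, curve[1:]) if b != a]
    return sum((d1 > 0) != (d2 > 0) for d1, d2 in zip(diffs, diffs[1:]))

rng = random.Random(19)
ages = [rng.randint(18, 100) for _ in range(500)]
# Simulated outcome: the true risk increases smoothly with age.
severe = [int(rng.random() < 1 / (1 + math.exp(7 - 0.08 * a))) for a in ages]

def mse(predict):
    return sum((y - predict(a)) ** 2 for a, y in zip(ages, severe)) / len(ages)

few = boost(ages, severe, n_trees=5)
many = boost(ages, severe, n_trees=200)
grid = list(range(18, 101))
curve_few = [few(a) for a in grid]
curve_many = [many(a) for a in grid]

print("training MSE:", round(mse(few), 4), "vs", round(mse(many), 4))
print("direction changes:", direction_changes(curve_few), "vs", direction_changes(curve_many))
```

Since only one variable is used here, the fitted curve is itself the PD profile for age. One would typically expect the larger ensemble to fit the training data more closely and to change direction more often, although the exact numbers depend on the seed.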
Perspective of neural networks, generalized additive models and logistic regression
The figure on the left shows PD profiles for another three models: a neural network model with three layers (trained with the neuralnet package), a generalized additive model (trained with the rms package), and logistic regression.
The model from the rms package uses tail-restricted cubic splines. We can see that it behaves a little differently at the extremes of the age range than a simple logistic regression.
Unlike tree-based models (boosting, random forest), the models using linear activations rise steeply for older patients.
The figure on the left shows the most interesting of the models presented above.
Again, we see that the models are similar for most cases, but differ in behaviour for this small group of the oldest patients.
Take away message
To build the model presented in the crs19 app, we tested hundreds of different models (in the discussion above we focused on age, but there are many other variables worth looking at). For most of the compared models we get similar performance as measured by AUC or F1.
In such a situation one should not rely blindly on a single performance measure. The models discussed above differed on a very small group of people over 90 years old. Performance calculated on the whole data set does not see these differences at all. It is just another example showing that performance may be very similar while the underlying models behave differently. And you can’t build a model responsibly if you don’t look at it. PD profiles are a good tool for visual examination of predictive models.
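The point about global performance measures can be illustrated on simulated data. In the sketch below both models, the age distribution, and the cap at 85 years are assumptions made for illustration; none of it comes from the CRS-19 models or the NIZP-PZH data.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def model_rigid(age):
    """Hypothetical logistic-style model: risk keeps rising with age."""
    return sigmoid(-7.0 + 0.08 * age)

def model_flat(age):
    """Hypothetical tree-style model: identical up to 85, flat afterwards."""
    return sigmoid(-7.0 + 0.08 * min(age, 85))

def auc(scores, labels):
    """Probability that a random positive case outscores a random negative
    case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(2020)
# Simulated ages, clipped to 18..100, so very old patients are rare.
ages = [min(100, max(18, round(rng.gauss(50, 18)))) for _ in range(2000)]
labels = [int(rng.random() < model_rigid(a)) for a in ages]

auc_rigid = auc([model_rigid(a) for a in ages], labels)
auc_flat = auc([model_flat(a) for a in ages], labels)
gap_at_95 = model_rigid(95) - model_flat(95)
share_over_85 = sum(a > 85 for a in ages) / len(ages)

print(f"AUC rigid={auc_rigid:.4f}  AUC flat={auc_flat:.4f}")
print(f"share of patients over 85: {share_over_85:.3f}")
print(f"difference in predicted risk at age 95: {gap_at_95:.3f}")
```

The two models can disagree only on the rare patients above the cap, so their AUC values on the whole sample stay very close, while the predicted risk for a 95-year-old differs substantially.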
For more Covid-19 models follow the MOCOS webpage.
The surveillance data was obtained from NIZP-PZH on November 9th, 2020. The raw data has 51 variables for 55 950 cases collected between 21/Feb/2020 and 04/Nov/2020. Cases with a very short observation time or with many missing values were removed, leaving 52 580 cases that are used for modelling.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
Rashomon effect and the severe condition after Covid-19 infections was originally published in ResponsibleML on Medium, where people are continuing the conversation by highlighting and responding to this story.