#explainCovid19 challenge

[This article was first published on Stories by Przemyslaw Biecek on Medium, and kindly contributed to R-bloggers].

For some time, I’ve been interested in verifying software libraries for eXplainable Artificial Intelligence (XAI), not only in terms of the number of implemented algorithms but also in terms of their actual usability for the end user. I found that it is challenging to gather useful feedback from end users because it is not easy to find a predictive model that a wide group of users really cares about. One positive example is the FICO challenge, in which the task was to build and explain a predictive model for risk scoring.

Maybe it is time for a more important challenge?

Partial dependence plot for age in a gradient boosting model that predicts survival of persons with COVID-19. Note that this plot comes from a simple predictive model built on the small amount of data available. It is by no means final or very precise.

The outbreak of COVID-19, the disease caused by SARS-CoV-2, is severe. Various data related to this outbreak are shared publicly. Selected individual-level data for infected persons (country, age, gender, date of infection, possible recovery or death) can be downloaded from this spreadsheet or this Kaggle data or, for selected countries, from other databases. These data make it possible to train a predictive model for survival and to try different XAI methods that can explain the model's predictions.

Using DALEX, I built a simple baseline solution. I trained a simple gradient boosting model that estimates the odds of recovery based on gender, country and age (with forced monotonicity constraints). The model is then explained with the modelStudio interactive dashboard. Take a look and play with the model yourself: https://pbiecek.github.io/explainCOVID19/
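To make the idea behind a partial dependence profile concrete, here is a minimal sketch in plain Python. It is not the model or the code behind the dashboard: `predict_survival` is a hypothetical hand-written logistic function standing in for the gradient boosting model, its coefficients are arbitrary, and the records are synthetic, invented only for illustration. The sketch forces age to each value on a grid for every record and averages the predictions.

```python
import math

# Hypothetical stand-in for the trained model (NOT the author's model):
# a hand-written logistic function whose output decreases with age.
# All coefficients are arbitrary and chosen only for illustration.
def predict_survival(age, gender, country):
    score = 5.0 - 0.05 * age
    if gender == "male":
        score -= 0.3
    if country == "China":
        score += 0.2
    return 1.0 / (1.0 + math.exp(-score))

# Synthetic records, invented for this sketch (not real patient data).
records = [
    {"age": 25, "gender": "female", "country": "China"},
    {"age": 34, "gender": "male", "country": "Japan"},
    {"age": 47, "gender": "female", "country": "South Korea"},
    {"age": 58, "gender": "male", "country": "China"},
    {"age": 66, "gender": "female", "country": "Japan"},
    {"age": 79, "gender": "male", "country": "China"},
]

def partial_dependence(model, data, variable, grid):
    """Average prediction over `data` when `variable` is forced to each grid value."""
    profile = []
    for value in grid:
        # Overwrite one variable in every record, keep the others as observed.
        preds = [model(**{**row, variable: value}) for row in data]
        profile.append((value, sum(preds) / len(preds)))
    return profile

age_grid = list(range(20, 91, 10))
pd_age = partial_dependence(predict_survival, records, "age", age_grid)

for age, mean_pred in pd_age:
    print(f"age={age:3d}  mean predicted survival={mean_pred:.3f}")
```

Because the stand-in model is monotone in age for every record, the averaged profile is monotone as well, which is the kind of shape a monotonicity constraint is meant to guarantee.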

Break down plot for a 50-year-old male from China who has COVID-19. The chances of survival are pretty high (0.971), mostly due to moderate age. Note that this plot comes from a simple predictive model built on the small amount of data available. It is by no means final or very precise.
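The logic of a break down plot can be sketched in a few lines of plain Python. This is a simplified illustration with a fixed variable order, not the implementation used for the plot above: `predict_survival` is again a hypothetical stand-in model with arbitrary coefficients, and the records are synthetic. Starting from the average prediction, the sketch fixes one variable at a time to the patient's value in every record and attributes the change in the average prediction to that variable.

```python
import math

# Hypothetical stand-in model with arbitrary coefficients (NOT the author's model).
def predict_survival(age, gender, country):
    score = 5.0 - 0.05 * age
    if gender == "male":
        score -= 0.3
    if country == "China":
        score += 0.2
    return 1.0 / (1.0 + math.exp(-score))

# Synthetic records, invented for this sketch (not real patient data).
records = [
    {"age": 25, "gender": "female", "country": "China"},
    {"age": 34, "gender": "male", "country": "Japan"},
    {"age": 47, "gender": "female", "country": "South Korea"},
    {"age": 58, "gender": "male", "country": "China"},
    {"age": 66, "gender": "female", "country": "Japan"},
    {"age": 79, "gender": "male", "country": "China"},
]

def mean_prediction(model, rows):
    return sum(model(**row) for row in rows) / len(rows)

def break_down(model, data, observation, order):
    """Sequential attribution: fix variables one by one to the observation's values."""
    rows = [dict(row) for row in data]      # work on copies, leave `data` untouched
    previous = mean_prediction(model, rows)
    steps = [("intercept", previous)]       # average prediction before fixing anything
    for variable in order:
        for row in rows:
            row[variable] = observation[variable]
        current = mean_prediction(model, rows)
        steps.append((variable, current - previous))
        previous = current
    return steps

# The patient profile from the caption: a 50-year-old male from China.
patient = {"age": 50, "gender": "male", "country": "China"}
steps = break_down(predict_survival, records, patient, ["age", "gender", "country"])

for name, contribution in steps:
    print(f"{name:10s} {contribution:+.3f}")
print(f"{'prediction':10s} {sum(c for _, c in steps):.3f}")
```

Once every variable has been fixed, all records equal the patient, so the contributions sum to the model's prediction for that patient. The attributions depend on the order in which variables are fixed, which is why the order is an explicit argument here.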

Usually XAI tools highlight any imperfections in the model or in the training data, and this is the case here. Building a more complex model was difficult because the data at the individual level are incomplete. But even a simple model with three variables can be an interesting tool for a fresh look at the problem of model explainability.

If you have better data, a better model or better explanations, please let me know. #explainCovid19

To leave a comment for the author, please follow the link and comment on their blog: Stories by Przemyslaw Biecek on Medium.
