Interpreting machine learning models with the lime package for R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Many types of machine learning classifiers, not least commonly-used techniques like ensemble models and neural networks, are notoriously difficult to interpret. If the model produces a surprising label for any given case, it's difficult to answer the question, “why that label, and not one of the others?”.
One approach to this dilemma is the technique known as LIME (Local Interpretable Model-Agnostic Explanations). The basic idea is that while for highly non-linear models it's impossible to give a simple explanation of the relationship between any one variable and the predicted classes at a global level, it might be possible to asses which variables are most influential on the classification at a local level, near the neighborhood of a particular data point. An procedure for doing so is described in this 2016 paper by Ribeiro et al, and implemented in the R package lime by Thomas Lin Pedersen and Michael Benesty (and a port of the Python package of the same name).
You can read about how the lime package works in the introductory vignette Understanding Lime, but this limerick by Mara Averick sums also things up nicely:
There once was a package called lime,
Whose models were simply sublime,
It gave explanations for their variations,
One observation at a time.
“One observation at a time” is the key there: given a prediction (or a collection of predictions) it will determine the variables that most support (or contradict) the predicted classification.
The lime package also works with text data: for example, you may have a model that classifies a paragraph of text as a sentiment “negative”, “neutral” or “positive”. In that case, lime will determine the the words in that sentence which are most important to determining (or contradicting) the classification. The package helpfully also provides a shiny app making it easy to test out different sentences and see the local effect of the model.
To learn more about the lime algorithm and how to use the associated R package, a great place to get started is the tutorial Visualizing ML Models with LIME from the University of Cincinnati Business Analytics R Programming Guide. The lime package is available on CRAN now, and you can always find the latest version at the GitHub repository linked below.
GitHub (thomasp): lime (Local Interpretable Model-Agnostic Explanations)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.