Computer Vision for Model Assessment

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One of the differences between statistical data scientists and machine learning engineers is that while the latter group are concerned primarily with the predictive performance of a model, the former group are also concerned with the fit of the model. A model that misses important structures in the data — for example, seasonal trends, or a poor fit to specific subgroups — is likely to be lacking important variables or features in the source data. You can try different machine learning techniques or adjust hyperparameters to your heart's content, but you're unlikely to discover problems like this without evaluating the model fit.

One of the most powerful tools for assessing model fit is the residual plot: a scatterplot of the predicted values of the model versus the residuals (the difference between the predictions and the original data). If the model fits well, there should be no obvious relation between the two. For example, visual inspection shows us that the model below first fairly well for 19 of the 20 subgroups in the data, but there may be a missing variable that would explain the apparent residual trend for group 19.


Assessing residual plots like these is something of an art, which is probably why it isn't routine in machine learning circles. But what if we could use deep learning and computer vision to assess residual plots like these automatically? That's what Professor Di Cook from Monash University proposes, in the 2018 Belz Lecture for the Statistical Society of Australia, Human vs computer: when visualising data, who wins?  

(The slides for the presentation, created using R, are also available online.) The first part of the talk also includes a statistical interpretation of neural networks, and an intuitive explanation of deep learning for computer vision. The talk also references a related  paper, Visualizing Statistical Models : Removing the Blindfold (by Hadley Wickham, Dianne Cook and Jeike Hofman and published in the ASA Data Science Journal in 2015) which is well worth a read as well.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)