How do I calculate the R squared metric for a Bayesian model?
A good friend I met on a field trip long ago, Dominique Roche, recently emailed me to ask about evaluation of Bayesian models.
He has been delving into generalized linear models, using Bayesian methods, and needed to decide what criteria he should use for model simplification (the process of removing ‘insignificant’ covariates) and/or deciding which covariates had the strongest effect on his response of interest.
I wanted to share an abridged version of our conversation, as I think it is enlightening. Specifically we discussed whether one can (and should) calculate the R-squared metric for Bayesian models (the short answer being that the R-squared doesn’t make much sense for Bayesian models).
The email chain
Dom: What criteria can I use to make inferences with my Bayesian model? I’ve used MCMCglmm (an R package) previously to run multivariate models. However, in addition to doing this in our study, we want to test the effect of multiple predictors (covariates) on our response variables. I’m curious to know if you recommend one in particular (parameter estimate and 95% CI, Bayes factor, ).
CB: You can test the covariates very much like you did for your correlated response model. Your covariate model will estimate a slope very much like a linear regression. So if you want a criteria for inferences, you can just use the 95% CI on that slope estimate. If it overlaps zero then you could say that the effect of that covariate is ‘insignificant’.
You can just then compare the median and 95% CI values for your covariates to see which ones have the largest effect.
Another option is to use the WAIC which some stats packages will calculate for you. It is very much like the AIC and can be used to select the most parsimonious model. i.e. your compare models with different sets of covariates and pick the one (or several) with the lowest WAIC.
One thing to watch out for is that your model may give funky results if the covariates are strongly correlated with each other. A good reference is the book about eco stats by Zuur that has penguins on the front cover. (One of my favourite reference books, I can never remember the title though).
Dom: Your answer confirms what I was thinking – that reporting an estimate and 95% CI is fine for hypothesis testing, which is great.
I’m familiar with model selection and AIC but I usually also report a pseudo R2 (usually using r.squaredGLMM) to indicate how much of the variance my model explains. There is a short paper on that topic which came out recently in JAP (which is worth reading). Do you know if there a way to compute a pseudo R2 for a multivariate Bayesian model?
CB: In short by asking about an R2 for a Bayesian model you are opening a can of worms.
Bayesians prefer WAIC or LOO (leave one out cross validation) for evaluating models because they integrate across the full posterior probability. The R2 is a point estimate, it is just an evaluation of the mean prediction of the model (so doesn’t account for the uncertainty in parameter estimates).
Point estimates don’t sit well with the Bayesian philosophy. A Bayesian assumes all parameters are random variables. Whereas, a frequentist would use the R-squared because they are assuming there are actually fixed true values we are trying to observe.
I do see the appeal of the R2 though, because it is absolute in the sense that 0.9 is 10% better than 0.8. WAIC is not like this.
But, a Bayesian isn’t too bothered by R2 because they spend more time looking at the posterior credibility or predictive intervals. You can predict your data using the fitted model and then just see how wide the CIs or PIs are. Wide CIs would be analogous to a poor R2.
Some model specifications might have narrower CIs than others (for instance if you include more informative covariates).
Dom: Also, from a quick look at the paper you sent me (the one about WAIC) and the vignette for the loo package, it seems like implementing model selection for a Bayesian model using WAIC is analogous to carrying out likelihood ratio tests to compare nested models.
How would one go about comparing models with four predictors (no interactions)? Would you compute the WAIC by hand for all possible models and retain the ‘best’ model(s)? I’ve used the dredge function in MuMln to automate the process in the past.
CB: The WAIC is just like the AIC, so it is a relative measure of model merit. You would jsut run all candidate models (e.g. with 4, 3, 2, 1 covariates) and compare them.
So far you will have to do this by hand, I’m not aware of anything that automates it like MuMln. Could be very time consuming, because bayes models tend to be slow to run.
Rather than just dredging for the best AIC, I prefer a hypothesis driven approach. So specify the subset of models that each addresses a specific hypotheses, then test and compare with WAIC to find which hypotheses have the greatest support.
Dom: RE issues of multi-collinearity – yep, I’m aware that this is something to check for. I’ve used vifs in the past for frequentist models. Is there something analogous for Bayesian models or does one simply look at correlations?
CB: Good question wrt multicollinearity. I’m not actually sure how to formally test for it, never had this probelm in any bayesian models I’ve run. Something to research. It will be an issue in Bayesian models just as it is in frequentist models, the only exception being that you could get around the issue with strongly informative priors, but that’s another story (look at this paper if you’re interested).
Going forward a Bayesian
Hit me up on Twitter if you want to join in on the conversation (or don’t agree with something I said).
Finally, if you are thinking of going Bayesian for your next GLM, here’s a short review of ways you can do it in R.