ABC model choice not to be trusted

January 26, 2011

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

This may sound like a paradoxical title given my recent output in the area of ABC approximations, especially after the disputes with Alan Templeton, but I have come to the conclusion that ABC approximations to the Bayes factor are not to be trusted. When working one afternoon in Park City with Jean-Michel and Natesh Pillai (drinking tea in front of a fake log-fire!), we looked at the limiting behaviour of the Bayes factor constructed by an ABC algorithm, i.e. by approximating the posterior probabilities of the models by the acceptance frequencies of simulations from those models (assuming a common summary statistic is used to define the distance to the observations). Rather obviously (a posteriori!), we ended up with the true Bayes factor based on the distributions of the summary statistics under both models!
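To make the construction concrete, here is a minimal sketch of ABC model choice by acceptance frequencies, written in Python with the standard library only. Everything specific in it is an assumption made for illustration, not something taken from the post: the two models (Poisson versus geometric on {0, 1, ...}), the priors (Exp(1) on the Poisson rate, uniform on the geometric success probability), the summary statistic (the sum of the observations), the zero tolerance, and all function names.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates drawn here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_geometric(p, rng):
    """Number of failures before the first success, support {0, 1, ...}."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return int(math.log(u) / math.log1p(-p))

def abc_model_choice(s_obs, n, n_sims, eps=0, seed=0):
    """Count accepted simulations per model (hypothetical helper).

    A simulation is accepted when the simulated sum is within eps of s_obs.
    """
    rng = random.Random(seed)
    accepted = {"poisson": 0, "geometric": 0}
    for _ in range(n_sims):
        # Equal prior weights on the two models (assumption).
        model = rng.choice(("poisson", "geometric"))
        if model == "poisson":
            lam = rng.expovariate(1.0)   # assumed prior: lam ~ Exp(1)
            s_sim = sum(sample_poisson(lam, rng) for _ in range(n))
        else:
            p = 1.0 - rng.random()       # assumed prior: p ~ U(0, 1)
            s_sim = sum(sample_geometric(p, rng) for _ in range(n))
        if abs(s_sim - s_obs) <= eps:
            accepted[model] += 1
    return accepted

# Toy observation: n = 4 counts whose sum is 8.
accepted = abc_model_choice(s_obs=8, n=4, n_sims=200_000)
total = accepted["poisson"] + accepted["geometric"]
post_prob_poisson = accepted["poisson"] / total
bf_abc = accepted["poisson"] / accepted["geometric"]
print("acceptances:", accepted)
print("ABC posterior probability of the Poisson model:", round(post_prob_poisson, 3))
print("ABC Bayes factor (Poisson vs geometric):", round(bf_abc, 3))
```

Note that the algorithm only ever sees the observations through their sum, so any two datasets with the same sum receive exactly the same ABC answer. Under the assumed priors, the Bayes factor based on the distribution of the sum alone is about 1.31 for n = 4 and a sum of 8, and the acceptance ratio should fluctuate around that value.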

At first, this does not sound like a particularly novel or fundamental result, since all ABC approximations rely on the posterior distributions of those summary statistics rather than on the whole dataset. However, while for most inferential purposes this approximation only affects the precision of the inference, it induces a dramatic arbitrariness in the Bayes factor. To illustrate this arbitrariness, consider the case of using a statistic S(x) that is sufficient for both models. Then, by the factorisation theorem, the true likelihoods factorise as

\ell_1(\theta_1|x) = g_1(x) p_1(\theta_1| S(x)) \quad\text{and}\quad \ell_2(\theta_2|x) = g_2(x) p_2(\theta_2| S(x))

resulting in a true Bayes factor equal to

B_{12}(x) = \dfrac{g_1(x)}{g_2(x)}\,B^S_{12}(x)

where the last term is the limiting ABC Bayes factor. Therefore, even in the favourable case where a sufficient statistic exists, using only the sufficient statistic induces a discrepancy in the result that does not vanish as the number of observations or simulations grows. On the contrary, it may diverge one way or another as the number of observations increases. (This is the point of the above illustration, taken from the arXived paper, with the true Bayes factor on the first axis and the ABC approximation on the second, based on 50 observations from either a Poisson (left) or a geometric (right) distribution.)

Again, this is in the favourable case of sufficiency. In the realistic setting of using summary statistics that are not sufficient, things deteriorate further! This practical situation implies a wider loss of information compared with the exact inferential approach, hence a wider discrepancy between the exact Bayes factor and the quantity produced by an ABC approximation.

It thus appears to us an urgent duty to warn the community about the dangers of this approximation, especially when considering the rapidly increasing number of applications using ABC for conducting model choice and hypothesis testing. Furthermore, we unfortunately do not see an immediate and generic alternative for the approximation of the Bayes factor. The only solution seems to be using discrepancy measures as in Ratmann et al. (2009), i.e. (empirical) model criticism rather than (decision-theoretic) model choice.
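The identity between the two Bayes factors can be checked numerically in a case where every term has a closed form. The sketch below is an illustration under assumptions of my choosing, not the computation from the paper: a Poisson model with an Exp(1) prior on the rate against a geometric model on {0, 1, ...} with a uniform prior on the success probability, with the sum of the observations as the common sufficient statistic. Integrating each factorised likelihood against its prior pulls g_i(x) out of the integral, which is where the ratio g_1(x)/g_2(x) comes from. All function names are hypothetical.

```python
import math

def log_binom(a, b):
    """log of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def log_bayes_factors(x):
    """Return (log B12, log B12^S, log g1/g2) for Poisson (1) vs geometric (2).

    Assumed priors: lam ~ Exp(1) and p ~ U(0, 1); S(x) = sum(x).
    """
    n, s = len(x), sum(x)
    log_prod_fact = sum(math.lgamma(xi + 1) for xi in x)
    # log Beta(n + 1, s + 1), the integral of p^n (1 - p)^s over (0, 1).
    log_beta = math.lgamma(n + 1) + math.lgamma(s + 1) - math.lgamma(n + s + 2)

    # Marginal likelihoods of the full sample.
    log_m1 = math.lgamma(s + 1) - (s + 1) * math.log(n + 1) - log_prod_fact
    log_m2 = log_beta

    # Marginal likelihoods of S alone: S is Poisson(n * lam) under model 1
    # and negative binomial(n, p) under model 2.
    log_m1_s = s * math.log(n) - (s + 1) * math.log(n + 1)
    log_m2_s = log_binom(n + s - 1, s) + log_beta

    # g1(x) and g2(x): conditional distributions of x given S(x),
    # multinomial with equal cell probabilities and uniform on compositions.
    log_g1 = math.lgamma(s + 1) - s * math.log(n) - log_prod_fact
    log_g2 = -log_binom(n + s - 1, s)

    return log_m1 - log_m2, log_m1_s - log_m2_s, log_g1 - log_g2

# Two toy samples with the same size and the same sum.
x_a = [2, 2, 2, 2]
x_b = [8, 0, 0, 0]
for x in (x_a, x_b):
    log_b12, log_b12_s, log_g_ratio = log_bayes_factors(x)
    print(x, "B12 =", math.exp(log_b12),
          "B12^S =", math.exp(log_b12_s),
          "g1/g2 =", math.exp(log_g_ratio))
```

Both samples share the same sum, hence the same summary-based Bayes factor (about 1.31), yet under these assumptions the full-data Bayes factor is about 8.3 for the first sample and about 0.0033 for the second. The ratio g_1(x)/g_2(x) accounts for the whole gap, and nothing in the summary-based quantity can recover it.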


