expectation-propagation and ABC

Posted on August 23, 2011 by xi'an

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers.]

“It seems quite absurd to reject an EP-based approach, if the only alternative is an ABC approach based on summary statistics, which introduces a bias which seems both larger (according to our numerical examples) and more arbitrary, in the sense that in real-world applications one has little intuition and even less mathematical guidance on to why p(θ|s(y)) should be close to p(θ|y) for a given set of summary statistics s.”

Simon Barthelmé and Nicolas Chopin posted a recent arXiv paper on Expectation-Propagation for Summary-Less, Likelihood-Free Inference. They sell expectation-propagation as a quick and dirty version of ABC, avoiding the selection of summary statistics by using the constraint

$||y_i-y^\star_i||\le \epsilon$

on each component of the simulated pseudo-data vector y*, y being the actual data. Expectation-propagation is a variational technique [which Simon and Nicolas are quite fond of!] that consists in replacing the target with the “closest” member of an exponential family, such as the Gaussian distribution. The expectation-propagation approximation is found by including a single “observation” at a time, using the other approximations as the prior, and finding the best Gaussian in this pseudo-model. In addition, expectation-propagation provides an approximation of the evidence. In the “likelihood-free” setting (I do not like this term because we are dealing with a specific, well-defined likelihood that we simply cannot compute!), this means computing an empirical mean and an empirical variance, one observation at a time, under the above tolerance constraint.
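To make the one-observation-at-a-time update concrete, here is a minimal Python sketch of how I read the scheme, not the authors' implementation. Everything specific in it is an assumption chosen for illustration: a scalar parameter θ, a toy model y_i ~ N(θ, 1), a N(0, 25) prior, the tolerance ε = 0.3, and the hypothetical names `ep_abc_gaussian` and `simulate`. Each observation owns a Gaussian "site"; the site is refreshed by simulating θ from the product of all the other factors, keeping the draws whose pseudo-observation falls within ε of y_i, and matching the mean and variance of the survivors.

```python
import numpy as np

def ep_abc_gaussian(y, prior_mean, prior_var, simulate, eps,
                    n_sims=50_000, n_passes=2, min_accepted=10, seed=0):
    """Sketch of an EP-ABC loop for a scalar parameter (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Gaussian factors are stored in natural parameters:
    # r = precision, q = precision * mean. Sites start flat (r = q = 0).
    site_r = np.zeros(n)
    site_q = np.zeros(n)
    glob_r = 1.0 / prior_var
    glob_q = prior_mean / prior_var
    for _ in range(n_passes):
        for i in range(n):
            # "Cavity": the current approximation with site i removed,
            # which plays the role of the prior for observation i.
            cav_r = glob_r - site_r[i]
            cav_q = glob_q - site_q[i]
            if cav_r <= 0.0:
                continue  # improper cavity, skip this update
            theta = rng.normal(cav_q / cav_r, np.sqrt(1.0 / cav_r), size=n_sims)
            y_star = simulate(theta, rng)
            # ABC step: keep draws whose pseudo-observation is within eps of y_i.
            kept = theta[np.abs(y_star - y[i]) <= eps]
            if kept.size < min_accepted:
                continue  # too few accepted draws to estimate moments
            # Moment matching: best Gaussian for the accepted draws.
            new_r = 1.0 / kept.var(ddof=1)
            new_q = kept.mean() * new_r
            # The refreshed site is whatever turns the cavity into the new fit.
            site_r[i] = new_r - cav_r
            site_q[i] = new_q - cav_q
            glob_r, glob_q = new_r, new_q
    return glob_q / glob_r, 1.0 / glob_r

# Toy usage: 20 observations from N(2, 1), assumed model y_i ~ N(theta, 1).
y_obs = np.random.default_rng(1).normal(2.0, 1.0, size=20)
post_mean, post_var = ep_abc_gaussian(
    y_obs, prior_mean=0.0, prior_var=25.0,
    simulate=lambda theta, rng: rng.normal(theta, 1.0), eps=0.3)
```

In this conjugate toy case the output can be checked against the exact Gaussian posterior, which is of course precisely the setting where a Gaussian approximation is expected to do well.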

Unless I am confused, the expectation-propagation approximation to the posterior distribution is a [sequentially updated] Gaussian distribution, which means that it will only be appropriate in cases where the posterior distribution is approximately Gaussian. Since the three examples processed in the paper are of this kind, e.g. the above reproduction, I wonder about the performance of the expectation-propagation method in less smooth cases, such as ridge-like or multimodal posteriors. The authors mention two limitations: “First, it [EP] assumes a Gaussian prior; and second, it relies on a particular factorisation of the likelihood, which makes it possible to simulate sequentially the datapoints”, but those seem negligible compared with my above comment. I thus remain unconvinced by the concluding sentence quoted above. (The current approach to ABC is to consider p(θ|s(y)) as a target per se, not as an approximation to p(θ|y).) Nonetheless, expectation-propagation constitutes a quick approximation method that can always be used as a reference against other approximations.
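As a small illustration of the worry about multimodal posteriors, the sketch below moment-matches a single Gaussian to a two-component mixture. The mixture (equal weights, means -3 and 3, unit variances) is a made-up stand-in for a bimodal posterior, not an example from the paper, and the helper `normal_pdf` is only there for the illustration. Matching the mean and variance is the best a Gaussian can do in the Kullback-Leibler sense that expectation-propagation targets, so this is, if anything, a favourable case.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Hypothetical bimodal "posterior": 0.5 N(-3, 1) + 0.5 N(3, 1).
weights = np.array([0.5, 0.5])
means = np.array([-3.0, 3.0])
variances = np.array([1.0, 1.0])

# Mean and variance of the mixture, i.e. the moment-matched Gaussian.
mix_mean = np.sum(weights * means)
mix_var = np.sum(weights * (variances + means ** 2)) - mix_mean ** 2

# Compare both densities at the fitted mean, halfway between the two modes.
target_at_centre = np.sum(weights * normal_pdf(mix_mean, means, variances))
gauss_at_centre = normal_pdf(mix_mean, mix_mean, mix_var)

print(f"moment-matched Gaussian: mean={mix_mean:.2f}, var={mix_var:.2f}")
print(f"density at the mean: target={target_at_centre:.4f}, "
      f"Gaussian={gauss_at_centre:.4f}")
```

The fitted Gaussian is centred between the modes and places its highest density exactly where the mixture has very little mass, which is the kind of behaviour I have in mind for less smooth cases.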