Easy Laplace Approximation of Bayesian Models in R

Posted on November 21, 2013 by Rasmus Bååth in R bloggers | 0 Comments

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Thank you for tuning in! In this post, a continuation of Three Ways to Run Bayesian Models in R, I will:

Handwave an explanation of the Laplace Approximation, a fast and (hopefully not too) dirty method to approximate the posterior of a Bayesian model.
Show that it is super easy to do Laplace approximation in R, basically four lines of code.
Put together a small function that makes it even easier, if you just want this, scroll down to the bottom of the post.

But first a picture of the man himself:

Laplace

Laplace Approximation of Posterior Distributions

Have you noticed that posterior distributions often are heap shaped and symmetric? What other thing, that statisticians love, is heap shaped and symmetric? The normal distribution of course! Turns out that, under most conditions, the posterior is asymptotically normally distributed as the number of data points goes to $\infty$. Hopefully we would then not be too off if we approximated the posterior using a (possibly multivariate) normal distribution. Laplace approximation is a method that does exactly this by first locating the mode of the posterior, taking this as the mean of the normal approximation, and then calculating the variance of the normal by “looking at” the curvature of of the posterior at the mode. That was the handwaving, if you want math, check wikipedia.

So, how well does Laplace Approximation work in practice? For complex models resulting in multimodal posteriors it will obviously not be a good approximation, but for simple models like the generalized linear model, especially when assuming the usual suspects, it works pretty well. But why use an approximation that might be slightly misleading instead of using Markov chain Monte Carlo, an approximation that converges to the true posterior? Well, because Laplace Approximation only has to find the mode of the posterior, it does not have to explore the whole posterior distribution and therefore it can be really fast! I’m talking seconds, not minutes…

Let’s see how well Laplace approximation performs in a couple of simple examples. In the following the the true posterior will be in green and the Laplace approximation will be dashed and red. First up is the posterior for the rate parameter $\theta$ of a Binomial distribution given 10 heads and 8 tails, assuming a flat prior on $\theta$.

Here the Laplace approximation works pretty well! Now let’s look at the posterior of the same model but with less data, only 4 heads and 2 tails.

Not working well at all. As the true posterior is slanted to the right the symmetric normal distribution can’t possibly match it. This type of problem generally occurs when you have parameters with boundaries. This was the case with $\theta$ which is bounded between $[0,1]$ and similarly we should expect troubles when approximating the posterior of scale parameters bounded between $[0,\infty]$. Below is the posterior of the standard deviation $\sigma$ when assuming a normal distribution with a fixed $\mu=10$ and flat priors, given 8 random numbers distributed as Normal(10, 4):