(This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers)

This is the third in a sequence of posts about Markov Chain Monte Carlo (MC^{2}) methods for Bayesian inference. The first two posts can be found **here** and **here**, and I’ll assume that you’ve read both of them already.

In this post we’re going to see how the Gibbs sampler can be used to extract the *marginal posterior* distributions from the *joint posterior* distribution, in a simple two-parameter problem. The problem – which we’ll come to shortly – is one in which we actually know the answer in advance. That’s to say, the marginalizing can be done analytically with some not-too-difficult integration. This means that we have a benchmark against which to judge the results generated by the Gibbs sampler.

Let’s look at the inference problem we’re going to solve.

Suppose that we have a population that can be described by a Normal distribution with an unknown mean, μ, and an unknown precision parameter, τ. The latter parameter is just τ = 1/σ^{2}, where σ is the standard deviation of the population. We want to estimate both μ and τ.

Before we see any sample data, we are *a priori* completely ignorant about the possible values of the parameters. So, we assign the following “diffuse” joint prior density:

p(μ , τ) ∝ (1 / τ) ; -∞ < μ < ∞ ; 0 < τ < ∞ .

Then, we take a sample of n independent values from the population. So, the likelihood function is:

L(μ , τ | y) = p(y | μ , τ) ∝ τ^{n/2} exp[-(τ / 2) ∑(y_{i} – μ)^{2}] ,

where y is the vector of n sample values.

So, by Bayes’ Theorem, the *joint* posterior density for the two parameters is:

p(μ , τ | y) = [p(μ , τ) L(μ , τ | y) / p(y)] ∝ p(μ , τ) L(μ , τ | y)

∝ τ^{n/2 – 1} exp[-(τ / 2) ∑(y_{i} – μ)^{2}] . (1)

What we really want, though, are the *marginal* posterior densities for each of the two parameters.

The Gibbs sampler requires the *conditional* posterior distributions for each of the (groups of) parameters. If we look at equation (1) and treat τ as if it’s a constant – *i.e*., if we condition on τ – we can see that

p(μ | τ , y) ∝ exp[-(τ / 2) ∑(y_{i} – μ)^{2}]

∝ exp[-(τ / 2) {vs^{2} + n(μ – y*)^{2}}]

∝ exp[-(nτ / 2) (μ – y*)^{2}] ,

where y* is the sample arithmetic mean for the y data; v = (n – 1); and s^{2} is the sample variance.

That is, the conditional posterior for μ is a Normal distribution, with a mean of y* and a variance of (nτ)^{-1}.

Conversely, if we look at equation (1) and treat μ as if it’s known – *i.e*., if we condition on μ – we can see that

p(τ | μ , y) ∝ τ^{n/2 – 1} exp[-τ {0.5∑(y_{i} – μ)^{2}}] .

That is, the conditional posterior for τ is a Gamma distribution, with a shape parameter of (n / 2) and a rate parameter of [0.5∑(y_{i} – μ)^{2}].

So, the Gibbs sampler for this problem involves the following steps:

1. Provide an initial value for τ. Call this τ^{(0)}.
2. Draw a random value from p(μ | τ^{(0)}, y). Call this value μ^{(1)}.
3. Then draw a random value from p(τ | μ^{(1)}, y). Call this value τ^{(1)}.
4. Draw a random value from p(μ | τ^{(1)}, y). Call this value μ^{(2)}.
5. Repeat steps 3 and 4 many, many times.

After enough passes through steps 3 and 4 (the so-called “burn in” period), the subsequent values of μ and τ will effectively be random drawings from the *marginal* posterior distributions for these parameters, rather than from their *conditional* posterior distributions. The strings of values for μ and τ associated with the “burn in” will be discarded.

Now, as mentioned above, for this particular inference problem we actually know the forms of the marginal posterior distributions for μ and τ. Specifically, that for μ is Student-t, and that for τ is Gamma. We’ll come back to this point below, when we check the accuracy of our Gibbs sampler results.

Let’s take a look at the R script that I’ve written to implement the Gibbs sampler for this problem. It’s available on the **code** page for this blog. As in the previous post on MCMC, the R code is pretty basic. It’s written with a view to transparency, so that non-R users can see what’s going on. It could certainly be streamlined a lot, but that’s not my concern right now.

So here we go. First, let’s initialize some quantities, and draw a sample of n = 10 observations from a Normal population:
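Dave’s actual script is on the **code** page; as a rough stand-in, the initialization might look something like this sketch (the seed, the replication count `nrep`, and all of the variable names are my assumptions, not taken from the original script):

```r
set.seed(123)                       # assumed seed, purely for reproducibility
n    <- 10                          # sample size, as in the post
nrep <- 55000                       # total Gibbs replications (assumed value)
nb   <- 5000                        # length of the "burn in", per the post
y    <- rnorm(n, mean = 1, sd = 1)  # draws from a Normal population with mu = tau = 1
ybar <- mean(y)                     # the sample mean, y* in the text
```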

Now let’s go through the loop for the Gibbs sampler itself:
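A minimal, self-contained sketch of such a loop (the setup lines are repeated so the snippet runs on its own; the names and the starting value are my assumptions) just alternates draws from the two conditional posteriors derived above:

```r
set.seed(123)
n <- 10; nrep <- 55000
y <- rnorm(n, mean = 1, sd = 1)   # setup repeated so this snippet is self-contained
ybar <- mean(y)

mu  <- numeric(nrep)              # storage for the chain of mu values
tau <- numeric(nrep)              # storage for the chain of tau values
tt  <- 1                          # initial value tau^(0) (assumed starting point)

for (i in 1:nrep) {
  # Draw mu from p(mu | tau, y): Normal with mean y* and variance 1/(n*tau)
  mm <- rnorm(1, mean = ybar, sd = sqrt(1 / (n * tt)))
  # Draw tau from p(tau | mu, y): Gamma with shape n/2, rate 0.5*sum((y - mu)^2)
  tt <- rgamma(1, shape = n / 2, rate = 0.5 * sum((y - mm)^2))
  mu[i]  <- mm
  tau[i] <- tt
}
```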

Now we’ll drop the first 5,000 values of μ and τ for the “burn in”:
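In code this is just negative indexing on the two chains (placeholder vectors are used below, in place of the real draws, so that the snippet runs on its own):

```r
nrep <- 55000; nb <- 5000   # assumed chain length; burn-in of 5,000 per the post
mu  <- numeric(nrep)        # placeholder standing in for the sampled mu chain
tau <- numeric(nrep)        # placeholder standing in for the sampled tau chain

mu  <- mu[-(1:nb)]          # discard the first 5,000 "burn in" draws
tau <- tau[-(1:nb)]
length(mu)                  # 50000 draws are retained
```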

Let’s see what our results look like. Do we have marginal posteriors for μ and τ that are Student-t and Gamma? We’re going to take a look at some skewness and kurtosis measures as part of the basis for assessing our results, so the “moments” package for R has to have been installed, and we’ll access it with a “library” command:
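In the script this amounts to `library(moments)` followed by `skewness()` and `kurtosis()` calls on the retained draws. To keep this sketch free of package dependencies, the same measures – to my understanding, the “moments” package computes the standardized population moments – can be written out in base R:

```r
# Skewness and kurtosis as third and fourth standardized population moments
skew <- function(x) { m <- mean(x); mean((x - m)^3) / mean((x - m)^2)^1.5 }
kurt <- function(x) { m <- mean(x); mean((x - m)^4) / mean((x - m)^2)^2 }

set.seed(42)
z <- rnorm(100000)   # stand-in draws; a Normal has skewness 0 and kurtosis 3
skew(z)              # should be close to 0
kurt(z)              # should be close to 3
```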

Here are the results at this point. First, we have some of the characteristics of what should be the marginal posterior for μ:

We can see that the posterior mean for μ lines up very nicely at 1.075. The skewness is essentially zero; and the kurtosis, which should be 4.2, is simulated to be 4.01. That’s pretty good!

Turning to the marginal posterior for tau, we have:

The skewness for the Gamma marginal posterior for τ should be 0.894, and we have a value of 0.95. Not bad. In addition, the kurtosis should be 4.2, and we have 4.35.

Also, notice that the sample data were drawn from a population with μ = τ = 1. If we had a quadratic loss function, our Bayes estimators of these two parameters would be the marginal posterior means: 1.08 and 1.10 respectively. If we had an absolute-error loss function, then the Bayes estimators would be the marginal posterior medians: 1.08 and 1.02 respectively.
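Given the retained chains, those point estimates are nothing more than sample means and medians of the draws; with placeholder draws standing in for the real Gibbs output:

```r
set.seed(7)
mu.draws <- rnorm(100000, mean = 1.08, sd = 0.4)  # placeholders, not the real chain
mean(mu.draws)    # Bayes estimator of mu under quadratic loss
median(mu.draws)  # Bayes estimator of mu under absolute-error loss
```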

And we achieved these results with just n = 10 sample observations.

Finally, let’s take a look at the plots for the two marginal posterior densities:
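In base R the plots can be produced with `density()` and `plot()`; a sketch, again with stand-in draws (the Gamma shape and rate below are arbitrary placeholders) so that it runs on its own:

```r
set.seed(7)
mu.draws  <- rnorm(100000)                        # stand-ins for the retained mu draws
tau.draws <- rgamma(100000, shape = 5, rate = 5)  # stand-ins for the retained tau draws

plot(density(mu.draws),  main = "Marginal Posterior for mu",  xlab = "mu")
plot(density(tau.draws), main = "Marginal Posterior for tau", xlab = "tau")
```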

So, the results for the marginal posterior distributions for μ and τ, obtained using the Gibbs sampler, look pretty convincing.

In the next post in this sequence we’ll look at a Bayesian inference problem where we can’t obtain the results analytically, and the use of the Gibbs sampler is essential to obtaining a solution.
