(This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers)

This is the third in a sequence of posts about Markov Chain Monte Carlo (MC^{2}) methods for Bayesian inference. The first two posts can be found **here** and **here**, and I’ll assume that you’ve read both of them already.

In this post we’re going to see how the Gibbs sampler can be used to extract the *marginal posterior* distributions from the *joint posterior* distribution, in a simple two-parameter problem. The problem – which we’ll come to shortly – is one in which we actually know the answer in advance. That’s to say, the marginalizing can be done analytically with some not-too-difficult integration. This means that we have a benchmark against which to judge the results generated by the Gibbs sampler.

Let’s look at the inference problem we’re going to solve.

Suppose that we have a population that can be described by a Normal distribution with an unknown mean, μ, and an unknown precision parameter, τ. The latter parameter is just τ = 1/σ^{2}, where σ is the standard deviation of the population. We want to estimate both μ and τ.

Before we see any sample data, we are *a priori* completely ignorant about the possible values of the parameters. So, we assign the following “diffuse” joint prior density:

p(μ , τ) ∝ (1 / τ) ; -∞ < μ < ∞ ; 0 < τ < ∞ .

Then, we take a sample of n independent values from the population. So, the likelihood function is:

L(μ , τ | y) = p(y | μ , τ) ∝ τ^{n/2} exp[-(τ / 2) ∑(y_{i} – μ)^{2}] ,

where y is the vector of n sample values.

So, by Bayes’ Theorem, the *joint* posterior density for the two parameters is:

p(μ , τ | y) = [p(μ , τ) L(μ , τ | y) / p(y)] ∝ p(μ , τ) L(μ , τ | y)

∝ τ^{n/2 – 1} exp[-(τ / 2) ∑(y_{i} – μ)^{2}] . (1)

What we really want, though, are the *marginal* posterior densities for each of the two parameters.

The Gibbs sampler requires the *conditional* posterior distributions for each of the (groups of) parameters. If we look at equation (1) and treat τ as if it’s a constant – *i.e*., if we condition on τ – we can see that

p(μ | τ , y) ∝ exp[-(τ / 2) ∑(y_{i} – μ)^{2}]

∝ exp[-(τ / 2) {vs^{2} + n(μ – y*)^{2}}]

∝ exp[-(nτ / 2) (μ – y*)^{2}] ,

where y* is the sample arithmetic mean for the y data; v = (n – 1); and s^{2} is the sample variance.

That is, the conditional posterior for μ is a Normal distribution, with a mean of y* and a variance of (nτ)^{-1}.

Conversely, if we look at equation (1) and treat μ as if it’s known – *i.e*., if we condition on μ – we can see that

p(τ | μ , y) ∝ τ^{n/2 – 1} exp[-τ {0.5∑(y_{i} – μ)^{2}}] .

That is, the conditional posterior for τ is a Gamma distribution, with a shape parameter of (n / 2) and a rate parameter of [0.5∑(y_{i} – μ)^{2}].

So, the Gibbs sampler for this problem involves the following steps:

1. Provide an initial value for τ. Call this τ^{(0)}.
2. Draw a random value from p(μ | τ^{(0)}, y). Call this value μ^{(1)}.
3. Then draw a random value from p(τ | μ^{(1)}, y). Call this value τ^{(1)}.
4. Draw a random value from p(μ | τ^{(1)}, y). Call this value μ^{(2)}.
5. Repeat steps 3 and 4 many, many times.

After enough passes through steps 3 and 4 (the so-called “burn in” period), the subsequent values of μ and τ will effectively be random drawings from the *marginal* posterior distributions for these parameters, rather than from their *conditional* posterior distributions. The strings of values for μ and τ associated with the “burn in” will be discarded.

Now, as mentioned above, for this particular inference problem we actually know the forms of the marginal posterior distributions for μ and τ. Specifically, that for μ is Student-t, and that for τ is Gamma. We’ll come back to this point below, when we check the accuracy of our Gibbs sampler results.

Let’s take a look at the R script that I’ve written to implement the Gibbs sampler for this problem. It’s available on the **code** page for this blog. As in the previous post on MCMC, the R code is pretty basic. It’s written with a view to transparency, so that non-R users can see what’s going on. It could certainly be streamlined a lot, but that’s not my concern right now.

So here we go. First, let’s initialize some quantities, and draw a sample of n = 10 observations from a Normal population:
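Dave’s actual script is on the **code** page; as a rough stand-in, the initialization might look something like this sketch (the seed, the replication count `nrep`, and all of the variable names are my assumptions, not taken from the original script):

```r
set.seed(123)                       # assumed seed, purely for reproducibility
n    <- 10                          # sample size, as in the post
nrep <- 55000                       # total Gibbs replications (assumed value)
nb   <- 5000                        # length of the "burn in", per the post
y    <- rnorm(n, mean = 1, sd = 1)  # draws from a Normal population with mu = tau = 1
ybar <- mean(y)                     # the sample mean, y* in the text
```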

Now let’s go through the loop for the Gibbs sampler itself:
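A minimal, self-contained sketch of such a loop (the setup lines are repeated so the snippet runs on its own; the names and the starting value are my assumptions) just alternates draws from the two conditional posteriors derived above:

```r
set.seed(123)
n <- 10; nrep <- 55000
y <- rnorm(n, mean = 1, sd = 1)   # setup repeated so this snippet is self-contained
ybar <- mean(y)

mu  <- numeric(nrep)              # storage for the chain of mu values
tau <- numeric(nrep)              # storage for the chain of tau values
tt  <- 1                          # initial value tau^(0) (assumed starting point)

for (i in 1:nrep) {
  # Draw mu from p(mu | tau, y): Normal with mean y* and variance 1/(n*tau)
  mm <- rnorm(1, mean = ybar, sd = sqrt(1 / (n * tt)))
  # Draw tau from p(tau | mu, y): Gamma with shape n/2, rate 0.5*sum((y - mu)^2)
  tt <- rgamma(1, shape = n / 2, rate = 0.5 * sum((y - mm)^2))
  mu[i]  <- mm
  tau[i] <- tt
}
```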

Now we’ll drop the first 5,000 values of μ and τ for the “burn in”:
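In code this is just negative indexing on the two chains (placeholder vectors are used below, in place of the real draws, so that the snippet runs on its own):

```r
nrep <- 55000; nb <- 5000   # assumed chain length; burn-in of 5,000 per the post
mu  <- numeric(nrep)        # placeholder standing in for the sampled mu chain
tau <- numeric(nrep)        # placeholder standing in for the sampled tau chain

mu  <- mu[-(1:nb)]          # discard the first 5,000 "burn in" draws
tau <- tau[-(1:nb)]
length(mu)                  # 50000 draws are retained
```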

Let’s see what our results look like. Do we have marginal posteriors for μ and τ that are Student-t and Gamma? We’re going to take a look at some skewness and kurtosis measures as part of the basis for assessing our results, so the “moments” package for R has to have been installed, and we’ll access it with a “library” command:
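In the script this amounts to `library(moments)` followed by `skewness()` and `kurtosis()` calls on the retained draws. To keep this sketch free of package dependencies, the same measures – to my understanding, the “moments” package computes the standardized population moments – can be written out in base R:

```r
# Skewness and kurtosis as third and fourth standardized population moments
skew <- function(x) { m <- mean(x); mean((x - m)^3) / mean((x - m)^2)^1.5 }
kurt <- function(x) { m <- mean(x); mean((x - m)^4) / mean((x - m)^2)^2 }

set.seed(42)
z <- rnorm(100000)   # stand-in draws; a Normal has skewness 0 and kurtosis 3
skew(z)              # should be close to 0
kurt(z)              # should be close to 3
```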

Here are the results at this point. First, we have some of the characteristics of what should be the marginal posterior for μ:

We can see that the posterior mean for μ lines up very nicely at 1.075. The skewness is essentially zero; and the kurtosis, which should be 4.2, is simulated to be 4.01. That’s pretty good!

Turning to the marginal posterior for tau, we have:

The skewness for the Gamma marginal posterior for τ should be 0.894, and we have a value of 0.95. Not bad. In addition, the kurtosis should be 4.2, and we have 4.35.

Also, notice that the sample data were drawn from a population with μ = τ = 1. If we had a quadratic loss function, our Bayes estimators of these two parameters would be the marginal posterior means: 1.08 and 1.10 respectively. If we had an absolute-error loss function, then the Bayes estimators would be the marginal posterior medians: 1.08 and 1.02 respectively.
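Given the retained chains, those point estimates are nothing more than sample means and medians of the draws; with placeholder draws standing in for the real Gibbs output:

```r
set.seed(7)
mu.draws <- rnorm(100000, mean = 1.08, sd = 0.4)  # placeholders, not the real chain
mean(mu.draws)    # Bayes estimator of mu under quadratic loss
median(mu.draws)  # Bayes estimator of mu under absolute-error loss
```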

And we achieved these results with just n = 10 sample observations.

Finally, let’s take a look at the plots for the two marginal posterior densities:
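In base R the plots can be produced with `density()` and `plot()`; a sketch, again with stand-in draws (the Gamma shape and rate below are arbitrary placeholders) so that it runs on its own:

```r
set.seed(7)
mu.draws  <- rnorm(100000)                        # stand-ins for the retained mu draws
tau.draws <- rgamma(100000, shape = 5, rate = 5)  # stand-ins for the retained tau draws

plot(density(mu.draws),  main = "Marginal Posterior for mu",  xlab = "mu")
plot(density(tau.draws), main = "Marginal Posterior for tau", xlab = "tau")
```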

So, the results for the marginal posterior distributions for μ and τ, obtained using the Gibbs sampler, look pretty convincing.

In the next post in this sequence we’ll look at a Bayesian inference problem where we can’t obtain the results analytically, and the use of the Gibbs sampler is essential to obtaining a solution.
