MCMC for Econometrics Students – III

[This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As its title suggests, this post is the third in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC2) methods for Bayesian inference. The first two posts can be found here and here, and I’ll assume that you’ve read both of them already.

We’re going to look at another example involving the use of the Gibbs sampler. Specifically, we’re going to use it  to extract the marginal posterior distributions from the joint posterior distribution, in a simple two-parameter problem. The problem – which we’ll come to shortly – is one in which we actually know the answer in advance. That’s to say, the marginalizing can be done analytically with some not-too-difficult integration. This means that we have a “bench mark” against which to judge the results generated by the Gibbs sampler.

Let’s look at the inference problem we’re going to solve.

Suppose that we have a population that can be described by a Normal distribution with an unknown mean, μ, and an unknown precision parameter, τ. The latter parameter is just τ = 1/σ2, where σ is the standard deviation of the population. We want to estimate both μ and τ.

Before we see any sample data, we are a priori completely ignorant about the possible values of the parameters. So, we assign the following “diffuse” joint prior density:

                    p(μ , τ) = p(μ) p(τ) ∝ 1/τ       ;         -∞ < μ < ∞      ;       0 < τ < ∞

Then, we take a sample of n independent values from the population. So, the likelihood function is:

                   L(μ , τ | y) = p(y | μ , τ) ∝ τn/2 exp[-(τ / 2) ∑(yi – μ)2]  ,

where y is the vector of n sample values.

So, by Bayes’ Theorem, the joint posterior density for the two parameters is:

                  p(μ , τ | y) = [p(μ , τ) L(μ , τ |y) / p(y)] ∝ p(μ , τ) L(μ , τ | y)

                                  ∝ τn/2 – 1 exp[-(τ / 2) ∑(yi – μ)2]            .                                       (1)

What we’re interested in obtaining are the marginal posterior densities for each of the two parameters.

Now recall that the basic inputs for the Gibbs sampler are the conditional posterior distributions for each of the (groups of) parameters. If we look at equation (1) and treat τ as if it’s a constant – i.e., if we condition on τ – we can see that

                p(μ | τ , y) ∝ exp[-(τ / 2) ∑(yi – μ)2]

                                 ∝ exp[-(τ / 2) ∑{vs2 + (μ – y*)2}]
                                 ∝ exp[-(nτ / 2) (μ – y*)2]  ,

where y* is the sample arithmetic mean for the y data; v = (n – 1); and s2 is the sample variance.

That is, the conditional posterior for μ is a Normal distribution, with a mean of y* and a variance of (nτ)-1.

Conversely, if we look at equation (1) and treat μ as if it’s known – i.e., if we condition on μ – we can see that

               p(τ | μ , y) ∝ τn/2 – 1 exp[-τ {0.5∑(yi – μ)2}]  .

So, the conditional posterior for τ is a Gamma distribution, with a shape parameter,  r = (n / 2) and a scale parameter,  λ = [0.5 ∑(yi – μ)2] .

We now have the information that we need to implement the Gibbs sampler.

Here are the steps we’ll follow:
  1. Provide an initial value for τ. Call this τ(0).
  2. Draw a random value from p(μ | τ(0) , y). Call this value μ(1).
  3. Then draw a random value from p(τ | μ(1) , y). Call this value τ(1).
  4. Draw a random value from p(μ | τ(1) , y). Call this value μ(2).
  5. Repeat steps 3 and 4 many, many times.
After a “burn in” period, the values of μ and τ will begin to take values that are consistent with being drawn from the marginal posterior distributions for these parameters, rather than from their conditional posterior distributions. The strings of values for μ and τ associated with the “burn in” will be discarded. 

The thousands of subsequent values of these parameters that we’re going to generate will provide us with information about the marginal posterior distributions for μ and τ.

Now, as mentioned above, for this particular inference problem we actually know the forms of the marginal posterior distributions for μ and τ. Specifically, that for for μ is Student-t, and that for τ is Gamma. We’ll come back to this point below, when we check the accuracy of out Gibbs sampler results.

Let’s take a look at the R script that I’ve written to implement the Gibbs sampler for this problem. It’s available on the code page for this blog. As in the previous post on MCMC, the R code is pretty basic. It’s written with a view to transparency so that non-users can see what’s going on. It could certainly be streamlined a lot, but that’s not my concern right now.

So here we go. First, let’s initialize some quantities, and draw sample of n = 10 observations from a Normal population:

Now let’s go through the loop for the Gibbs sampler itself:

Now we’ll drop the first 5,000 values of μ and τ for the “burn in”:

Let’s see what our results look like. Do we have marginal posteriors for μ and τ that are Normal and Gamma? We’re going to take a look at some skewness and kurtosis measures as part of the basis for assessing our results, so the “moments” package for R has to have been installed, and we’ll access is with a “library” command:

Here are the results at this point. First, we have some of the characteristics of what should be the marginal posterior for μ:

We can see that the posterior mean for μ lines up very nicely at 1.075. The skewness is essentially zero; and the kurtosis, which should be 4.2, is simulated to be 4.01. That’s pretty good!

Turning to the marginal posterior for tau, we have:

The skewness for Gamma marginal posterior for τ should be 0.894, and we have a value of 0.95. Not bad. In addition, the kurtosis should be 4.2, and we have 4.35.

Also, notice that that the sample data were drawn from a population with mu = tau = 1. If we had a quadratic loss function, our Bayes estimators of these two parameters would be the marginal posterior means: 1.08 and 1.10 respectively. If we had an absolute error loss function, then the Bayes estimators would be the marginal posterior medians: 1.08  and 1.02 respectively.

And we achieved theses results with just n = 10 sample observations.

Finally, let’s take a look at the plots for the two marginal posterior densities:

So, the results for the marginal posterior distributions for mu and tau, obtained using the Gibbs sampler, look pretty convincing,

In the next post in this sequence we’ll look at a Bayesian inference problem where we can’t obtain the results analytically, and the use of the Gibbs sampler is essential to obtaining a solution.

© 2014, David E. Giles

To leave a comment for the author, please follow the link and comment on their blog: Econometrics Beat: Dave Giles' Blog. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)