This is the fourth in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC2
) simulation methods for Bayesian inference. The first three posts can be found here
, and here
, and I’ll assume that you’ve read them already. The emphasis throughout is on the use of the Gibbs sampler.
The first three posts took a look “inside the box”, to see that the Gibbs sampler entails. I looked at some R code that could be used to show the sampler “in action”. One way to think about those posts is that they were analogous to an explanation of OLS regression, with code that assembled the X matrix, the (X’X) matrix, showed you how to invert the latter matrix, and so on. It’s important to understand what is going on when you select “OLS” in your favourite package, but you certainly wouldn’t dream of constructing the estimator from first principles every time you wanted to use it.
It’s the same in the case of our Bayesian analysis. So, the purpose of this post is to show you simple it is, in practice, to use R to estimate regression models using Bayesian methods, and to implement Bayesian Posterior Odds analysis for model selection.
Before we begin, you might want to look back at two other Bayesian econometrics posts that are somewhat relevant. This one here
showed how to use Bayesian inference to estimate a simple consumption function. Some numerical integration was needed to normalize the posterior density for the marginal propensity to consume parameter, and to obtain the moments of the posterior distribution. That was fine for that very simple model, but numerical integration gets prohibitively costly (from a computational perspective) very quickly as the number of parameters increases. That’s one of the most important reasons for using MCMC methods.
This other post
looked at the Bayesian approach to comparing and choosing between competing models. Bayesian Posterior Odds analysis was illustrated with some simple “demand for beer” models, a normal likelihood, and the natural conjugate prior density for the parameters. That was fine – the use of the natural conjugate prior meant that we were able to get closed-form analytic expressions for both the Bayes estimators of the parameters, and for the BPO. However, natural conjugate priors are very restrictive, and in practice they may not really capture our prior beliefs.
What I’m going to do here is to look again at some competing “demand for beer” models, but a different type of prior will be used for the parameters. This will necessitate the use of the Gibbs sampler to get the marginal posterior distributions of the parameters in each model. At the same time, we’ll do the BPO analysis to see what can be said about model selection and model averaging.
There are several packages in R that we use to do this. I’m going to use the package MCMCpack
to illustrate how simple it is to undertake this type of Bayesian analysis. Specifically, we’re going to undertake Bayesian estimation of some regression models of the following form:
. y = Xβ + ε ; ε ~ N[0 , σ2In] .
The joint prior for the parameters is
p(β , σ) = p(β) p(σ),
where the marginal prior for β is Normal, and that for σ-2 is Gamma. Note that this is not the natural conjugate prior. The latter would have a conditional (not independent) marginal Normal prior for β. So, we can’t get the marginal posteriors for beta and sigma analytically – hence the use of MCMC
The data that I’m going to work with are for the prices (PB, PW, PS, and PM) and quantities (QB, QW, QS, and QM) of beer, wine, spirits, and marijuana. The sample is very small (n = 9), and the data are available on the Data page
for this blog.
Having first installed the MCMCpack, here is the initial R code I used to read the data and take the logarithms of all of the variables, and estimate a first model::
As you can see, this model has the logarithm of the quantity of beer as the dependent variable, and the (logarithms of) the prices of beer and wine as the regressors, so the estimated coefficients will be price elasticities of demand. In the MCMCregress command that’s used to estimate this first model, the prior mean vector for the regression coefficients for the intercept, PB and PW is assigned in “b0”. The prior variances and covariances for the regression coefficients all assigned to be “1”, using “B0”, and the values for the prior mean and variance of sigma are also assigned values. You can change these to suit your prior beliefs, of course, and “B0” can also be assigned in matrix form to allow for different prior variances and covariances for the coefficients.
I’ve asked for 100,000 MC replications after a burn-in run of 5,000 replications. The final option I’ve included, specifying that the marginal likelihood (i.e., p(y)) is to be obtained using a Laplace approximation, is not needed for the actual estimation of the model. It’s included here only because there’s going to be some BPO analysis later, and we need it for that.
The “plot(model1)” and “summary(model1)” commands produce the following results, including the marginal posterior densities:
So, under a quadratic loss function, the Bayes estimate of the own-price elasticity of demand for beer is -0.59, while the Bayes estimate for the cross-price elasticity of demand (with respect to wine) is 2.18. You can compare these with the assigned prior means of -1 and +1 respectively.
Next, I’ve looked at an additional three competing models, each with the same dependent variable, but with different regressors:
As a matter of interest, here’s the summary of the results for model2:
For this model, the estimated own-price elasticity is -0.73, and the cross-price elasticity with respect to marijuana is 0.92.
Finally, we can compute and display the Bayes factors for all pairs of these four models. These will be the BPO if we have a uniform prior on the model space. We can also obtain the posterior probabilities for each model, assuming that the model space has been fully specified:
which results in the following information:
The first model has the highest posterior probability, but we might also want to use these posterior probabilities to form weighted averages of the various posterior means of the price elasticities. That is, we could now use Bayesian model averaging to get point estimates of the elasticities. In this example, not every price appears in every model, so the weights would be formed from a subset of the posterior probabilities, re-normalized to sum to one.
We could also use the posterior probabilities for the four models to combine forecasts. For example, suppose that we set the price variables to their sample mean values: log(PB) =1.278, log(PW) = 2.036, log(PS) = 3.752, and log(PM) = 6.194. The four individual forecasts of log(QB) from the four models are 4.803, 4.822, 4.818 and 4.819 respectively. Weighting these by the posterior probabilities for the models we get a combined forecast for log(QB) of 4.814.
The full R code for this analysis is on the Code page
for this blog, and I think you’ll agree that it was pretty simple to estimate our four competing models and compute the Bayes factors.
There will be further posts on this material in the near future.
© 2014, David E. Giles