Fun with the proto package: building an MCMC sampler for Bayesian regression


The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language – a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas Petzoldt provide an excellent introduction to using proto in the package vignette.
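To give a flavour of the style before diving into the regression example, here is a minimal sketch of my own (the object names are invented for illustration): a proto object bundles data and functions together, methods receive the object itself as their first argument (conventionally written "."), and a child object inherits everything from its parent.

library(proto)

# A proto object holding both state (balance) and behaviour (deposit);
# the first argument '.' refers to the object itself, much like 'self'
account <- proto(
  balance = 100,
  deposit = function(., amount) .$balance <- .$balance + amount
)

account$deposit(50)
account$balance    # 150

# A child object inherits the deposit function from its parent
# but keeps its own balance
savings <- proto(account, balance = 0)
savings$deposit(25)
savings$balance    # 25
account$balance    # still 150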

As a learning exercise I concocted the example below involving Bayesian logistic regression. It was inspired by an article on Matt Shotwell’s blog about using R environments rather than lists to store the state of a Markov chain Monte Carlo sampler. Here I use proto to create a parent class-like object (or trait in proto-ese) to contain the regression functions, and child objects to hold both data and results for individual analyses.
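For comparison, the environment trick that inspired this post looks something like the following (a minimal sketch of my own, not Shotwell’s actual code): because environments are passed by reference, a sampler can update its state in place without returning a modified copy of a list on every iteration.

# State lives in an environment, so functions can modify it in place
new.state <- function(niter, npar) {
  e <- new.env()
  e$beta <- matrix(0, nrow=niter, ncol=npar)
  e$iter <- 0
  e
}

store.sample <- function(state, b) {
  state$iter <- state$iter + 1
  state$beta[state$iter, ] <- b    # updates in place; nothing to return
}

s <- new.state(10, 2)
store.sample(s, c(1.5, -2))
s$iter    # 1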

First here’s an example session…

# Make up some data with a continuous predictor and binary response
nrec <- 500
x <- rnorm(nrec)
y <- rbinom(nrec, 1, plogis(2 - 4*x))

# Predictor matrix with a col of 1s for intercept
pred <- matrix(c(rep(1, nrec), x), ncol=2)
colnames(pred) <- c("intercept", "X")

# Load the proto package
library(proto)

# Use the Logistic parent object to create a child object which will
# hold the data and run the regression (the $ operator references
# functions and data within a proto object)
lr <- Logistic$new(pred, y)
lr$run(5000, 1000)

# lr now contains both data and results
str(lr)

proto object
 $ cov      : num [1:2, 1:2] 0.05 -0.0667 -0.0667 0.1621
  ..- attr(*, "dimnames")=List of 2
 $ prior.cov: num [1:2, 1:2] 100 0 0 100
 $ prior.mu : num [1:2] 0 0
 $ beta     : num [1:5000, 1:2] 2.09 2.09 2.09 2.21 2.21 ...
  ..- attr(*, "dimnames")=List of 2
 $ adapt    : num 1000
 $ y        : num [1:500] 0 1 1 1 1 1 1 1 1 1 ...
 $ x        : num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 2
 parent: proto object

# Use the Logistic summary function to tabulate and plot results
lr$summary()

From 5000 samples after 1000 iterations of burn-in
   intercept           X
 Min.   :1.420   Min.   :-5.296
 1st Qu.:1.840   1st Qu.:-3.915
 Median :2.000   Median :-3.668
 Mean   :1.994   Mean   :-3.693
 3rd Qu.:2.128   3rd Qu.:-3.455
 Max.   :2.744   Max.   :-2.437
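As a quick sanity check (my addition, not part of the original session), the posterior means can be compared with the maximum-likelihood estimates from glm(); with 500 records and vague priors they should be close, and both should be near the true values of 2 and -4 used to simulate the data.

fit <- glm(y ~ x, family=binomial)
coef(fit)           # ML estimates of intercept and slope
colMeans(lr$beta)   # posterior means from the sampler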

And here’s the code for the Logistic trait…

Logistic <- proto()

Logistic$new <- function(., x, y) {
  # Creates a child object to hold data and results
  #
  # x - a design matrix (i.e. predictors)
  # y - a binary response vector

  proto(., x=x, y=y)
}

Logistic$run <- function(., niter, adapt=1000) {
  # Perform the regression by running the MCMC sampler
  #
  # niter - number of iterations to sample
  # adapt - number of initial iterations to run as the
  #         burn-in period

  require(mvtnorm)

  # Set up variables used by the sampler
  .$adapt <- adapt
  total.iter <- niter + adapt
  .$beta <- matrix(0, nrow=total.iter, ncol=ncol(.$x))
  .$prior.mu <- rep(0, ncol(.$x))
  .$prior.cov <- diag(100, ncol(.$x))
  .$cov <- diag(ncol(.$x))

  # Run the sampler
  b <- rep(0, ncol(.$x))
  for (i in 1:total.iter) {
    b <- .$update(i, b)
    .$beta[i,] <- b
  }

  # Trim the results matrix to remove the burn-in period
  if (.$adapt > 0) {
    .$beta <- .$beta[(.$adapt + 1):total.iter,]
  }
}

Logistic$update <- function(., it, beta.old) {
  # Perform a single iteration of the MCMC sampler using the
  # Metropolis-Hastings algorithm.
  # Adapted from code by Brian Neelon published at:
  # http://www.duke.edu/~neelo003/r/
  #
  # it       - iteration number
  # beta.old - vector of coefficient values from the
  #            previous iteration

  # Update the proposal covariance once we are far enough
  # through the sampling to estimate it from past samples
  if (.$adapt > 0 && it > 2 * .$adapt) {
    .$cov <- cov(.$beta[(it - .$adapt):(it - 1),])
  }

  # Generate proposed new coefficient values
  beta.new <- c(beta.old + rmvnorm(1, sigma=.$cov))

  # Calculate prior density and log-likelihood for the current
  # values (first iteration only; they are cached thereafter)
  if (it == 1) {
    .$..log.prior.old <- dmvnorm(beta.old, .$prior.mu, .$prior.cov, log=TRUE)
    .$..probs.old <- plogis(.$x %*% beta.old)
    .$..LL.old <- sum(log(ifelse(.$y, .$..probs.old, 1 - .$..probs.old)))
  }
  log.prior.new <- dmvnorm(beta.new, .$prior.mu, .$prior.cov, log=TRUE)
  probs.new <- plogis(.$x %*% beta.new)
  LL.new <- sum(log(ifelse(.$y, probs.new, 1 - probs.new)))

  # Metropolis-Hastings acceptance ratio (log scale)
  ratio <- LL.new + log.prior.new - .$..LL.old - .$..log.prior.old

  if (log(runif(1)) < ratio) {
    .$..log.prior.old <- log.prior.new
    .$..probs.old <- probs.new
    .$..LL.old <- LL.new
    return(beta.new)
  } else {
    return(beta.old)
  }
}

Logistic$summary <- function(., show.plot=TRUE) {
  # Summarize the results

  cat("From", nrow(.$beta), "samples after", .$adapt, "iterations of burn-in\n")
  base::print(base::summary(.$beta))

  if (show.plot) {
    par(mfrow=c(1, ncol(.$beta)))
    for (i in 1:ncol(.$beta)) {
      plot(density(.$beta[,i]), main=colnames(.$beta)[i])
    }
  }
}

Now that’s probably not the greatest design in the world, but it only took me a few minutes to put together and it’s incredibly easy to modify or extend. Try it!
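For instance, here is a hypothetical extension of my own (not from the original post): a child trait that inherits all of Logistic’s machinery and adds a trace plot method, without touching the original code.

# TracedLogistic inherits new(), run() and update() from Logistic
TracedLogistic <- proto(Logistic)

TracedLogistic$traceplot <- function(.) {
  # One trace per coefficient, drawn from the stored samples
  par(mfrow=c(ncol(.$beta), 1))
  for (i in 1:ncol(.$beta)) {
    plot(.$beta[,i], type="l", ylab=colnames(.$beta)[i])
  }
}

lr2 <- TracedLogistic$new(pred, y)
lr2$run(5000, 1000)
lr2$traceplot()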

Thanks to Brian Neelon for making his MCMC logistic regression code available (http://www.duke.edu/~neelo003/r/).
