# Fun with the proto package: building an MCMC sampler for Bayesian regression

August 12, 2010
By

(This article was first published on Last Resort Software, and kindly contributed to R-bloggers)

The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language – a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas Petzoldt provide an excellent introduction to using proto in the package vignette.

As a learning exercise I concocted the example below involving Bayesian logistic regression. It was inspired by an article on Matt Shotwell’s blog about using R environments rather than lists to store the state of a Markov Chain Monte Carlo sampler. Here I use proto to create a parent class-like object (or trait in proto-ese) to contain the regression functions and create child objects to hold both data and results for individual analyses.

First here’s an example session…

``# Make up some data with a continuous predictor and binary responsenrec <- 500x <- rnorm(nrec)y <- rbinom(nrec, 1, plogis(2 - 4*x))# Predictor matrix with a col of 1s for interceptpred <- matrix(c(rep(1, nrec), x), ncol=2)colnames(pred) <- c("intercept", "X")# Load the proto packagelibrary(proto)# Use the Logistic parent object to create a child object which will # hold the data and run the regression (the \$ operator references # functions and data within a proto object)lr <- Logistic\$new(pred, y)lr\$run(5000, 1000)# lr now contains both data and resultsstr(lr)proto object  \$ cov      : num [1:2, 1:2] 0.05 -0.0667 -0.0667 0.1621   ..- attr(*, "dimnames")=List of 2  \$ prior.cov: num [1:2, 1:2] 100 0 0 100  \$ prior.mu : num [1:2] 0 0  \$ beta     : num [1:5000, 1:2] 2.09 2.09 2.09 2.21 2.21 ...   ..- attr(*, "dimnames")=List of 2  \$ adapt    : num 1000  \$ y        : num [1:500] 0 1 1 1 1 1 1 1 1 1 ...  \$ x        : num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...   ..- attr(*, "dimnames")=List of 2  parent: proto object # Use the Logistic summary function to tabulate and plot resultslr\$summary()From 5000 samples after 1000 iterations burning in   intercept           X          Min.   :1.420   Min.   :-5.296   1st Qu.:1.840   1st Qu.:-3.915   Median :2.000   Median :-3.668   Mean   :1.994   Mean   :-3.693   3rd Qu.:2.128   3rd Qu.:-3.455   Max.   :2.744   Max.   :-2.437  ``

And here’s the code for the Logistic trait…

``Logistic <- proto()Logistic\$new <- function(., x, y) {  # Creates a child object to hold data and results  #  # x - a design matrix (ie. predictors  # y - a binary reponse vector  proto(., x=x, y=y)}Logistic\$run <- function(., niter, adapt=1000) {  # Perform the regression by running the MCMC  # sampler  #  # niter - number of iterations to sample  # adapt - number of prior iterations to run  #         for the 'burning in' period  require(mvtnorm)  # Set up variables used by the sampler  .\$adapt <- adapt  total.iter <- niter + adapt    .\$beta <- matrix(0, nrow=total.iter, ncol=ncol(.\$x))  .\$prior.mu <- rep(0, ncol(.\$x))  .\$prior.cov <- diag(100, ncol(.\$x))  .\$cov <- diag(ncol(.\$x))  # Run the sampler  b <- rep(0, ncol(.\$x))  for (i in 1:total.iter) {    b <- .\$update(i, b)    .\$beta[i,] <- b  }    # Trim the results matrix to remove the burn-in  # period  if (.\$adapt > 0) {    .\$beta <- .\$beta[(.\$adapt + 1):total.iter,]  }}Logistic\$update <- function(., it, beta.old) {  # Perform a single iteration of the MCMC sampler using  # Metropolis-Hastings algorithm.  # Adapted from code by Brian Neelon published at:  # http://www.duke.edu/~neelo003/r/  #  # it -       iteration number  # beta.old - vector of coefficient values from   #            the previous iteration  # Update the coefficient covariance if we are far  # enough through the sampling  if (.\$adapt > 0 & it > 2 * .\$adapt) {    .\$cov <- cov(.\$beta[(it - .\$adapt):(it - 1),])  }    # generate proposed new coefficient values  beta.new <- c(beta.old + rmvnorm(1, sigma=.\$cov))    # calculate prior and current probabilities and log-likelihood  if (it == 1) {    .\$..log.prior.old <- dmvnorm(beta.old, .\$prior.mu, .\$prior.cov, log=TRUE)    .\$..probs.old <- plogis(.\$x %*% beta.old)    .\$..LL.old <- sum(log(ifelse(.\$y, .\$..probs.old, 1 - .\$..probs.old)))  }  log.prior.new <- dmvnorm(beta.new, .\$prior.mu, .\$prior.cov, log=TRUE)  probs.new <- plogis(.\$x %*% beta.new)    LL.new <- sum(log(ifelse(.\$y, probs.new, 1-probs.new)))    # Metropolis-Hastings acceptance ratio (log scale)  ratio <- LL.new + log.prior.new - .\$..LL.old - .\$..log.prior.old    if (log(runif(1)) < ratio) {   .\$..log.prior.old <- log.prior.new   .\$..probs.old <- probs.new   .\$..LL.old <- LL.new    return(beta.new)  } else {    return(beta.old)  }}Logistic\$summary <- function(., show.plot=TRUE) {  # Summarize the results  cat("From", nrow(.\$beta), "samples after", .\$adapt, "iterations burning in\n")  base::print(base::summary(.\$beta))    if (show.plot) {    par(mfrow=c(1, ncol(.\$beta)))    for (i in 1:ncol(.\$beta)) {      plot(density(.\$beta[,i]), main=colnames(.\$beta)[i])    }  }}``

Now that’s probably not the greatest design in the world, but it only took me a few minutes to put it together and it’s incredibly easy to modify or extend. Try it !

Thanks to Brian Neelon for making his MCMC logistic regression code available (http://www.duke.edu/~neelo003/r/).

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags:

## Recent popular posts

Contact us if you wish to help support R-bloggers, and place your banner here.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)