Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language – a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas Petzoldt provide an excellent introduction to using proto in the package vignette.

As a learning exercise I concocted the example below involving Bayesian logistic regression. It was inspired by an article on Matt Shotwell’s blog about using R environments rather than lists to store the state of a Markov Chain Monte Carlo sampler. Here I use proto to create a parent class-like object (or trait in proto-ese) to contain the regression functions and create child objects to hold both data and results for individual analyses.

First here’s an example session…

``# Make up some data with a continuous predictor and binary responsenrec <- 500x <- rnorm(nrec)y <- rbinom(nrec, 1, plogis(2 - 4*x))# Predictor matrix with a col of 1s for interceptpred <- matrix(c(rep(1, nrec), x), ncol=2)colnames(pred) <- c("intercept", "X")# Load the proto packagelibrary(proto)# Use the Logistic parent object to create a child object which will # hold the data and run the regression (the \$ operator references # functions and data within a proto object)lr <- Logistic\$new(pred, y)lr\$run(5000, 1000)# lr now contains both data and resultsstr(lr)proto object  \$ cov      : num [1:2, 1:2] 0.05 -0.0667 -0.0667 0.1621   ..- attr(*, "dimnames")=List of 2  \$ prior.cov: num [1:2, 1:2] 100 0 0 100  \$ prior.mu : num [1:2] 0 0  \$ beta     : num [1:5000, 1:2] 2.09 2.09 2.09 2.21 2.21 ...   ..- attr(*, "dimnames")=List of 2  \$ adapt    : num 1000  \$ y        : num [1:500] 0 1 1 1 1 1 1 1 1 1 ...  \$ x        : num [1:500, 1:2] 1 1 1 1 1 1 1 1 1 1 ...   ..- attr(*, "dimnames")=List of 2  parent: proto object # Use the Logistic summary function to tabulate and plot resultslr\$summary()From 5000 samples after 1000 iterations burning in   intercept           X          Min.   :1.420   Min.   :-5.296   1st Qu.:1.840   1st Qu.:-3.915   Median :2.000   Median :-3.668   Mean   :1.994   Mean   :-3.693   3rd Qu.:2.128   3rd Qu.:-3.455   Max.   :2.744   Max.   :-2.437  `` And here’s the code for the Logistic trait…

``Logistic <- proto()Logistic\$new <- function(., x, y) {  # Creates a child object to hold data and results  #  # x - a design matrix (ie. predictors  # y - a binary reponse vector  proto(., x=x, y=y)}Logistic\$run <- function(., niter, adapt=1000) {  # Perform the regression by running the MCMC  # sampler  #  # niter - number of iterations to sample  # adapt - number of prior iterations to run  #         for the 'burning in' period  require(mvtnorm)  # Set up variables used by the sampler  .\$adapt <- adapt  total.iter <- niter + adapt    .\$beta <- matrix(0, nrow=total.iter, ncol=ncol(.\$x))  .\$prior.mu <- rep(0, ncol(.\$x))  .\$prior.cov <- diag(100, ncol(.\$x))  .\$cov <- diag(ncol(.\$x))  # Run the sampler  b <- rep(0, ncol(.\$x))  for (i in 1:total.iter) {    b <- .\$update(i, b)    .\$beta[i,] <- b  }    # Trim the results matrix to remove the burn-in  # period  if (.\$adapt > 0) {    .\$beta <- .\$beta[(.\$adapt + 1):total.iter,]  }}Logistic\$update <- function(., it, beta.old) {  # Perform a single iteration of the MCMC sampler using  # Metropolis-Hastings algorithm.  # Adapted from code by Brian Neelon published at:  # http://www.duke.edu/~neelo003/r/  #  # it -       iteration number  # beta.old - vector of coefficient values from   #            the previous iteration  # Update the coefficient covariance if we are far  # enough through the sampling  if (.\$adapt > 0 & it > 2 * .\$adapt) {    .\$cov <- cov(.\$beta[(it - .\$adapt):(it - 1),])  }    # generate proposed new coefficient values  beta.new <- c(beta.old + rmvnorm(1, sigma=.\$cov))    # calculate prior and current probabilities and log-likelihood  if (it == 1) {    .\$..log.prior.old <- dmvnorm(beta.old, .\$prior.mu, .\$prior.cov, log=TRUE)    .\$..probs.old <- plogis(.\$x %*% beta.old)    .\$..LL.old <- sum(log(ifelse(.\$y, .\$..probs.old, 1 - .\$..probs.old)))  }  log.prior.new <- dmvnorm(beta.new, .\$prior.mu, .\$prior.cov, log=TRUE)  probs.new <- plogis(.\$x %*% beta.new)    LL.new <- sum(log(ifelse(.\$y, probs.new, 1-probs.new)))    # Metropolis-Hastings acceptance ratio (log scale)  ratio <- LL.new + log.prior.new - .\$..LL.old - .\$..log.prior.old    if (log(runif(1)) < ratio) {   .\$..log.prior.old <- log.prior.new   .\$..probs.old <- probs.new   .\$..LL.old <- LL.new    return(beta.new)  } else {    return(beta.old)  }}Logistic\$summary <- function(., show.plot=TRUE) {  # Summarize the results  cat("From", nrow(.\$beta), "samples after", .\$adapt, "iterations burning in\n")  base::print(base::summary(.\$beta))    if (show.plot) {    par(mfrow=c(1, ncol(.\$beta)))    for (i in 1:ncol(.\$beta)) {      plot(density(.\$beta[,i]), main=colnames(.\$beta)[i])    }  }}``

Now that’s probably not the greatest design in the world, but it only took me a few minutes to put it together and it’s incredibly easy to modify or extend. Try it !

Thanks to Brian Neelon for making his MCMC logistic regression code available (http://www.duke.edu/~neelo003/r/).