Parallel Tempering in R with Rmpi

October 6, 2013
By

(This article was first published on Lindons Log » R, and kindly contributed to R-bloggers)

My office computer recently got a really nice upgrade and now I have 8 cores on my desktop to play with. I also at the same time received some code for a Gibbs sampler written in R from my adviser. I wanted to try a metropolis-coupled markov chain monte carlo, MC^{3}, algorithm on it to try and improve the mixing but the problem was that it was written in R and I’m used to writing parallel code in C/C++ with OpenMP or MPI. Previously I wrote about a parallel tempering algorithm with an implementation in C++ using OpenMP and so I thought I would try and code up the same sort of thing in R as a warm-up exercise before I started with the full MC^{3} algorithm. Sadly I don’t think there is any facility in R for OpenMP style parallelism. There are packages such as snow and multicore but these are very high level packages and don’t really allow one to control the finer details. There is, however, Rmpi. It is a little bit different from regular C/Fortran MPI implementations and I once had a very bad experience getting some Rmpi code to work for a project deadline, it wasn’t pretty, so I was a little reluctant to reconsider this package but if you look at the changelogs it is still being actively maintained and in the end I’m very happy with the outcome of this experiment. I tried to write the below code as generally as possible, so that it is easily adapted by myself, or others, in the future.

Target Density

First one needs to write a density one wishes to sample from

logdensity<-function(theta){
  #Distribution one wishes to sample from here.
  #It may be more convinient to pass a theta as a list
  sigma2=0.001;
  Sigma=matrix(0,2,2);
  Sigma[1,1]=sigma2;
  Sigma[2,2]=sigma2;
  density=dmvnorm(theta,c(0,0),Sigma)+dmvnorm(theta,c(-2,0.8),Sigma)+dmvnorm(theta,c(-1,1),Sigma)+dmvnorm(theta,c(1,1),Sigma)+dmvnorm(theta,c(0.5,0.5),Sigma);
  return(log(density))
}

The density I chose was a mixture of 5 well-separated bi-variate Normals. One should note that it is probably cleanest to pass all the arguments to this function as a list theta. It wasn’t really necessary in this case but if you have a posterior distribution with a number of parameters of varying dimension then it would be much nicer as a list. In a future blog post I may change the target density to be the energy distribution of a Lennard-Jones cluster.

Parallel Tempering Algorithm

This too is written as a function because Rmpi allows you to pass the function to all slaves and execute it. It was basically the easiest way of writing it for Rmpi.

temper<-function(niter,Bmin,swap.interval){
  rank=mpi.comm.rank();
  size=mpi.comm.size();
  swap=0;
  swaps.attempted=0;
  swaps.accepted=0;
  
  #Higher ranks run the higher "temperatures" (~smaller fractional powers)
  B=rep(0,size-1);
  for(r in 1:size-1){
    temp=(r-1)/(size-2);
    B[r]=Bmin^temp;
  }
  
  
  #Create a list for proposal moves
  prop=rep(0,2);
  theta=matrix(0,niter,2)
  
  for(t in 2:niter){
    
    for(c in 1:length(prop))  prop1=theta[t-1,c]+rnorm(1,0,0.1);
    
    #Calculate Log-Density at proposed and current position
    logdensity.current=logdensity(theta[t-1,])
    logdensity.prop=logdensity(prop);
    
    #Calculate log acceptance probability
    lalpha=B[rank]*(logdensity.prop-logdensity.current)
    
    if(log(runif(1))<lalpha){
      #Accept proposed move
      theta[t,]=prop;
      logdensity.current=logdensity.prop;
    }else{
      #Otherwise do not move
      theta[t,]=theta[t-1,];
    } 
    
    if(t%%swap.interval ==0){
      for(evenodd in 0:1){
        swap=0;
        logdensity.partner=0;
        if(rank%%2 == evenodd%%2){
          rank.partner=rank + 1;
          #ranks range from 1:size-1. Cannot have a partner rank == size
          if(0<rank.partner && rank.partner<size){
            #On first iteration, evens receive from above odd
            #On second iteration, odds receive from above evens
            logdensity.partner<-mpi.recv.Robj(rank.partner,rank.partner);
            lalpha = (B[rank]-B[rank.partner])*(logdensity.partner-logdensity.current);
            swaps.attempted=swaps.attempted+1;
            if(log(runif(1))<lalpha){
              swap=1;
              swaps.accepted=swaps.accepted+1;
            }
            mpi.send.Robj(swap,dest=rank.partner,tag=rank)
          }
          if(swap==1){
            thetaswap=theta[t,];
            mpi.send.Robj(thetaswap,dest=rank.partner,tag=rank)
            theta[t,]=mpi.recv.Robj(rank.partner,rank.partner)
          }
        }else{
          rank.partner=rank-1;
          #ranks range from 1:size-1. Cannot have a partner rank ==0
          if(0<rank.partner && rank.partner<size){
            #On first iteration, odds send to evens below
            #On second iteration, evens sent to odds below
            mpi.send.Robj(logdensity.current,dest=rank.partner,tag=rank);
            swap=mpi.recv.Robj(rank.partner,rank.partner);
          }
          if(swap==1){
            thetaswap=theta[t,];
            theta[t,]=mpi.recv.Robj(rank.partner,rank.partner);
            mpi.send.Robj(thetaswap,dest=rank.partner,tag=rank);
          }
        }
      }
    }
  }
  return(theta)
}

The bulk of the above code is the communication of each processor with its next nearest neighbors. Metropolis moves will be attempted every swap.interval iterations, an argument one can pass to the function. When this code block is entered, even rank processors will partner with their higher ranked odd neighbours (they have a high rank so higher temperature i.e. smaller fractional power – a more “melted down” target density). The higher odd partners will send their lower even partners the value of their density and then the lower even partners will calculate an acceptance probabilty. If the move succeeds the lower rank even processors send their higher rank odd processors a binary swap=1 telling the higher rank odd processors that a send/receive procedure will occur. The lower even rank sends the higher odd rank its parameters and then subsequently the higher odd rank sends its lower even rank its parameters. In this way a metropolis move between processors is achieved. Next, odd rank processors form partners with their higher even ranked neighbours (because we need to swap with processor rank 1, the target density). The same procedure occurs as before but swapping odd for even. More visually, first swaps are attempted between 2-3, 4-5, 6-7 etc and then swaps are attempted between 1-2, 3-4, 5-6. This is almost like a merge-sort style algorithm. One can see how the parameters could be passed from 3 down to 2 and then from 2 down to 1. The main point is that each processor attempts a swap with its nearest-neighbours, the one above and the one below, every swap.interval iterations.
With these functions defined one can now proceed to set up the mpi communicator/world.

Rmpi

First spawn some slaves.

library(Rmpi)
mpi.spawn.Rslaves(nslaves=6)

If it worked, you should see something like this:

> mpi.spawn.Rslaves(nslaves=6)
	6 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 7 is running on: cabbage 
slave1 (rank 1, comm 1) of size 7 is running on: cabbage 
slave2 (rank 2, comm 1) of size 7 is running on: cabbage 
slave3 (rank 3, comm 1) of size 7 is running on: cabbage 
slave4 (rank 4, comm 1) of size 7 is running on: cabbage 
slave5 (rank 5, comm 1) of size 7 is running on: cabbage 
slave6 (rank 6, comm 1) of size 7 is running on: cabbage 

(yes, my office computer was named cabbage, lettuce is the one next to me). One can then send the function definitions to the slave processors.

#Send to slaves the logdensity function
mpi.bcast.Robj2slave(logdensity)
#Send to slaves the temper function
mpi.bcast.Robj2slave(temper) 

If you want to make sure that the slaves have the correct function definition, one can execute the command mpi.remote.exec(temper) and this will return the function definition. That is all, now it can be run.

mcmc=mpi.remote.exec(temper(10000,0.005,3))

This returns a list object containing the mcmc draws for each slave.

Results

The end product is something that looks like this
parallel tempering mixing
Which are the draws (in black) from the target distribution. It is also useful to build up intuition for parallel tempering to look at what is happening on the other processors. The draws for all processors are shown below:
parallel tempering draws for each processor

Earl D.J. & Deem M.W. (2005). Parallel tempering: Theory, applications, and new perspectives, Physical Chemistry Chemical Physics, 7 (23) 3910. DOI:

The post Parallel Tempering in R with Rmpi appeared first on Lindons Log.

To leave a comment for the author, please follow the link and comment on his blog: Lindons Log » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.