Using R and snow on Ohio Supercomputer Center’s Glenn cluster

Over the last several days, I have had the “pleasure” of getting parallel processing with R running on the Ohio Supercomputer Center’s (OSC) Glenn cluster. I am working on a project that uses GenMatch from Sekhon’s Matching package, which relies on the snow library to manage parallel processing. Getting snow to run properly on a single machine, or even on a cluster of machines connected via ssh, is fairly trivial, but using it on the OSC cluster turned out to be a bit more difficult. Well, difficult in relative terms: once you know the steps to take, it’s not all that bad. While I am still not completely sure I’ve done everything correctly, I thought I would post this short guide in hopes that it could save someone else a few days of headaches. I’ll update the post if I discover something is incorrect.

Step 1: Compile Rmpi

In order to utilize more than one node on the Glenn cluster, you need to have Rmpi installed and, importantly, linked to the appropriate MPI libraries provided by OSC. To do so, you first need to create a .R/Makevars file in your home directory that will instruct R to use mpicc instead of gcc to compile the Rmpi library.

$ mkdir ~/.R
$ nano ~/.R/Makevars

And this is what you should place in Makevars:

CC=mpicc
SHLIB_LD=mpicc

Next, you will need to swap out the default mpi module for an alternative. If the R module isn’t loaded yet, you will need to load it as well.

$ module swap mpi mvapich2-1.0.2p1-gnu
$ module load R-2.8.0

If you aren’t sure which version of MPI you should load, you can use the module avail command to see what’s available. Or, better yet, you could email the excellent support staff at OSC. Note that I was not able to get Rmpi to install correctly with R-2.11.1. Since I had 2.8 working, I didn’t do much further investigation.
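
Before compiling, it can't hurt to double-check the environment. As a minimal sanity check (mpicc -show is an MPICH-family flag, which mvapich2's wrapper should support; treat it as an assumption if your wrapper differs):

$ module list   # confirm mvapich2 and R are now loaded
$ which mpicc   # should point into the mvapich2 installation
$ mpicc -show   # prints the compiler and flags the wrapper will use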

Now it’s time to compile and install Rmpi. Download the most recent version and place it in your working directory. You can either do that through your browser or with wget; e.g.,

$ wget http://cran.r-project.org/src/contrib/Rmpi_0.5-9.tar.gz

Just be sure to replace the Rmpi package version above with the most recent. After doing so, the following command should correctly install the package.

$ R CMD INSTALL --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
    --configure-args="--with-Rmpi-include=${MPICH_HOME}/include --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
    Rmpi_0.5-9.tar.gz

Note that the command above should only have line breaks immediately after each \, which marks a continuation. In other words, the whole command is three lines long, with the first two each ending in \.
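
If the install finishes without errors, one quick way to confirm that R can see the package, without actually initializing MPI on the login node, is to check the installed package list:

$ R -e '"Rmpi" %in% rownames(installed.packages())'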

Step 2: Set up your PBS job script

Successfully processing a job across multiple nodes with R and snow requires some small changes to your PBS script. If you aren’t yet familiar with PBS scripts, a good place to start is here and here. First, you should create a directory to hold all of the files associated with your batch job. Here I create one called Test in my home directory:

$ mkdir ~/Test

Now create a PBS script file.

$ nano SnowTest.job

And add something like this:

#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -N SnowTest
#PBS -S /bin/bash
#PBS -j oe
#PBS -m abe
#PBS -M [email protected]

# Echo commands as they run ("set echo" is csh syntax; -x is the bash form).
set -x

export TEST=${HOME}/Test

# Distribute the job files to the node-local temp directory on every node.
pbsdcp -r ${TEST}/* $TMPDIR
cd $TMPDIR

module swap mpi mvapich2-1.0.2p1-gnu
module load R-2.8.0
mpiexec -n 16 RMPISNOW < SnowTest.R

# Gather (-g) the results from the nodes back into the Test directory.
pbsdcp -g -r '*' ${TEST}/
exit

This will run whatever you put in SnowTest.R across 16 cores on two nodes, with a 10-minute walltime limit. To make sure everything is working, put something like the following in SnowTest.R.

# Test snow on the OSC cluster.

# First, get the cluster object. When R is launched via RMPISNOW,
# calling makeCluster() with no arguments returns the running MPI
# cluster.
cl <- makeCluster()

# Now generate some random numbers on all of the workers. Note,
# because we haven't set a different seed for each process, you
# may get back duplicates. GenMatch and rgenoud take care of this
# for you, but other R libraries may not. See the snow documentation
# for more details, and the sketch below.
print(clusterCall(cl, rnorm, 1000))

# Shut the workers down cleanly.
stopCluster(cl)
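
On the seeding caveat in the comments above: snow provides clusterSetupRNG(), which gives each worker an independent random number stream. Here is a minimal sketch, continuing from the script above (before the stopCluster() call) and assuming the rlecuyer package, which backs the default "RNGstream" type, is installed on the compute nodes:

# Give each worker an independent L'Ecuyer random number stream.
# Requires the rlecuyer package; the RNGstream seed is six integers.
clusterSetupRNG(cl, type = "RNGstream", seed = rep(42, 6))
print(clusterCall(cl, rnorm, 5))  # draws now differ across workers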

Now you can submit the job as you normally would:

$ qsub SnowTest.job

When the results come back, if everything worked, you should see a list containing 1,000 random draws from each worker in your log. If you want to verify that a longer-running job is using all of the cores and nodes you requested, you can use qstat -f and compare wall and CPU time. For example, for a job I have running right now, when I run qstat -f, I get:

$ qstat -f 1234567
Job Id: 1234567.opt-batch.osc.edu
    Job_Name = Job-20110326-1
    Job_Owner = [email protected]
    resources_used.cput = 411:42:28
    resources_used.mem = 2998464kb
    resources_used.vmem = 6312984kb
    resources_used.walltime = 26:14:35
    [... snip ...]

Above, cput / walltime is approximately 16, which indicates that I am using all of the processors I requested.
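
To make the check concrete, the arithmetic (here in R) works out as follows, using the values from the qstat output above:

# CPU time divided by wall time approximates the number of busy cores.
cput <- 411 + 42/60 + 28/3600   # 411:42:28 as hours
wall <-  26 + 14/60 + 35/3600   # 26:14:35 as hours
cput / wall                     # ~15.7, close to the 16 cores requested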

Anyway, hopefully someone finds this useful. And please let me know if you see any fatal errors in the steps above.
