
R and MPI on Ohio Supercomputer Center’s Oakley cluster

[This article was first published on Left Censored » R, and kindly contributed to R-bloggers].

A few years ago, I wrote a short guide to Using R and snow on the Ohio Supercomputer Center’s Glenn cluster. Several things have changed in the world of R since then (namely, the inclusion of the parallel package into the base system) and I have moved to using the Oakley cluster, so I thought it was time to write an update to that older post.

Installing Rmpi

To use parallel with MPI on the cluster, you will need to install Rmpi. First, log in to the Oakley cluster with ssh, using the username and password you were given:

$ ssh osuXXXX@oakley.osc.edu

Once logged in, you will need to set up R to use Rmpi. You will also need to install snow, which parallel uses behind the scenes for MPI clusters. Rmpi has to be installed manually, because a few extra configuration parameters must be set. The newest version of Rmpi at the time of writing is 0.6-5; you can download it with wget:

$ wget http://cran.r-project.org/src/contrib/Rmpi_0.6-5.tar.gz
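Before compiling, it can be worth confirming that the download is a complete source package and not, say, an HTML error page. The sketch below is a small hypothetical helper, `check_rmpi_tarball` (not part of R or Rmpi), that lists the archive and looks for the package's DESCRIPTION file. If wget itself fails, note that CRAN moves superseded releases to `src/contrib/Archive/Rmpi/`.

```shell
#!/bin/sh
# check_rmpi_tarball is a hypothetical helper, not part of R or Rmpi.
# It succeeds only if the file is a readable gzipped tar archive that
# contains Rmpi/DESCRIPTION, as every R source package for Rmpi does.
check_rmpi_tarball() {
    tarball="$1"
    if [ ! -s "$tarball" ]; then
        echo "missing or empty: $tarball" >&2
        return 1
    fi
    if ! tar -tzf "$tarball" 2>/dev/null | grep -q '^Rmpi/DESCRIPTION$'; then
        echo "not an Rmpi source package: $tarball" >&2
        return 1
    fi
    echo "ok: $tarball"
}

# Usage (file name assumed to match the version downloaded above):
# check_rmpi_tarball Rmpi_0.6-5.tar.gz
```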

With Rmpi downloaded, you can prepare to install it. Because you have to set some custom flags, the easiest way to do this is to create a short executable script:

$ nano compile_Rmpi.sh

This will open nano, a text editor. Paste the following into the editor (changing the Rmpi version to match the one you downloaded):

#!/bin/sh

R CMD INSTALL \
    --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
    --configure-args="--with-Rmpi-include=${MPICH_HOME}/include \
    --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
    Rmpi_0.6-5.tar.gz

Press Ctrl-X, then Y, then Enter to save the file and exit. Now make it executable:

$ chmod +x compile_Rmpi.sh
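If you would rather not paste into nano, the same script can be written from the command line with a here-document. This is a sketch assuming the same Rmpi version as above; the quoted `'EOF'` delimiter matters, because it keeps `${MPICH_HOME}` unexpanded in the file so that it is resolved when the script runs (after `module load R`), not when the file is written.

```shell
#!/bin/sh
# Write compile_Rmpi.sh without an interactive editor.
# The quoted 'EOF' stops the shell from expanding ${MPICH_HOME} now;
# it is expanded later, when compile_Rmpi.sh itself is run.
cat > compile_Rmpi.sh <<'EOF'
#!/bin/sh

R CMD INSTALL \
    --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
    --configure-args="--with-Rmpi-include=${MPICH_HOME}/include \
    --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
    Rmpi_0.6-5.tar.gz
EOF

# Make it executable, as with the chmod step.
chmod +x compile_Rmpi.sh

# Parse the script without running it, to catch quoting mistakes early.
sh -n compile_Rmpi.sh && echo "compile_Rmpi.sh parses cleanly"
```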

Next, create a .R/Makevars file in your home directory that tells R to use the MPI compiler wrapper, mpicc, for compiling and linking:

$ mkdir .R
$ nano .R/Makevars

In the Makevars file, enter

CC=mpicc
SHLIB_LD=mpicc

Save and exit. Now create a .Renviron file that tells R to install packages into a personal library under your home directory, and create that library directory:

$ echo 'R_LIBS_USER="~/R/library"' > .Renviron
$ mkdir -p R/library
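The two configuration files can also be created without an editor. The sketch below assumes you are in your home directory, as you are right after logging in. Unlike `echo ... > .Renviron`, it appends the `R_LIBS_USER` line only when none is present, so other settings in an existing `.Renviron` are kept.

```shell
#!/bin/sh
# Run from your home directory, like the commands above.

# Compiler settings for R package builds: use the MPI wrapper compiler.
# Note: this overwrites any existing .R/Makevars.
mkdir -p .R
cat > .R/Makevars <<'EOF'
CC=mpicc
SHLIB_LD=mpicc
EOF

# Personal library directory that R_LIBS_USER will point to.
mkdir -p R/library

# Append R_LIBS_USER only if .Renviron does not already set it.
touch .Renviron
if ! grep -q '^R_LIBS_USER=' .Renviron; then
    echo 'R_LIBS_USER="~/R/library"' >> .Renviron
fi
```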

Now you are ready to load the R module and compile Rmpi.

$ module load R
$ ./compile_Rmpi.sh
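If `compile_Rmpi.sh` fails with errors about missing MPI headers or libraries, a likely cause is that `MPICH_HOME`, which the script uses for every include and library path, is unset or points somewhere unexpected. A minimal sketch of a check to run first; `check_mpich_home` is a hypothetical helper, and it assumes the conventional `include/mpi.h` and `lib/` layout under `MPICH_HOME`.

```shell
#!/bin/sh
# check_mpich_home is a hypothetical helper, not an OSC or R command.
# compile_Rmpi.sh builds its include and library paths from MPICH_HOME,
# so this confirms the variable is set and points at an MPI installation
# with the usual include/mpi.h and lib/ layout.
check_mpich_home() {
    if [ -z "${MPICH_HOME:-}" ]; then
        echo "MPICH_HOME is not set; is the MPI environment loaded?" >&2
        return 1
    fi
    if [ ! -f "${MPICH_HOME}/include/mpi.h" ]; then
        echo "no mpi.h under ${MPICH_HOME}/include" >&2
        return 1
    fi
    if [ ! -d "${MPICH_HOME}/lib" ]; then
        echo "no lib directory under ${MPICH_HOME}" >&2
        return 1
    fi
    echo "MPICH_HOME=${MPICH_HOME} looks usable"
}

# Usage: after "module load R", before compiling
# check_mpich_home && ./compile_Rmpi.sh
```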

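Now that Rmpi is installed, start R and install snow and rlecuyer as you normally would (rlecuyer provides the independent random-number streams that snow's clusterSetupRNG uses in the test script below).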

$ R
R> install.packages(c("snow", "rlecuyer"))

Making bash more pleasant

As an unrelated aside for those new to bash and the command line: if you want a better experience using bash (the default shell), you can create a file called .bashrc and enter the following (replacing osuXXXX in the export PATH line with your own username) to get color highlighting and a meaningful prompt:

$ nano .bashrc


### ------------------------------------------------------------------------
### default environment variables
### ------------------------------------------------------------------------

export PATH=.:$PATH:/home/osuXXXX/bin

# UTF-8
export LANG="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export MM_CHARSET=UTF-8

### ------------------------------------------------------------------------
### aliases
### ------------------------------------------------------------------------

alias ls="ls --color=auto"
alias ll="ls -l -A --color=auto -h"

### ------------------------------------------------------------------------
### ui
### ------------------------------------------------------------------------

# username colors, green for non-root, red for root
USER_COLOR='1;32m'
if [ ${UID} -eq 0 ]; then
    USER_COLOR='1;31m'
fi

# components
USER_NAME='\[\033[${USER_COLOR}\]\u\[\033[0m\]'
DIR_NAME='\[\033[1m\]\w\[\033[0m\]'
MACHINE='\[\033[1;32m\]\h\[\033[0m\]'
AT_SYM='\[\033[0;34m\]@\[\033[0m\]'

# prompt
export PS1="${TITLEBAR}${USER_NAME}${AT_SYM}${MACHINE}:${DIR_NAME} > "

And create a .bash_profile file to make sure .bashrc is run at login:

$ nano .bash_profile

if [ -f ~/.bashrc ]; then
   . ~/.bashrc
fi
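Some accounts already come with a `.bash_profile`. A minimal sketch that adds the `.bashrc` hook only when the file does not already mention `.bashrc`, and then checks both files with `bash -n`, which parses without executing, so a syntax error shows up now rather than at your next login. It assumes you run it from your home directory.

```shell
#!/bin/sh
# Run from your home directory.
touch .bash_profile

# Append the hook only if .bash_profile does not already mention .bashrc.
if ! grep -q '\.bashrc' .bash_profile; then
    cat >> .bash_profile <<'EOF'
if [ -f ~/.bashrc ]; then
   . ~/.bashrc
fi
EOF
fi

# Parse each file without running it; bash -n reports syntax errors only.
for f in .bash_profile .bashrc; do
    if [ -f "$f" ]; then
        bash -n "$f" && echo "$f: syntax OK"
    fi
done
```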

Using MPI with the Cluster

Create a directory for the test files.

$ mkdir test

Now change to that directory and create the batch job submission script. Replace the e-mail address on the -M line and the project account on the -A line with your own.

$ cd test
$ nano test.job


#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=12
#PBS -N test
#PBS -S /bin/bash
#PBS -j oe
#PBS -m abe
#PBS -M morgan.746@osu.edu
#PBS -A PAA0014

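# Print each command as it runs (the bash equivalent of csh's "set echo")
set -x

# Copy the R script to $TMPDIR, the job's scratch directory, and work there
export TEST=${HOME}/test
pbsdcp -r ${TEST}/test.R $TMPDIR
cd $TMPDIR

# Start R under MPI through snow's RMPISNOW wrapper, feeding it test.R
module load R
mpiexec ~/R/library/snow/RMPISNOW < test.R

# Copy the results back to the test directory
pbsdcp -g -r result.RData ${TEST}/
exit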
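qsub reads `#PBS` directives only from the comment block at the top of the script; once it reaches the first executable line, any later `#PBS` lines are treated as ordinary comments and ignored. A small checker for directives placed too late, written in plain awk; `check_pbs_directives` is a hypothetical helper, not an OSC or PBS tool.

```shell
#!/bin/sh
# check_pbs_directives is a hypothetical helper, not part of PBS/Torque.
# It prints any #PBS line that appears after the first executable line
# (where qsub stops looking for directives) and returns non-zero if found.
check_pbs_directives() {
    awk '
        BEGIN { bad = 0; seen_cmd = 0 }
        /^[ \t]*$/ { next }          # blank lines do not end the header
        /^#PBS/ {
            if (seen_cmd) {
                print FILENAME ":" NR ": ignored directive: " $0
                bad = 1
            }
            next
        }
        /^#/ { next }                # ordinary comments, including #!
        { seen_cmd = 1 }             # first executable line ends the header
        END { exit bad }
    ' "$1"
}

# Usage:
# check_pbs_directives test.job && qsub test.job
```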

Now create the R script, test.R:

$ nano test.R


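library(Rmpi)
library(parallel)

# Draw n random integers between 1 and 10 (with replacement)
fn <- function(n) {
  sample(1:10, n, replace=TRUE)
}

# Build the cluster from the MPI processes started by mpiexec
cl <- makeCluster(type="MPI")
# snow's clusterSetupRNG gives each worker an independent
# random-number stream (using rlecuyer)
clusterSetupRNG(cl)

# Evaluate fn(1), fn(2), ..., fn(12) on the workers
result <- parLapply(cl, 1:12, fn)

save(result, file="result.RData")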

Submit your job using qsub:

$ qsub test.job
3542961.oak-batch.osc.edu

The prefix, 3542961, is a unique job number and will be different each time you submit a job. To check the status of the job, as well as any others you may have submitted, use qstat -u followed by your username:

$ qstat -u $USER

oak-batch.osc.edu:15001: 
                                                                       Req'd  Req'd   Elap
Job ID               Username    Queue    Jobname  SessID NDS   TSK    Memory Time  S Time
-------------------- ----------- -------- -------- ------ ----- ------ ------ ----- - -----
3542961.oak-batc     osu6738     serial   test      13555     1     12   48gb 00:01 R   -- 
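The job number is the part of qsub's reply before the first dot, and it also appears in the name of the job's output file. A sketch that captures it in a variable rather than copying it by hand; `job_number` is a hypothetical helper, and the commented usage lines assume qsub prints the full job ID on standard output, as in the example above.

```shell
#!/bin/sh
# job_number is a hypothetical helper: it strips everything from the first
# "." onward, turning "3542961.oak-batch.osc.edu" into "3542961".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Usage on the cluster:
# jobid=$(qsub test.job)
# num=$(job_number "$jobid")
# qstat "$jobid"              # status of just this job
# less "test.o${num}"         # output file, once the job has finished
```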

When the job has run, the result.RData file will be in the test directory. Open it with R to verify that everything worked.

$ cd ~/test
$ module load R
$ R
R> load("result.RData")
R> result
[[1]]
[1] 2
  
[[2]]
[1]  8 10
  
[[3]]
[1]  8 10 10
  
[[4]]
[1] 1 7 3 9
  
[[5]]
[1] 10  4  8  2  1
  
[[6]]
[1] 4 2 7 3 1 5
  
[[7]]
[1] 5 5 1 4 6 1 3
  
[[8]]
[1] 10  3  7  8  3  2  7  2
  
[[9]]
[1]  9  7  6  9  1 10  9  8  8
  
[[10]]
[1]  2  1  7  8  7  1  9 10  9  9
  
[[11]]
[1]  3  4  3  1  2  2 10  6  9  9  2
  
[[12]]
[1] 10  2  5  4  1  9  7  9 10  7  5  8

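If anything went wrong, the details will be in the job's output file, test.o3542961, which holds both standard output and standard error because of the -j oe option. The number in the file name is the job number, so it changes with each job. You can view the file with less: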

$ less test.o3542961

Press q to exit. Much more detail on how to create job scripts and submit them can be found on the OSC site: OSC batch processing.
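When R runs a script non-interactively and hits an error, it prints a message beginning with `Error` and then `Execution halted` before quitting, so searching the output file for those lines is a quick first check before reading it in full. A minimal sketch; the helper name `r_errors` is hypothetical.

```shell
#!/bin/sh
# r_errors is a hypothetical helper: it lists lines from a job output file
# that look like R errors, and succeeds (exit 0) only if any were found.
r_errors() {
    grep -n -E '^Error|Execution halted' "$1"
}

# Usage, with the output file named after your job number:
# if r_errors test.o3542961; then echo "the R script failed"; fi
```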
