[This article was first published on Stack Exchange Stats Blog » R tips&tricks, and kindly contributed to R-bloggers].

What is OpenMP? Basically, a standard compiler extension that makes it easy to distribute calculations over multiple processors in a shared-memory manner (this is especially important when dealing with large data; a simple separate-process approach usually requires as many copies of the working data as there are threads, which may easily be overkill in terms of total size, not to mention the time wasted on copying).

The magic of OpenMP is that once you have C or Fortran code, in most cases you need nothing more than a few additional compiler flags; thus the code remains as portable and as readable as it was before the modification. And it is usually just nice and simple, not counting a few common parallelism traps and some quirks related to the fact that we want it to work with R.

In this post I don't want to write yet another OMP tutorial (the web is full of them), but rather to show how to use it with R. Thus, I'll use a toy example: a function that calculates the cumulative sum in an unnecessarily demanding way:

#include <R.h>
#include <Rinternals.h>

SEXP dumbCumsum(SEXP a){
 SEXP ans;
 PROTECT(a=coerceVector(a,REALSXP));
 PROTECT(ans=allocVector(REALSXP,length(a)));
 double* Ans=REAL(ans);
 double* A=REAL(a);
 for(int e=0;e<length(a);e++){
  Ans[e]=0.;
  for(int ee=0;ee<=e;ee++) Ans[e]+=A[ee];
 }
 UNPROTECT(2);
 return(ans);
}

There is only one for loop responsible for most of the computational time, and there are no race conditions (each element of the output is computed independently), so the OMP-ized version looks like this:

#include <R.h>
#include <Rinternals.h>
#include <omp.h>

SEXP dumbCumsum(SEXP a){
 SEXP ans;
 PROTECT(a=coerceVector(a,REALSXP));
 PROTECT(ans=allocVector(REALSXP,length(a)));
 double* Ans=REAL(ans);
 double* A=REAL(a);
 #pragma omp parallel for
 for(int e=0;e<length(a);e++){
  Ans[e]=0.;
  for(int ee=0;ee<=e;ee++) Ans[e]+=A[ee];
 }
 UNPROTECT(2);
 return(ans);
}

Time for R-specific improvements. First of all, it is good to give the user an option to select the number of cores to use (for instance, on a 16-core machine one may want to use the first 8 cores for one job and the other 8 for something else; without such an option the only alternative would be sequential execution), yet it is also nice to have a simple way to use the full capabilities of the system. To this end, we give our function an appropriate argument and use OMP functions to comply with it:

#include <R.h>
#include <Rinternals.h>
#include <omp.h>

SEXP dumbCumsum(SEXP a,SEXP reqCores){
 //Set the number of threads
 PROTECT(reqCores=coerceVector(reqCores,INTSXP));
 int useCores=INTEGER(reqCores)[0];
 int haveCores=omp_get_num_procs();
 if(useCores<=0 || useCores>haveCores) useCores=haveCores;
 omp_set_num_threads(useCores);

 //Do the job
 SEXP ans;
 PROTECT(a=coerceVector(a,REALSXP));
 PROTECT(ans=allocVector(REALSXP,length(a)));
 double* Ans=REAL(ans);
 double* A=REAL(a);
 #pragma omp parallel for
 for(int e=0;e<length(a);e++){
  Ans[e]=0.;
  for(int ee=0;ee<=e;ee++) Ans[e]+=A[ee];
 }
 UNPROTECT(3);
 return(ans);
}

This code also ensures that the number of threads won't be larger than the number of available cores; running more threads than cores gives no speedup and incurs a performance loss caused by the OS scheduler juggling them.

Finally, it is time to resolve a small quirk: R has some code guarding the C call stack from overflows, which is obviously not thread-aware and thus has a tendency to panic and wreck the whole R session when running parallel code. We need to disable it using the trick featured in Writing R Extensions. First, we include Rinterface.h to gain access to the variable holding the stack limit

#define CSTACK_DEFNS 7
#include "Rinterface.h"

and then set it to almost infinity in the code

R_CStackLimit=(uintptr_t)-1;

Voilà, the stack is now unprotected; working with R has just become a bit more dangerous, but we can run parallel code without strange problems. The full code looks like this:

#include <R.h>
#include <Rinternals.h>
#include <omp.h>
#define CSTACK_DEFNS 7
#include "Rinterface.h"

SEXP dumbCumsum(SEXP a,SEXP reqCores){
 //Disable the C stack limit check
 R_CStackLimit=(uintptr_t)-1;

 //Set the number of threads
 PROTECT(reqCores=coerceVector(reqCores,INTSXP));
 int useCores=INTEGER(reqCores)[0];
 int haveCores=omp_get_num_procs();
 if(useCores<=0 || useCores>haveCores) useCores=haveCores;
 omp_set_num_threads(useCores);

 //Do the job
 SEXP ans;
 PROTECT(a=coerceVector(a,REALSXP));
 PROTECT(ans=allocVector(REALSXP,length(a)));
 double* Ans=REAL(ans);
 double* A=REAL(a);
 #pragma omp parallel for
 for(int e=0;e<length(a);e++){
  Ans[e]=0.;
  for(int ee=0;ee<=e;ee++) Ans[e]+=A[ee];
 }
 UNPROTECT(3);
 return(ans);
}

Now it is time to make sure that R will compile our function with OMP support (and thus make it parallel). To this end, we create a Makevars file (in src/ in the case of a package, or in the code directory when working with stand-alone object files) with the following contents (for GCC):

PKG_CFLAGS=-fopenmp
PKG_LIBS=-lgomp

The first line triggers parsing of OMP pragmas; the second links the OMP library providing the omp_* functions.
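As a side note, recent versions of R also provide a portable macro for this, so a Makevars that is not tied to GCC can be sketched as (based on the macros documented in Writing R Extensions):

```
PKG_CFLAGS = $(SHLIB_OPENMP_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS)
```

R then substitutes whatever flags the configured compiler needs for OpenMP, or nothing at all if it has no OpenMP support.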

We are ready to test our example:

$ R CMD SHLIB omp_sample.c
$ R
> dyn.load('omp_sample.so')
> .Call('dumbCumsum',runif(100000),0L)

Try running some system monitor (like htop, or the GUI one that comes with your desktop environment) and watch your powerful CPU finally being fully utilized (-;

To close on a less optimistic note, a few words about limitations. Don't even try to run any R code, or use features like random number generation or R_alloc, inside parallelized blocks; the R engine is not thread-safe, and this will end in a more or less spectacular failure.

Plus of course all issues of parallel programming and OMP itself also apply — but that is a different story.
