vRoom vRoom: Speeding up R with C

June 8, 2011

(This article was first published on Milk Trader, and kindly contributed to R-bloggers)

Many times you don't want to trouble friends for help with menial tasks like moving furniture. But sometimes you need to step out and ask. Your friends are always happy to help, and after the heavy lifting is done you see how easy it can be. R likes to move furniture. It's okay with moving a small table across the room, but when you need to bring a large sofa up three flights of stairs, it's time to ask for help. Your best friend is C.

The basic idea behind speeding up R with C is to write a C function, compile it, load it into the R session (ask it over to the house, so to speak) and call it from R with a built-in R function. There are some choices here, but for our test case we're going to use the .C() function.

A couple of ground rules first. The C function must be of type void (it returns nothing) and needs to accept pointer arguments. You define the argument in R as an object, and it gets passed as a pointer to the C function. R objects are basically pointers anyway so this is not terribly difficult. Let's illustrate before this becomes a lecture on pointers and objects.
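To make the two ground rules concrete, here is a minimal sketch in plain C. The function name sum_ints and its arguments are hypothetical and not part of this article's example. It returns void, takes only pointers, and hands its result back by writing into one of them, which is how a function called through .C() reports anything to R.

```c
/* Hypothetical helper illustrating the .C() calling convention:
   void return type, pointer arguments only. */
void sum_ints(int *x, int *n, int *total)
{
    int i;
    *total = 0;                 /* the result is written through a pointer */
    for (i = 0; i < *n; i++)    /* *n is the length of x, supplied by the caller */
        *total += x[i];
}
```

Assuming this were compiled and loaded into R, the call would look something like .C("sum_ints", as.integer(1:4), as.integer(4), total = integer(1))$total, since .C() returns a list of its arguments as they stand after the C function has run.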

Here is the code for an R function that uses a nested R loop to print out every combination of two parameters. It's the guts of a brute force search, but all it does is print a line for each combination it visits. You can either copy and paste it into R or save it and source() it.


twoloop <- function(){
  for(i in seq(10, 50, 1))
    for(j in seq(80, 240, 4))
      cat("Parameter one is", i, "and Parameter two is", j, "\n")
}



We are stepping from 10 to 50 in increments of 1 (41 values, since both ends are included) and also from 80 to 240 in steps of 4 (41 values again). The total number of combinations is those two multiplied, which comes to 1681. The size of a large sofa, basically. R can do it, but it takes some time because of the explicit looping. Here is the tail of the output along with performance statistics. The system.time() function is used to measure performance.

system.time(twoloop())
.
.
.

Parameter one is 50 and Parameter two is 224 
Parameter one is 50 and Parameter two is 228 
Parameter one is 50 and Parameter two is 232 
Parameter one is 50 and Parameter two is 236 
Parameter one is 50 and Parameter two is 240 
   user  system elapsed 
  0.070   0.031   0.139  
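Both seq() ranges include their endpoints, so the number of lines printed can be checked with a little arithmetic. The sketch below is plain C written in the same void-and-pointers style; the names range_count and grid_count are made up for illustration.

```c
/* Number of values in start, start+step, ..., not exceeding stop.
   Assumes step > 0 and stop >= start. */
void range_count(int *start, int *stop, int *step, int *count)
{
    *count = (*stop - *start) / *step + 1;   /* +1 because both ends are included */
}

/* Total iterations of a double loop over two such ranges. */
void grid_count(int *startOne, int *stopOne, int *stepOne,
                int *startTwo, int *stopTwo, int *stepTwo, int *total)
{
    int one, two;
    range_count(startOne, stopOne, stepOne, &one);
    range_count(startTwo, stopTwo, stepTwo, &two);
    *total = one * two;
}
```

For 10 to 50 by 1 and 80 to 240 by 4 this gives 41 values in each range, or 41 × 41 = 1681 lines of output.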


To get some help from C, we write the following script:


#include <R.h>
#include <stdio.h>

/* Called from R via .C(): returns void and takes only pointer arguments. */
void twoloop(int *startOne, int *stopOne, int *stepOne, int *startTwo, int *stopTwo, int *stepTwo)
{
    int i, j;
    /* Both loops include their stop value, matching seq() in R. */
    for (i = *startOne; i <= *stopOne; i = i + *stepOne)
        for (j = *startTwo; j <= *stopTwo; j = j + *stepTwo)
            Rprintf("Parameter one is %d and Parameter two is %d\n", i, j);
}

Notice the pointer arguments. Also notice that we include the R.h header file, which declares Rprintf(). Save this file as twoloop.c and then, from the command line, use the following instruction to compile it.

R CMD SHLIB twoloop.c

This creates a shared library called twoloop.so (twoloop.dll on Windows), which is loaded into R with the following function from inside an R session.

dyn.load("twoloop.so")

Great, now we use the .C() function to call our C function. But first we need to create some R objects. The C function takes six arguments, so here is what we'll do.

a <- 10
b <- 50
c <- 1
p <- 80
q <- 240
r <- 4

Now we simply pass those objects in as integers. Remember, that is what the C function is expecting so let's not mess around. It is here to help us after all.

.C("twoloop", as.integer(a), as.integer(b), as.integer(c), as.integer(p), as.integer(q), as.integer(r))
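The as.integer() calls matter because a plain R number such as 10 is stored as a double, and .C() passes doubles as double * and integers as int *. As a sketch of the other case, here is a hypothetical function (scale_vec is not part of this article's example) that takes a numeric vector, so on the R side it would be called with as.double() rather than as.integer().

```c
/* Hypothetical example: multiply each element of x by *factor, in place.
   x and factor arrive as double * (R numeric), n as int * (R integer). */
void scale_vec(double *x, int *n, double *factor)
{
    int i;
    for (i = 0; i < *n; i++)
        x[i] *= *factor;
}
```

Mixing the two up does not produce a helpful error. The C code simply reads the bytes as the wrong type, which is why the explicit coercion is worth the extra typing.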

Of course if you're like me, you run the function first because you're so excited to see if it actually works. It does, and then you remember that you were supposed to time it. No worries: hit the up arrow to prompt R to display the last command (the monstrosity above) and hit Ctrl-A to get to the beginning of the command. Then wrap it in system.time() like so:

system.time(.C("twoloop", as.integer(a), as.integer(b), as.integer(c), as.integer(p), as.integer(q), as.integer(r)))

Here is the tail of the C function's output along with its performance.


Parameter one is 50 and Parameter two is 224
Parameter one is 50 and Parameter two is 228
Parameter one is 50 and Parameter two is 232
Parameter one is 50 and Parameter two is 236
Parameter one is 50 and Parameter two is 240
   user  system elapsed 
  0.002   0.002   0.017 
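One caveat about the C loop: nothing stops a caller from passing a step of zero or a negative step, in which case the loop would never finish and the R session would hang. A guarded variant might look like the sketch below. The name twoloop_checked and the extra count and status arguments are assumptions for illustration, and the print call is left out so the file compiles without R.h.

```c
/* Hypothetical guarded variant of twoloop.
   *status is set to 0 on success and 1 if either step is not positive.
   *count receives the number of (i, j) pairs visited. */
void twoloop_checked(int *startOne, int *stopOne, int *stepOne,
                     int *startTwo, int *stopTwo, int *stepTwo,
                     int *count, int *status)
{
    int i, j;
    *count = 0;
    if (*stepOne <= 0 || *stepTwo <= 0) {
        *status = 1;            /* refuse to loop forever */
        return;
    }
    for (i = *startOne; i <= *stopOne; i += *stepOne)
        for (j = *startTwo; j <= *stopTwo; j += *stepTwo)
            (*count)++;         /* the article's version calls Rprintf here */
    *status = 0;
}
```

On the R side, the caller would pass integer(1) for both extra arguments and check the returned status before trusting the count.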



Nice. Time to crack a beer on the new sofa. With our best friend C.
