Using R — .Call(“hello”)
In an introductory post on R APIs to C code, Calling C Code ‘Hello World!’, we explored the .C() function with some ‘Hello World!’ baby steps. In this post we will make a leap forward by implementing the same functionality using the .Call() function.
Is .Call() better than .C()?
A heated but friendly conversation took place on the r-devel email forum this past March about R’s copying of arguments and the merits of .C() and .Call(). It is perhaps best to just include a highlight from this exchange. Here is Simon Urbanek responding to Hervé Pagès:

> My understanding is that most packages use the .C interface
> because it's simpler to deal with and because they don't need
> to pass complicated objects at the C level, just atomic vectors.
> My guess is that it's probably rarely the case that the cost
> of copying the arguments passed to .C is significant, but,
> if that was the case, then they could always call .C() with
> DUP=FALSE. However, using DUP=FALSE is dangerous (see Warning
> section in the man page).
>
> No need to switch to .Call

I strongly disagree. I'm appalled to see that sentence here. The overhead is significant for any large vector and it is in particular unnecessary since in .C you have to allocate *and copy* space even for results (twice!). Also it is very error-prone, because you have no information about the length of vectors so it's easy to run out of bounds and there is no way to check. IMHO .C should not be used for any code written in this century (the only exception may be if you are passing no data, e.g. if all you do is to pass a flag and expect no result, you can get away with it even if it is more dangerous). It is a legacy interface that dates way back and is essentially just re-named .Fortran interface. Again, I would strongly recommend the use of .Call in any recent code because it is safer and more efficient (if you don't care about either attribute, well, feel free ;)).
The important differences between the two R interfaces to C code are summarized here:
.C()
- allows you to write simple C code that knows nothing about R
- only simple data types can be passed
- all argument type conversion and checking must be done in R
- all memory allocation must be done in R
- all arguments are copied locally before being passed to the C function (memory bloat)
.Call()
- allows you to write simple R code
- allows for complex data types
- allows for a C function return value
- allows C function to allocate memory
- does not require wasteful argument copying
- requires much more knowledge of R internals
- is the recommended, modern approach for serious C programmers
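To make the .C() side of this comparison concrete, here is a minimal sketch of what a .C()-style routine looks like. The function name count_chars_dotC and its argument names are made up for illustration. The point is only that the code uses plain C types, includes no R headers, and has to be told the vector length through an extra argument.

```c
#include <assert.h>
#include <string.h>

/* .C()-style entry point (hypothetical name): plain C types only.
 * R hands a character vector over as char** and integer vectors as int*.
 * The routine cannot discover how long `greeting` is, so the caller
 * must supply the length in *n and pre-allocate `count`. */
void count_chars_dotC(char **greeting, int *n, int *count)
{
    assert(greeting != NULL && n != NULL && count != NULL);
    for (int i = 0; i < *n; i++) {
        /* nothing here can verify that index i is really in bounds */
        count[i] = (int) strlen(greeting[i]);
    }
}
```

Assuming the routine were compiled with R CMD SHLIB and loaded with dyn.load(), a call along the lines of .C("count_chars_dotC", as.character(x), as.integer(length(x)), count = integer(length(x)))$count would return the counts. Nothing stops the caller from passing the wrong length, which is exactly the hazard Simon Urbanek describes above.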
To allow readers to compare for themselves how difficult or easy it is to switch from .C() to .Call() we will re-implement our three “Hello World!” examples using the .Call() interface.
Getting used to SEXP
The first thing you have to embrace when using the .Call() interface is the new way of dealing with R objects inside your C code. Excellent introductory information and example code is available here:
- Calling C code from R (Sigal Blay, 2004)
- Calling other languages from R (R.M. Ripley, 2009)
- R API cheat sheet (Simon Urbanek, 2012)
In preparation for working with .Call() you will want to familiarize yourself with the location of R’s include files. The following Unix shell commands show how to find where R is installed and then look at the contents of the include directory:
    $ R RHOME
    /usr/lib/R
    $ ls -1 `R RHOME`/include
    Rconfig.h
    Rdefines.h
    Rembedded.h
    R_ext
    R.h
    Rinterface.h
    Rinternals.h
    Rmath.h
    Rversion.h
    S.h
Here’s what they contain:
| File | Contents |
| --- | --- |
| Rconfig.h | various configuration flags |
| Rdefines.h | lots of macros of interest, includes Rinternals.h |
| Rembedded.h | function declarations for embedding R in C programs |
| R_ext | directory of include files for specific data types, etc. |
| R.h | includes many of the files found in R_ext |
| Rinterface.h | provides hooks for external GUIs |
| Rinternals.h | core R data structures |
| Rmath.h | math constants and function declarations |
| Rversion.h | version string components |
| S.h | macros for S/R compatibility |
With the .Call() interface, the C function must be declared to return SEXP, a pointer to a SEXPREC or Simple EXPression RECord. We’ll get the definition of SEXP and everything else we need by including both R.h and Rdefines.h in our code. So here is the C code for our first, brain-dead C function, helloA1.c:

    #include <R.h>
    #include <Rdefines.h>
    #include <stdio.h>

    /* Print a greeting and return NULL to R. */
    SEXP helloA1(void) {
      /* printf() is fine for this demo; Writing R Extensions recommends Rprintf() in real code */
      printf("Hello World!\n");
      return(R_NilValue);
    }
Note that, even though we are returning R_NilValue (aka NULL), the function is declared to be of type SEXP. The function will always be of type SEXP, as will any arguments. It will be up to the C code to convert other data types into and out of SEXP. As in the previous post, you should compile this code with R CMD SHLIB helloA1.c. Here is the very simple R function we need to add to wrappers.R:
    # wrapper function to invoke helloA1
    dyn.load("helloA1.so")
    helloA1 <- function() {
      result <- .Call("helloA1")
    }

Finally, what does it look like when invoked from R?

    > source('wrappers.R')
    > greeting <- helloA1()
    Hello World!
    > class(greeting)
    [1] "NULL"
Whew! That was a lot of complexity just to run “Hello World!”. However, the value of this complexity will become apparent as we move forward.
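If the idea of a "pointer to a record" feels abstract, the following toy sketch may help. It is not R's actual SEXPREC layout, and every name in it (toy_rec, TOY_SEXP, toy_get_int) is hypothetical. It only illustrates the general idea that the object handed to your C code carries its own type and length, so the C side can check both before touching the data.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the idea behind SEXP. This is NOT R's real layout. */
typedef enum { TOY_INT, TOY_REAL } toy_type;

typedef struct {
    toy_type type;    /* what kind of data the record holds */
    size_t   length;  /* how many elements it holds */
    void    *data;    /* the elements themselves */
} toy_rec;

typedef toy_rec *TOY_SEXP;  /* a pointer to a record, like SEXP */

/* Type- and bounds-checked read of element i.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int toy_get_int(TOY_SEXP x, size_t i, int *out)
{
    assert(x != NULL && out != NULL);
    if (x->type != TOY_INT || i >= x->length) {
        return -1;
    }
    *out = ((int *) x->data)[i];
    return 0;
}
```

In real .Call() code the corresponding checks are made with facilities from the R headers, such as LENGTH(), which appears later in this post.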
PROTECT against garbage collection
One of the things R does well is pick up the garbage we leave lying around. (If you’ve ever lived through a garbage haulers’ strike you know this is a good thing.) Unused objects are disposed of after they are no longer needed (i.e. after there are no more active references to them) to free up memory. As we write C code that uses R functions and structures we need to make sure that R knows when it should not toss something out and, after we are done, when it is again OK. This is done with the PROTECT and UNPROTECT functions.
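A rough way to picture the bookkeeping is as a stack of pointers that the garbage collector agrees to leave alone. The sketch below is a toy model in plain C with hypothetical names (toy_protect, toy_unprotect and friends). It is not R's implementation, but it shows why the count you pass when unprotecting has to match the number of objects you protected.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 16

/* Toy protection stack: pointers listed here must be kept alive. */
static void *toy_stack[TOY_STACK_MAX];
static int   toy_top = 0;

/* Push a pointer and hand it back, so the call can wrap an allocation. */
void *toy_protect(void *p)
{
    assert(toy_top < TOY_STACK_MAX);
    toy_stack[toy_top++] = p;
    return p;
}

/* Pop the n most recently protected pointers. */
void toy_unprotect(int n)
{
    assert(n >= 0 && n <= toy_top);
    toy_top -= n;
}

/* Would a collector that consults only this stack keep p alive? */
int toy_is_protected(const void *p)
{
    for (int i = 0; i < toy_top; i++) {
        if (toy_stack[i] == p) {
            return 1;
        }
    }
    return 0;
}

/* Number of pointers currently protected. */
int toy_depth(void)
{
    return toy_top;
}
```

Protecting two objects and then unprotecting only one leaves the first on the stack. In this toy that is harmless, but in real code an unbalanced PROTECT/UNPROTECT pair is a bug.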
Here is our next iteration of “Hello World!” where we will allocate space for an R character vector, assign our greeting to the first element and then return the vector:
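    #include <R.h>
    #include <Rdefines.h>

    SEXP helloB1(void) {
      SEXP result;
      /* allocate a character vector of length 1 and protect it from the garbage collector */
      PROTECT(result = NEW_CHARACTER(1));
      /* mkChar() turns a C string into an R string (CHARSXP) */
      SET_STRING_ELT(result, 0, mkChar("Hello World!"));
      /* pop the one object we protected */
      UNPROTECT(1);
      return(result);
    }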
Note that we allocate memory for a character vector of length n with NEW_CHARACTER(n). It is worth taking a look in the R include files to see how this and similar macros are defined:

    $ grep NEW_ /usr/lib/R/include/*.h
    /usr/lib/R/include/Rdefines.h:#define NEW_LOGICAL(n) allocVector(LGLSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_INTEGER(n) allocVector(INTSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_NUMERIC(n) allocVector(REALSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_CHARACTER(n) allocVector(STRSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_COMPLEX(n) allocVector(CPLXSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_LIST(n) allocVector(VECSXP,n)
    /usr/lib/R/include/Rdefines.h:#define NEW_STRING(n) NEW_CHARACTER(n)
    /usr/lib/R/include/Rdefines.h:#define NEW_RAW(n) allocVector(RAWSXP,n)
    /usr/lib/R/include/Rdefines.h:/* NEW_OBJECT is recommended; NEW is for green book compatibility */
    /usr/lib/R/include/Rdefines.h:#define NEW_OBJECT(class_def) R_do_new_object(class_def)
So we could have used allocVector(STRSXP,1) instead of NEW_CHARACTER(1) and you will see plenty of the former in R source code and packages. Similarly you can grep for “_ELT” or “mkChar” and learn about those. There really isn’t any definitive source for information and you will have to get comfortable googling, poking around source code examples, examining the R include files and even checking the R-devel mailing list to get a sense of the R functions that are available for getting C code to work with R objects. I would recommend spending some time with Rinternals.h and Rdefines.h.
After running R CMD SHLIB we will again create a very simple wrapper and then run the code from R:

    # wrapper function to return a greeting.
    dyn.load("helloB1.so")
    helloB1 <- function() {
      result <- .Call("helloB1")
      return(result)
    }

    > source('wrappers.R')
    > greeting <- helloB1()
    > class(greeting)
    [1] "character"
    > greeting
    [1] "Hello World!"
Double Whew! So far it still seems like .Call() is a big headache. But we haven’t really tried to do anything in our C code yet. The complexity/benefit balance evens out a little in our final example.
Casting about in the R header files
The title of this section really says it all. As you start to do more in your C code you will need to learn how to convert character strings into SEXP objects, SEXP objects into integers, and so on. There is a finite, but large, amount to know before you become expert. The links in the “Getting used to SEXP” section above have excellent examples, as does Programming with Data: Using and Extending R by Dirk Eddelbuettel.
Here is our last “Hello World!” example, the one that counts the characters in incoming greetings. This example shows how R macros defined in Rdefines.h are used to extract elements from a vector, how vector elements are cast into char and int and how you need to UNPROTECT the same number of elements that you placed on the PROTECT stack.
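    #include <R.h>
    #include <Rdefines.h>
    #include <string.h>

    SEXP helloC1(SEXP greeting) {
      int i, vectorLength, stringLength;
      SEXP result;
      /* coerce the argument to a character vector; this may be a new object, so protect it */
      PROTECT(greeting = AS_CHARACTER(greeting));
      vectorLength = LENGTH(greeting);
      /* integer vector to hold one count per element */
      PROTECT(result = NEW_INTEGER(vectorLength));
      for (i=0; i<vectorLength; i++) {
        /* strlen() counts the bytes in the underlying C string */
        stringLength = (int) strlen(CHAR(STRING_ELT(greeting, i)));
        INTEGER(result)[i] = stringLength;
      }
      /* two PROTECT calls above, so unprotect two */
      UNPROTECT(2);
      return(result);
    }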
After R CMD SHLIB, here is the wrapper and the R session:
    # wrapper function to invoke helloC1
    dyn.load("helloC1.so")
    helloC1 <- function(greeting) {
      result <- .Call("helloC1", greeting)
      return(result)
    }

    > source('wrappers.R')
    > greeting <- c("Hello World!", "Bonjour tout le monde!", "Привет мир!")
    > helloC1(greeting)
    [1] 12 22 20
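Note that the third count is 20 even though “Привет мир!” has only 11 characters. strlen() counts bytes, and in a UTF-8 session each Cyrillic letter occupies two bytes. The plain C sketch below shows the difference by skipping UTF-8 continuation bytes. The helper names utf8_bytes and utf8_codepoints are made up for this illustration, and the code assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the string, which is what strlen() reports. */
size_t utf8_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of Unicode code points, assuming valid UTF-8 input.
 * Continuation bytes look like 10xxxxxx, so every byte that does
 * not match that pattern starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

On the R side, nchar(x, type = "bytes") and nchar(x, type = "chars") report the same two quantities.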
Yes, it’s still at the double Whew! level but we did some worthwhile things like allocate space for R objects and correctly harness garbage collection. If there were any halfway decent API docs for all this I would have no hesitation in recommending the .Call() interface to anyone writing C code. As it is, however, there will be a painful learning curve. If all you are doing is processing a vector of numbers and returning a simple scalar or vector result then the .C() interface will certainly be much easier — assuming you can take the memory hit. If, on the other hand, you are doing things like using a C library to convert a bunch of raw data into more complex structures then you are going to have to learn to do things the R way.
But there is hope! In the next post we will investigate using the Rcpp package to simplify this robust but complex interface to C code. Hopefully we won’t have to become C++ wizards to do so.
Example Packages using .Call()
The .Call() interface is heavily used in many R packages. Along with poring over the Writing R Extensions document, it is important to have some example code to work from. Here is a running list of the packages I found with useful example code:
- Rcsdp — R interface to the CSDP semidefinite programming library.
More Information
Hadley Wickham has written an excellent tutorial on using the .Call() interface.