Peeking inside R functions

February 6, 2011

(This article was first published on Mark M. Fredrickson, and kindly contributed to R-bloggers)

R, like all good programming languages, treats functions as first-class objects. Users can create functions, pass them as arguments, and have them returned as the result of other computations. You may be familiar with passing functions as arguments if you have used the apply family of functions (e.g. apply, sapply, lapply, mapply). For example, to get the median of each column of a data frame:

> data(airquality)
> apply(airquality, 2, median)
  Ozone Solar.R    Wind    Temp   Month     Day 
     NA      NA     9.7    79.0     7.0    16.0 

In this example, since some of the columns have NA values, the reported medians are also NA. We can amend the above example to drop missing values and demonstrate creating our own function to pass to apply:

> apply(airquality, 2, function(column) {
+     median(column, na.rm = TRUE)
+ })
  Ozone Solar.R    Wind    Temp   Month     Day 
   31.5   205.0     9.7    79.0     7.0    16.0 
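
apply() also forwards any extra arguments to the function it calls, so the same result is available without writing an anonymous function. A minimal sketch, shown with both apply() and sapply(); the latter walks the columns of the data frame directly instead of converting it to a matrix first, as apply() does:

```r
data(airquality)

# Arguments after FUN are passed through apply()'s `...` on to median()
meds_apply <- apply(airquality, 2, median, na.rm = TRUE)

# A data frame is a list of columns, so sapply() visits each column in turn
meds_sapply <- sapply(airquality, median, na.rm = TRUE)

print(meds_sapply)
```

Passing na.rm = TRUE this way is handy for one-off arguments; the anonymous function is still the more flexible choice when the call needs reshaping.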

First class functions are useful in many scenarios. We can use them like objects to hold information. Here is a contrived example that creates functions that increment by a set amount. Observe that each function gets its own value of n, which it uses when called:

> adder <- function(n) {
+     function(i) {
+         n + i
+     }
+ }
> f1 <- adder(7)
> f2 <- adder(3)
> f1(10)
[1] 17
> f2(10)
[1] 13
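
Closures can also hold information that changes between calls. A small sketch of a counter (make_counter is a hypothetical name, not from the original post); the superassignment operator <<- updates the count stored in the enclosing environment rather than creating a new local variable:

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- assigns to `count` in the environment created by make_counter()
    count <<- count + 1
    count
  }
}

c1 <- make_counter()
c2 <- make_counter()
a <- c1()  # 1
b <- c1()  # 2
z <- c2()  # 1: c2 has its own, separate count
```

Each call to make_counter() creates a fresh environment, just as each call to adder() gets its own n.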

Another feature of R is that functions carry their source code around with them. If we ever want to know what f1 does, we can just ask R to print out the source:

> f1
function (i) 
{
    n + i
}
<environment: 0xcdc600>
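
Printing a function shows everything at once, but base R also exposes the pieces individually through formals(), body() and environment(). A brief sketch, redefining adder and f1 so it runs on its own:

```r
adder <- function(n) {
  function(i) {
    n + i
  }
}
f1 <- adder(7)

names(formals(f1))  # the argument names: "i"
body(f1)            # the unevaluated body, n + i wrapped in braces
environment(f1)     # the environment created by the call adder(7)
```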

While the source shows us that a variable named n is used, it does not tell us anything about the value of n. We know that n is 7 in f1 and 3 in f2, but if functions are created programmatically, say, as part of a loop, we might not know these values. Luckily, functions also expose their environments: the set of variable names and values from the surrounding scope (the adder function in the above example). While R does not print out these environments by default, we can use a simple helper function to peek inside the function scope:

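> fnpeek <- function(f, name = NULL) {
+     env <- environment(f)
+     # With no name, list the variables in the function's environment
+     if (is.null(name)) {
+         return(ls(envir = env))
+     }
+     # Otherwise return the named value, if it is defined there
+     if (name %in% ls(envir = env)) {
+         return(get(name, env))
+     }
+     return(NULL)
+ }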
> fnpeek(f1)
[1] "n"
> fnpeek(f1, "n")
[1] 7
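
fnpeek() is a thin wrapper around base tools, and for a one-off look the same information is available directly. A sketch using ls(), get(), exists() and `$` on an environment (adder, f1 and f2 are redefined so the snippet runs on its own):

```r
adder <- function(n) {
  function(i) {
    n + i
  }
}
f1 <- adder(7)
f2 <- adder(3)

env2 <- environment(f2)
ls(env2)                                     # "n"
get("n", envir = env2)                       # 3
exists("n", envir = env2, inherits = FALSE)  # TRUE: n is defined in this environment itself
environment(f1)$n                            # 7: `$` also works on environments
```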

If you do not have one already, go make a ~/.Rprofile file and stick this function in there. You will use it. I promise. I recently used it to diagnose a problem that had been bugging me for some time. The problem concerned creating a series of functions. Using the adder example above:

> adders <- lapply(1:5, adder)
> sapply(adders, function(f) {
+     f(10)
+ })
[1] 15 15 15 15 15

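The output should be 11 12 13 14 15, but instead it is 15 every time. This is because none of the adder functions evaluated its n when it was created: each n still refers to the loop variable, and by the time the functions are called, the loop has finished and that variable holds its final value. (Since R 3.2.0, the apply functions force the arguments they pass to FUN, so this particular lapply call now returns the expected values; the equivalent for loop below still shows the problem.) The lapply call is equivalent to: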

> adders <- vector(mode = "list", length = 5)
> for (i in 1:5) {
+     adders[[i]] <- adder(i)
+ }
> sapply(adders, function(f) {
+     f(10)
+ })
[1] 15 15 15 15 15
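
This is the sort of problem fnpeek() helps track down. A sketch of the diagnosis, rebuilding the loop-created adders (adder and fnpeek are redefined so the snippet runs on its own) and asking each closure for its n. Reading n through get() evaluates it, so every closure reports the loop variable's final value:

```r
adder <- function(n) {
  function(i) {
    n + i
  }
}

fnpeek <- function(f, name = NULL) {
  env <- environment(f)
  if (is.null(name)) {
    return(ls(envir = env))
  }
  if (name %in% ls(envir = env)) {
    return(get(name, env))
  }
  NULL
}

adders <- vector(mode = "list", length = 5)
for (i in 1:5) {
  adders[[i]] <- adder(i)
}

# Every closure reports the same n: the final value of the loop variable i
peeked <- sapply(adders, fnpeek, name = "n")
print(peeked)  # 5 5 5 5 5
```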

In each iteration, the i variable is overwritten with a new value. Each function's n is a promise, an unevaluated expression that refers to this same i, so when the functions are finally called they all see its final value, 5. This is a consequence of R's lazy evaluation: function arguments are not evaluated until they are first used. Usually this is not a problem, but when closures are created in a loop we want the argument evaluated immediately. Luckily, the workaround is relatively simple: evaluate n inside the outer function as soon as it is called, for example by assigning it to itself in the local environment.
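
The lazy-evaluation explanation can be checked directly. If no closure has used its n yet, changing i after the loop changes what the functions return; once a closure's n has been evaluated, its value is fixed. A small sketch:

```r
adder <- function(n) {
  function(i) {
    n + i
  }
}

adders <- vector(mode = "list", length = 5)
for (i in 1:5) {
  adders[[i]] <- adder(i)
}

i <- 100                   # change the loop variable before any closure is called
first <- adders[[1]](10)   # n is evaluated now, while i is 100, so this is 110

i <- 0                     # change i again
again <- adders[[1]](10)   # still 110: the first closure's n was already evaluated
second <- adders[[2]](10)  # 10: the second closure's n is evaluated for the first time, as 0
```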

> safe.adder <- function(n) {
+     n <- n  # evaluate n now, while it still has the intended value
+     function(i) {
+         n + i
+     }
+ }
> safe.adders <- lapply(1:5, safe.adder)
> sapply(safe.adders, function(f) {
+     f(10)
+ })
[1] 11 12 13 14 15

While not ideal, at least this workaround is relatively simple (especially compared to my last solution) and gets us all the benefits we would expect of first class functions.
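
Base R also provides force() for exactly this purpose. It simply evaluates its argument, which states the intent more clearly than n <- n. A sketch of the same fix using it (safe_adder is a hypothetical name), checked against both lapply() and an explicit for loop:

```r
safe_adder <- function(n) {
  force(n)  # evaluate n now, while it has the value the caller passed
  function(i) {
    n + i
  }
}

via_lapply <- sapply(lapply(1:5, safe_adder), function(f) f(10))

adders <- vector(mode = "list", length = 5)
for (i in 1:5) {
  adders[[i]] <- safe_adder(i)
}
via_loop <- sapply(adders, function(f) f(10))

print(via_loop)  # 11 12 13 14 15
```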

The version of R used in this post was 2.11.1 (2010-05-31)
