Closures in R: A useful abstraction

[This article was first published on Left Censored » R, and kindly contributed to R-bloggers.]

People who have been using R for any length of time have probably become accustomed to passing functions as arguments to other functions. In my experience, however, people are much less likely to return functions from their own code. That is a pity, because doing so opens up a level of abstraction that can greatly reduce the quantity and complexity of the code needed for certain kinds of tasks. Here I provide some brief examples of how R programmers can use lexical closures to encapsulate both data and methods.

To begin with a simple example, suppose you want a function that adds 2 to its argument. You would likely write something like this:

add_2 <- function(y) { 2 + y }

Which does exactly what you would expect:

> add_2(1:10)
 [1] 3 4 5 6 7 8 9 10 11 12

Now suppose you need another function that instead adds 7 to its argument. The natural thing to do would be to write another function, just like add_2, where the 2 is replaced with a 7. But this would be hard to maintain: if you later discover that you made a mistake and in fact need to multiply the values instead of adding them, you would be forced to change the code in two places. In this trivial example, that may not be much trouble, but for more complicated projects, duplicating code is a recipe for disaster.

A better idea would be to write a function that takes one argument, x, and returns another function that adds its own argument, y, to x. In other words, something like this:

add_x <- function(x) {
    function(y) { x + y }
}

Now, when you call add_x with an argument, you will get back a function that does exactly what you want:

add_2 <- add_x(2)
add_7 <- add_x(7)

> add_2(1:10)
 [1] 3 4 5 6 7 8 9 10 11 12
> add_7(1:10)
 [1] 8 9 10 11 12 13 14 15 16 17

So far, this doesn’t appear too earth-shattering. But if you look closely at the definition of add_x, you may notice something odd: how does the returned function know where to find x when it is called at a later point?

It turns out that R is lexically scoped, meaning that functions carry with them a reference to the environment within which they were defined. In this case, each call to add_x creates a new environment in which x is bound to the value you supplied, and the returned function keeps a reference to that environment. As a rough mental model for this simple example, you can think of R as replacing every instance of x in the returned function with the value you specified when you called add_x.
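One way to see this for yourself is to inspect the environment attached to each returned function. The following is a small sketch using base R's environment(), ls(), and get(); add_x is repeated so the block runs on its own, and the names env_2 and env_7 are introduced here only for illustration.

```r
# add_x as defined in the article, repeated so this block is self-contained.
add_x <- function(x) {
    function(y) { x + y }
}

add_2 <- add_x(2)
add_7 <- add_x(7)

# Each closure keeps a reference to the environment created by its own
# call to add_x. (env_2 and env_7 are illustrative names.)
env_2 <- environment(add_2)
env_7 <- environment(add_7)

ls(env_2)                  # the only name bound there is "x"
get("x", envir = env_2)    # 2
get("x", envir = env_7)    # 7

# The two environments are distinct objects, so the closures share no state.
identical(env_2, env_7)    # FALSE
```

Because each call to add_x gets its own environment, add_2 and add_7 can coexist without interfering with one another.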

OK, so this may be a neat trick, but how can it be used more productively? For a slightly more complicated example, suppose you are performing some complex bootstrapping and, for efficiency, you pre-allocate container vectors to store the results. This is straightforward when you have just a single vector of results: all you have to do is remember to increment an index counter each time you add a result to the vector.

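# data and nboot are assumed to exist already
bootmeans <- numeric(nboot)  # pre-allocate the container
for (i in seq_len(nboot)) {
  bootmeans[i] <- mean(sample(data, length(data), replace=TRUE))
}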

> mean(data)
 [1] 0.0196
> mean(bootmeans)
 [1] 0.0188

But suppose you want to track several different statistics, each requiring you to keep track of a different index variable. If your bootstrapping routine is even a little bit complicated, this can be tedious and prone to error. By using closures, you can abstract away all of this bookkeeping. Here is a constructor function that wraps a pre-allocated container vector:

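make_container <- function(n) {
    x <- numeric(n)  # pre-allocated storage, private to the returned function
    i <- 1           # position of the next free slot

    function(value=NULL) {
        if (is.null(value)) {
            # Called with no argument: return the full vector.
            return(x)
        } else {
            # <<- assigns to x and i in the enclosing environment,
            # so the changes persist between calls.
            x[i] <<- value
            i <<- i + 1
        }
    }
}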

When you call make_container with an argument, it pre-allocates a numeric vector of the specified length, n, and returns a function that allows you to add data to that vector without having to worry about keeping track of an index. If you call the returned function with no argument, so that value is NULL, the full vector is returned.

bootmeans <- make_container(nboot)

for (i in 1:nboot)
 bootmeans(mean(sample(data, length(data), replace=TRUE)))

> mean(data)
 [1] 0.0196
> mean(bootmeans())
 [1] 0.0207
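The bookkeeping benefit is easier to see when several statistics are tracked at once. The sketch below is only illustrative: the simulated data, the seed, the value of nboot, and the names boot_means, boot_medians, and boot_sds are all assumptions made here so the block can run on its own, and they are not the values behind the output shown above.

```r
set.seed(1)          # arbitrary seed, only so the sketch is reproducible
data  <- rnorm(100)  # hypothetical stand-in for the article's data
nboot <- 200         # hypothetical number of bootstrap replicates

# make_container as defined in the article, repeated for self-containment.
make_container <- function(n) {
    x <- numeric(n)
    i <- 1
    function(value = NULL) {
        if (is.null(value)) {
            return(x)
        } else {
            x[i] <<- value
            i <<- i + 1
        }
    }
}

# One container per statistic; each closure tracks its own index.
boot_means   <- make_container(nboot)
boot_medians <- make_container(nboot)
boot_sds     <- make_container(nboot)

for (b in seq_len(nboot)) {
    resample <- sample(data, length(data), replace = TRUE)
    boot_means(mean(resample))
    boot_medians(median(resample))
    boot_sds(sd(resample))
}

# Calling a container with no argument returns its stored vector.
length(boot_means())   # 200
summary(boot_sds())
```

None of the calls inside the loop mention an index, so adding a fourth statistic means adding one constructor call and one line in the loop.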

Here make_container is relatively simple, but it can be as complicated as you need. For instance, you may want to have the constructor function perform some expensive calculations that you would rather not do every time the function is called. In fact, this is what I have done in the boolean3 package to minimize the number of calculations done at every iteration of the optimization routine.
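As a rough illustration of that idea (this is not the boolean3 code, and make_ecdf and p_ref are hypothetical names chosen for this sketch), a constructor can do its costly work once and let the returned function reuse the result on every call. Here the one-time step is a sort, and the returned function uses base R's findInterval() to compute an empirical cumulative proportion.

```r
# Hypothetical constructor: the sort happens once, when the closure is built.
make_ecdf <- function(reference) {
    sorted <- sort(reference)  # the "expensive" step, done a single time
    n <- length(sorted)

    function(q) {
        # findInterval() counts how many sorted values are <= each element of q
        findInterval(q, sorted) / n
    }
}

p_ref <- make_ecdf(c(5, 1, 3, 2, 4))

p_ref(3)         # 0.6: three of the five reference values are <= 3
p_ref(c(0, 5))   # 0 and 1
```

Every call to p_ref reuses the sorted vector stored in the closure's environment instead of sorting the reference data again.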
