The performance cost of a for-loop, and some alternatives

[This article was first published on Left Censored » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve recently been spending a lot of time running various simulations in R. Because I often use snow to perform simulations across several computers/cores, results typically come back in the form of a list object. Summarizing the results from a list is simple enough using a for-loop, but it’s much “sexier” to use a functional style of programming that takes advantage of higher order functions or the *apply-family of functions (R is, after all, a functional language at its core). But what’s the performance cost of using, for instance, lapply with a large list, particularly when the results from lapply have to be further manipulated to get them into a useful form? I recently did some performance testing on three of the *apply functions and compared them to an equivalent for-loop approach. The results follow.

First, I set up a large list with some fake data. Here I make a list, Sims, containing 30000 \(5 \times 5\) matrices. (This is a data structure similar to one I recently found myself analyzing.)

set.seed(12345)
S <- 30000
Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)

Now I define four functions that extract the fifth column of each matrix in Sims and create a \(30000 \times 5 \) matrix of results.

for_method <- function(S, Sims) {
  R <- matrix(NA, ncol=5, nrow=S)
  for (i in 1:S) R[i,] <- Sims[[i]][,5]
  R
}

sapply_method <- function(Sims) {
  t(sapply(Sims, function(x) x[,5]))
}

lapply_method <- function(Sims) {
  do.call(rbind, lapply(Sims, function(x) x[,5]))
}

vapply_method <- function(Sims) {
  t(vapply(Sims, function(x) x[,5], rep(0, 5)))
}

Notice that the for_method first creates a container matrix and inserts the results directly into it. Using something like rbind would force R to copy the matrix in each for-iteration, which is very inefficient. Also notice that in each of the apply methods, the result has to be manipulated before it is returned. It’s often possible to eliminate this step during the initial simulation process. But if you’re like me, you sometimes start your simulations before you realize what you intend on doing with the results.

Now for some timing results:

> system.time(replicate(20, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  4.340   0.072   4.414
> system.time(replicate(20, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  3.452   0.076   3.528
> system.time(replicate(20, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  4.393   0.044   4.433
> system.time(replicate(20, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  2.588   0.040   2.628
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE

These results surprised me—I never expected vapply to be so much quicker than the other three methods. This may have to do with that fact that, since vapply requires you to explicitly specify the resultant data structure, R is able allocate the memory ahead of time.

As a simple extension, I thought I would retest taking advantage of byte-compiled code:

library(compiler)

for_method <- cmpfun(for_method)
sapply_method <- cmpfun(sapply_method)
lapply_method <- cmpfun(lapply_method)
vapply_method <- cmpfun(vapply_method)

And here are the results:

> system.time(replicate(10, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  2.084   0.024   2.110
> system.time(replicate(10, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  1.624   0.016   1.642
> system.time(replicate(10, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  2.152   0.012   2.164
> system.time(replicate(10, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  1.204   0.004   1.207
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE

These results are also surprising. I really thought byte-compiling the functions would make the results much closer (doesn’t each function compile into what is effectively a for-loop?). However, in this simple example at least, that isn’t the case. vapply still has, by a significant margin, the best performance. sapply also performs noticeably better than lapply and the for-loop, which is an important result since sapply doesn’t require a pre-specified data structure. Assuming they hold in the presence of more complicated data structures (and I don’t see what they wouldn’t), these results seem to suggest that a functional approach can provide benefits in terms of coding efficiency and run-time performance.

I welcome any feedback or suggestions on how these tests could be improved or extended.

Update

A couple things have come up in the comments. First, Henrik provided a more elegant (and efficient) version of vapply_method:

vapply_method2 <- function(Sims) {
  t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
}

Second, several people have reported that the for_method on their system is just as fast, if not faster, than the vapply_method when compiled. I have retested my results, adding Henrik's vapply_method2. Here are the results for compiled code (note that I have increased the number of replications):

> library(compiler)
> set.seed(12345)
> S <- 30000
> Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)
> 
> for_method <- function(S, Sims) {
+   R <- matrix(NA, ncol=5, nrow=S)
+   for (i in 1:S) R[i,] <- Sims[[i]][,5]
+   R
+ }
> 
> sapply_method <- function(Sims) {
+   t(sapply(Sims, function(x) x[,5]))
+ }
> 
> lapply_method <- function(Sims) {
+   do.call(rbind, lapply(Sims, function(x) x[,5]))
+ }
> 
> vapply_method <- function(Sims) {
+   t(vapply(Sims, function(x) x[,5], rep(0, 5)))
+ }
> 
> vapply_method2 <- function(Sims) {
+   t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
+ }
> for_method <- cmpfun(for_method)
> sapply_method <- cmpfun(sapply_method)
> lapply_method <- cmpfun(lapply_method)
> vapply_method <- cmpfun(vapply_method)
> vapply_method2 <- cmpfun(vapply_method2)
> system.time(replicate(50, R.1 <<- for_method(S, Sims)))
   user  system elapsed 
  6.052   0.136   6.191 
> system.time(replicate(50, R.2 <<- sapply_method(Sims)))
   user  system elapsed 
  8.361   0.116   8.474 
> system.time(replicate(50, R.3 <<- lapply_method(Sims)))
   user  system elapsed 
 10.548   0.144  10.692 
> system.time(replicate(50, R.4 <<- vapply_method(Sims)))
   user  system elapsed 
  5.813   0.124   5.936 
> system.time(replicate(50, R.5 <<- vapply_method2(Sims)))
   user  system elapsed 
  4.268   0.084   4.352

For whatever reason, the vapply methods remain the fastest on my system (particularly Henrik’s version). I also tested on another Linux machine and got the same results (I have not yet tested on Windows). Of course, this is a very simple example, so the performance differences may not hold up to the increasing complexity of the input function. I’d be interested to know if anyone has an idea as to why vapply would perform better in Linux than in Windows/Mac. A library thing?

To leave a comment for the author, please follow the link and comment on their blog: Left Censored » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)