Computation time of loops — for, *apply, map

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It is usually said, that for– and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed.

The first loop is perhaps the worst I can think of – the return vector is initialized without type and length so that the memory is constantly being allocated.
use_for_loop <- function(x){
  y <- c()
  
  for(i in x){
    y <- c(y, x[i] * 100)
  }
  return(y)
}
The second for loop is with preallocated size of the return vector.
use_for_loop_vector <- function(x){
  y <- vector(mode = "double", length = length(x))
  
  for(i in x){
    y[i] <- x[i] * 100
  }
  return(y)
}
I have noticed I use sapply() quite a lot, but I think not once have I used vapply() We will nonetheless look at both
use_sapply <- function(x){
  sapply(x, function(y){y * 100})
}

use_vapply <- function(x){
  vapply(x, function(y){y * 100}, double(1L))
}
And because I am a tidyverse-fanboy we also loop at map_dbl().
library(purrr)
use_map_dbl <- function(x){
  map_dbl(x, function(y){y * 100})
}
We test the functions using a vector of random doubles and evaluate the runtime with microbenchmark.
x <- c(rnorm(100))
mb_res <- microbenchmark::microbenchmark(
  `for_loop()` = use_for_loop(x),
  `for_loop_vector()` = use_for_loop_vector(x),
  `purrr::map_dbl()` = use_map_dbl(x),
  `sapply()` = use_sapply(x),
  `vapply()` = use_vapply(x),
  times = 500
)
The results are listed in table and figure below.
expr min lq mean median uq max neval
for_loop() 8.440 9.7305 10.736446 10.2995 10.9840 26.976 500
for_loop_vector() 10.912 12.1355 13.468312 12.7620 13.8455 37.432 500
purrr::map_dbl() 22.558 24.3740 25.537080 25.0995 25.6850 71.550 500
sapply() 15.966 17.3490 18.483216 18.1820 18.8070 59.289 500
vapply() 6.793 8.1455 8.592576 8.5325 8.8300 26.653 500

  The clear winner is vapply() and for-loops are rather slow. However, if we have a very low number of iterations, even the worst for-loop isn’t too bad:
x <- c(rnorm(10))
mb_res <- microbenchmark::microbenchmark(
  `for_loop()` = use_for_loop(x),
  `for_loop_vector()` = use_for_loop_vector(x),
  `purrr::map_dbl()` = use_map_dbl(x),
  `sapply()` = use_sapply(x),
  `vapply()` = use_vapply(x),
  times = 500
)
expr min lq mean median uq max neval
for_loop() 5.992 7.1185 9.670106 7.9015 9.3275 70.955 500
for_loop_vector() 5.743 7.0160 9.398098 7.9575 9.2470 40.899 500
purrr::map_dbl() 22.020 24.1540 30.565362 25.1865 27.5780 157.452 500
sapply() 15.456 17.4010 22.507534 18.3820 20.6400 203.635 500
vapply() 6.966 8.1610 10.127994 8.6125 9.7745 66.973 500

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)