How to Avoid For Loop in R

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers.]

A FOR loop is the most intuitive way to apply an operation to a series: it steps through the items one by one. That makes perfect sense logically, but useRs often avoid it because loops in R can be inefficient, particularly when a result is grown one element at a time. In R, there are at least two other ways to implement the functionality of a FOR loop. The first option is the lapply() or sapply() function, which applies a function to each item in a list or vector and is very similar to the Map() function that I showed in https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting and https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation. The second option is to "vectorize" a function with the Vectorize() function so that the newly vectorized function can consume the list directly.
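As a minimal sketch of how the two apply-style functions differ, the snippet below runs the same operation through both. The helper name square is hypothetical and used only for illustration. lapply() always returns a list, while sapply() tries to simplify the result to a vector.

```r
# Hypothetical helper used only for illustration
square <- function(x) x^2

# lapply() always returns a list, one element per input item
res_list <- lapply(1:3, square)

# sapply() simplifies the result to a vector when it can
res_vec <- sapply(1:3, square)

str(res_list)
str(res_vec)
```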

Below is a quick demonstration showing how to recode a FOR loop with the lapply() and Vectorize() functions. We first create a dummy loop that iterates 3 times and prints the iteration number each time.

for (i in 1:3) {print(paste("iter", i))}
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"

To migrate the above FOR loop, we just need to wrap the operation print(paste("iter", i)) in an anonymous function and then apply this anonymous function to each element in the series with the lapply() function. Please note that the invisible() function used below does nothing material other than suppress printing of the list that lapply() returns.

invisible(lapply(1:3, function(i) print(paste("iter", i))))
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"

Vectorization is a little trickier. Note that the anonymous function created above handles only one item at a time. To have it consume the whole series rather than a single item, we create a so-called vectorized function with the Vectorize() function and then apply this newly created function to the series directly, as shown below.

invisible(Vectorize(function(i) print(paste("iter", i)), SIMPLIFY = FALSE)(1:3))
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"
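Vectorize() is documented as a wrapper around mapply(), so the vectorized function still calls the original function once per element behind the scenes. The sketch below illustrates that relationship under the assumption that we return the string rather than print it. The names label, vlabel, res_vectorize, and res_mapply are hypothetical.

```r
# Hypothetical helper: returns the label instead of printing it
label <- function(i) paste("iter", i)

# Vectorize() builds a wrapper that hands each element to mapply()
vlabel <- Vectorize(label, SIMPLIFY = FALSE)
res_vectorize <- vlabel(1:3)

# The same operation written as a direct mapply() call
res_mapply <- mapply(label, 1:3, SIMPLIFY = FALSE)

# Compare the two results
all.equal(res_vectorize, res_mapply)

# paste() is already vectorized, so for this toy case no wrapper is needed
paste("iter", 1:3)
```

For an operation as simple as this one, calling the natively vectorized paste() is likely the most direct choice; Vectorize() is more useful for functions that only accept a single item.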

From what has been shown so far, the FOR loop appears to be the most intuitive solution and the easiest to understand. One might wonder why we need to go through the hassle.

In the example below, borrowed from https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting, let's first see how to get the job done with a FOR loop. To get things ready, we split the iris data.frame into a list of 2 data.frames named "lst" and define a subsetting function named "fn", similar to what we did before.

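data(iris)
# Filter condition: sepal longer than 7 and wider than 3
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
# Split iris into a list of 2 data.frames with 75 rows each
lst <- split(iris, sort(seq_len(nrow(iris)) %% 2))
# Keep only the rows of x that satisfy expr
fn <- function(x) x[with(x, which(eval(expr))), ]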
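To see what these objects contain, the sketch below inspects the split. Because sort() orders the 0/1 remainders, the first 75 rows of iris land in the element named "0" and the last 75 rows in the element named "1". The inspection calls are an addition for illustration, not part of the original workflow.

```r
data(iris)
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
lst <- split(iris, sort(seq_len(nrow(iris)) %% 2))
fn <- function(x) x[with(x, which(eval(expr))), ]

# Names of the two pieces produced by split()
names(lst)

# Number of rows in each piece
sapply(lst, nrow)

# Number of rows in each piece that pass the filter
sapply(lapply(lst, fn), nrow)
```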

The code snippet below shows how to loop through the list with a FOR loop and subset each data.frame, which seems more complicated than it should be.

LoopFn <- function(l) {
  result <- data.frame()
  for (i in l) {
    result <- rbind(result, fn(i))
  }
  row.names(result) <- NULL
  return(result)
}
LoopFn(lst)  
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica

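Let's take a look at the two other options, both of which require only one line, provided that the arguments are set appropriately. Here, make.row.names = FALSE discards the original row names so that the rows are renumbered from 1.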

do.call(rbind, c(lapply(lst, fn), make.row.names = FALSE))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica
do.call(rbind, c(Vectorize(fn, SIMPLIFY = FALSE)(lst), make.row.names = FALSE))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica
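As a quick sanity check, the sketch below rebuilds the objects used in this post and compares the results of the three approaches with all.equal(). The names res_loop, res_lapply, and res_vec are hypothetical and introduced only for this comparison.

```r
data(iris)
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
lst <- split(iris, sort(seq_len(nrow(iris)) %% 2))
fn <- function(x) x[with(x, which(eval(expr))), ]

# FOR loop version, same logic as LoopFn in the post
LoopFn <- function(l) {
  result <- data.frame()
  for (i in l) {
    result <- rbind(result, fn(i))
  }
  row.names(result) <- NULL
  result
}

res_loop   <- LoopFn(lst)
res_lapply <- do.call(rbind, c(lapply(lst, fn), make.row.names = FALSE))
res_vec    <- do.call(rbind, c(Vectorize(fn, SIMPLIFY = FALSE)(lst),
                               make.row.names = FALSE))

# Compare the loop result with each one-line alternative
all.equal(res_loop, res_lapply)
all.equal(res_loop, res_vec)
```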
