How to Avoid For Loop in R

September 15, 2018

(This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)

A FOR loop is the most intuitive way to apply an operation to a series: it loops through the items one by one, which makes perfect sense logically, but useRs are usually advised to avoid it because of its low efficiency. In R, there are two ways to implement the same functionality as a FOR loop. The first option is the lapply() or sapply() function, which applies a function to each item in a list and is very similar to the Map() function that I showed in https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting and https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation. The second option is to “vectorize” a function with the Vectorize() function, so that the newly vectorized function can consume the whole list directly.
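A minimal sketch of the first option in base R, assuming nothing beyond the functions named above: lapply() always returns a list, sapply() tries to simplify that list to a vector, and Map() returns the same list as lapply() when given a single unnamed input. The helper `square` and the `sq_*` objects are hypothetical, for illustration only.

```r
# Hypothetical helper used only for illustration
square <- function(i) i^2

sq_list <- lapply(1:3, square)  # a list of length 3
sq_vec  <- sapply(1:3, square)  # simplified to the numeric vector c(1, 4, 9)
sq_map  <- Map(square, 1:3)     # Map() calls mapply(..., SIMPLIFY = FALSE)

str(sq_list)
print(sq_vec)
print(identical(sq_list, sq_map))
```

In day-to-day use, sapply() is handy at the console, while lapply() is the safer choice inside functions because its return type never changes.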

Below is a quick demonstration of how to recode a FOR loop with the lapply() and Vectorize() functions. We first create a dummy loop that iterates 3 times and prints out the iteration label each time.

for (i in 1:3) {print(paste("iter", i))}
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"

To migrate the above FOR loop, we just need to wrap the operation print(paste("iter", i)) in an anonymous function and then apply this anonymous function to each element of the series with the lapply() function. Please note that the invisible() function used below does nothing material other than suppress printing of the list returned by lapply().

invisible(lapply(1:3, function(i) print(paste("iter", i))))
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"
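To see what invisible() is hiding: print() displays its argument and also returns it, so lapply() collects the three strings into a list, which R would otherwise auto-print at the console after the loop-style output. A small sketch that keeps that returned list in a hypothetical object `res` instead of discarding it:

```r
# print() displays each string and also returns it, so lapply() collects them
res <- lapply(1:3, function(i) print(paste("iter", i)))

# res is a list holding the three strings; assigning it (or wrapping the
# call in invisible()) is what stops R from auto-printing this list
str(res)
```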

The vectorization is a little tricky. The anonymous function created above can only be applied to one item of the series at a time. To have the anonymous function consume the whole series instead of a single item, we create a so-called vectorized function with the Vectorize() function and then apply this newly created function to the series directly, as shown below.

invisible(Vectorize(function(i) print(paste("iter", i)), SIMPLIFY = FALSE)(1:3))
#[1] "iter 1"
#[1] "iter 2"
#[1] "iter 3"
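It may help to know that Vectorize() builds a wrapper that calls mapply(), so the element-by-element looping still happens, just out of sight. Where it really pays off is with a function that cannot accept a vector at all, such as one built on `if`. The function `parity` below is a hypothetical example, not from the original post.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE value,
# so parity(1:3) fails (an error in R >= 4.2.0, a warning in older versions)
parity <- function(i) if (i %% 2 == 0) "even" else "odd"

# Vectorize() wraps parity in a call to mapply(), one element at a time
vparity <- Vectorize(parity)
print(vparity(1:3))

# The same result written directly with mapply()
print(mapply(parity, 1:3))
```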

From what has been shown so far, the FOR loop solution appears to be the most intuitive and the easiest to understand, so one might wonder why we need to go through the hassle.

In the example below, borrowed from https://statcompute.wordpress.com/2018/09/08/playing-map-and-reduce-in-r-subsetting, let’s see how to get the job done with a FOR loop. First, we get things ready by splitting the data.frame into a list of 2 data.frames named “lst” and defining a subsetting function named “fn”, similar to what we did before.

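data(iris)
# row filter to be applied to each data.frame
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
# split iris into 2 data.frames: the first 75 rows ("0") and the last 75 rows ("1")
lst <- split(iris, sort((1:nrow(iris)) %% 2))
# return the rows of x that satisfy expr
fn <- function(x) x[with(x, which(eval(expr))), ]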
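A quick check of what this setup produces, repeating the same lines so the snippet runs on its own: `(1:nrow(iris)) %% 2` alternates 1, 0, 1, 0, ..., sort() turns that into 75 zeros followed by 75 ones, and split() therefore returns the first and second halves of iris under the names "0" and "1". Applying fn to each half shows where the matching rows come from.

```r
data(iris)
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
lst <- split(iris, sort((1:nrow(iris)) %% 2))
fn <- function(x) x[with(x, which(eval(expr))), ]

# Two groups named after the sorted 0/1 labels
print(names(lst))
print(sapply(lst, nrow))

# The second group starts at original row 76 of iris
print(rownames(lst[["1"]])[1])

# Number of rows kept by fn in each half
print(sapply(lst, function(d) nrow(fn(d))))
```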

The code snippet below shows how to loop through the list with a FOR loop and subset each data.frame, which seems more complicated than it needs to be.

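LoopFn <- function(l) {
  result <- data.frame()                # start with an empty data.frame
  for (i in l) {                        # i is each data.frame in the list
    result <- rbind(result, fn(i))      # grow the result one subset at a time
  }
  row.names(result) <- NULL             # reset row names to 1, 2, ...
  return(result)
}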
LoopFn(lst)  
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica
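Part of the cost in LoopFn comes from growing `result` with rbind() on every pass, which builds a new data.frame each time. If a FOR loop is preferred for readability, a common middle ground is to preallocate a list, fill it inside the loop, and bind once at the end. A sketch, where `LoopFn2` and `pieces` are hypothetical names not used in the original post:

```r
data(iris)
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
lst <- split(iris, sort((1:nrow(iris)) %% 2))
fn <- function(x) x[with(x, which(eval(expr))), ]

# Hypothetical variant of LoopFn: collect the pieces first, bind once
LoopFn2 <- function(l) {
  pieces <- vector("list", length(l))   # preallocate one slot per data.frame
  for (k in seq_along(l)) {
    pieces[[k]] <- fn(l[[k]])           # subset each data.frame in turn
  }
  # a single rbind() call at the end, with fresh row names
  do.call(rbind, c(pieces, make.row.names = FALSE))
}

res_loop2 <- LoopFn2(lst)
print(res_loop2)
```

This keeps the explicit loop but moves the binding out of it, which is essentially what the lapply() version does in one call.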

Let’s take a look at the two other options, both of which require only one line once the setup above is in place.

do.call(rbind, c(lapply(lst, fn), make.row.names = FALSE))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica
do.call(rbind, c(Vectorize(fn, SIMPLIFY = FALSE)(lst), make.row.names = FALSE))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#1          7.2         3.6          6.1         2.5 virginica
#2          7.7         3.8          6.7         2.2 virginica
#3          7.2         3.2          6.0         1.8 virginica
#4          7.9         3.8          6.4         2.0 virginica
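Since the post ties these options back to Map() from the earlier articles, here is the same one-liner written with Map(), together with a check that the loop, lapply(), Vectorize() and Map() versions all return the same data.frame. LoopFn is repeated in condensed form so the snippet runs on its own; the `res_*` names are hypothetical.

```r
data(iris)
expr <- expression(Sepal.Length > 7 & Sepal.Width > 3)
lst <- split(iris, sort((1:nrow(iris)) %% 2))
fn <- function(x) x[with(x, which(eval(expr))), ]

LoopFn <- function(l) {
  result <- data.frame()
  for (i in l) result <- rbind(result, fn(i))
  row.names(result) <- NULL
  result
}

res_loop   <- LoopFn(lst)
res_lapply <- do.call(rbind, c(lapply(lst, fn), make.row.names = FALSE))
res_vec    <- do.call(rbind, c(Vectorize(fn, SIMPLIFY = FALSE)(lst), make.row.names = FALSE))
# Map(fn, lst) is mapply(fn, lst, SIMPLIFY = FALSE), so it also returns a list
res_map    <- do.call(rbind, c(Map(fn, lst), make.row.names = FALSE))

print(isTRUE(all.equal(res_loop, res_lapply)))
print(isTRUE(all.equal(res_lapply, res_vec)))
print(isTRUE(all.equal(res_lapply, res_map)))
```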
