How and When to Use For Loops

[This article was first published on Blog on Data Solutions | Dedicated to helping businesses making data-driven decisions, and kindly contributed to R-bloggers.]

What is a for loop?

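A for loop lets you automate a portion of your code by repeating a sequence of instructions a specified number of times. R has three loop constructs (for, while, and repeat). They all serve the same purpose of repeating code, but they differ in how they decide when to stop: a for loop runs once per element of a sequence, a while loop runs as long as a condition holds, and a repeat loop runs until it reaches a break.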
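
A minimal sketch of the three loop types, each counting to three. The counter names (count_for, count_while, count_repeat) are made up for this illustration.

```r
# for: runs once per element of the sequence 1:3
count_for <- 0
for (i in 1:3) {
  count_for <- count_for + 1
}

# while: runs as long as the condition is TRUE
count_while <- 0
while (count_while < 3) {
  count_while <- count_while + 1
}

# repeat: runs until an explicit break is reached
count_repeat <- 0
repeat {
  count_repeat <- count_repeat + 1
  if (count_repeat == 3) break
}
```

All three counters end at 3. The rest of this article focuses on the for loop.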

When and how do you use it?

Are you rewriting the same line of code and changing only one thing each time? That is a sign you should be using a for loop. For loops are made for applying the same function (or set of functions) to every element of a vector, matrix, or list. The structure of a for loop is fairly simple: it contains a looping variable, an object to loop through, and the actions you want to perform.

for (variable in vector) {
  expression
}

There are two ways to write the looping variable, and each has advantages. The first is to loop over the values directly: for (name in vector). The second is to loop over a numeric index: for (i in 1:length(vector)). The first is simple and easy to read, but it doesn't give you an index value. The second takes more typing and can be harder to read, but it gives you more control and options in the expression portion. The second is more commonly used, especially when you need to subset or index into another vector.
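
One caveat with the index form, offered as a side note rather than something the original article covers: when the vector is empty, 1:length(vector) evaluates to 1:0, which is the sequence 1, 0, so the loop body still runs. Base R's seq_along() returns an empty sequence in that case. The vector `empty` and the two counters below are hypothetical names for illustration.

```r
empty <- c()   # a zero-length vector (hypothetical example data)

runs_colon <- 0
for (i in 1:length(empty)) {     # 1:0 is c(1, 0), so the body runs twice
  runs_colon <- runs_colon + 1
}

runs_seq <- 0
for (i in seq_along(empty)) {    # seq_along() gives integer(0), so the body never runs
  runs_seq <- runs_seq + 1
}
```

For the same reason, seq_len(nrow(df)) can be a safer choice than 1:nrow(df) when a data frame might have zero rows.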

Method 1

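set.seed(10)
# Example data: seven fish with an age, a catch month, and a length in inches
df <- data.frame(Name = c("A", "B", "C", "D", "E", "F", "G"),
                 Age = seq(0, 6, 1),
                 Month = sample(seq(1, 12), 7, replace = TRUE),
                 Length.in = round(seq(3, 20, length.out = 7), 2))

# Print each length by looping over the values directly
for (a in df$Length.in) {
  print(a)
}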

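# Converting length from inches to centimeters using the name method of writing a loop
lengths.cm <- c()
for (len in df$Length.in) {
  # len is a length value, not a position, so it cannot be used as an index;
  # append each converted value to the end of the vector instead
  lengths.cm <- c(lengths.cm, len * 2.54)
  print(lengths.cm)
}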

Method 2

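# Converting length from inches to centimeters using the indexing method of writing a loop
lengths.cm <- numeric(nrow(df))      # preallocate the result vector
for (i in seq_len(nrow(df))) {
  lengths.cm[i] <- df$Length.in[i] * 2.54
  df$Length.cm[i] <- lengths.cm[i]   # also store the result in a new column of df
  print(lengths.cm)
}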

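# To find the difference in length between each fish and the next one.
# The parentheses matter: 1:nrow(df)-1 is evaluated as (1:nrow(df)) - 1, which starts at 0.
for (i in 1:(nrow(df) - 1)) {
  print(df$Length.in[i + 1] - df$Length.in[i])
}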
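
As a cross-check on the difference loop, base R's diff() computes the same consecutive differences in one call. This is a self-contained sketch; `lengths_in` and `loop_diffs` are names invented here, with `lengths_in` built the same way as the Length.in column.

```r
lengths_in <- round(seq(3, 20, length.out = 7), 2)   # same values as df$Length.in

# Loop version: difference between each element and the one before it
loop_diffs <- numeric(length(lengths_in) - 1)
for (i in seq_len(length(lengths_in) - 1)) {
  loop_diffs[i] <- lengths_in[i + 1] - lengths_in[i]
}

# diff() returns the same six differences without an explicit loop
builtin_diffs <- diff(lengths_in)
```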

Control Statements

There are two control statements you can use in loops: break and next. Both are normally placed inside a condition. When the condition is met, break stops the loop entirely, while next skips the current element and continues with the next one.
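
Before applying them to the fish data, here is a small sketch of both statements on a plain vector. The names `values`, `odds`, and `seen` are hypothetical and used only for this illustration.

```r
values <- c(1, 2, 3, 4, 5)   # hypothetical example data

# next: skip the even numbers but keep looping
odds <- c()
for (v in values) {
  if (v %% 2 == 0) next
  odds <- c(odds, v)
}

# break: stop the loop entirely at the first value greater than 3
seen <- c()
for (v in values) {
  if (v > 3) break
  seen <- c(seen, v)
}
```

Here `odds` ends up holding 1, 3, 5 and `seen` holds 1, 2, 3: next only skips an element, while break abandons everything after it.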

# Use the next control statement to write a loop that adds another year to the age
# if the fish was caught after June (Month > 6).
for (i in seq_len(nrow(df))) {
  if (df$Month[i] <= 6) {
    next   # caught in June or earlier: leave the age unchanged
  }
  df$Age[i] <- df$Age[i] + 1
}


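# Use the break control statement to subset the fish younger than 4.
# break stops at the first fish aged 4 or older, so this relies on the rows being sorted by age.
fish.sub <- df[FALSE, ]   # empty data frame with the same columns as df
for (i in seq_len(nrow(df))) {
  if (df$Age[i] > 3) {
    break
  }
  fish.sub[i, ] <- df[i, ]
}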

Using for loops to plot

If you need to create several plots of the same thing, or one plot with many lines of different values, a for loop can be useful. The structure is much the same as a normal loop, but when everything goes on one plot you make a blank plot first and add to it inside the loop.

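library(RColorBrewer)
col. <- brewer.pal(5, "Set2")
# Draw an empty plot first (type = "n"), sized to fit both length columns
plot(0, 0, type = "n", xlab = "Age", ylab = "Length",
     xlim = c(0, max(df$Age)), ylim = c(0, max(df$Length.cm)))
# Columns 4 and 5 are Length.in and Length.cm; add each as a set of points
for (i in 4:5) {
  points(x = df$Age, y = df[, i], pch = 16, col = col.[i])
}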

# Alternatively, draw one panel per column, side by side
par(mfrow = c(1, 2))
for (i in 4:5) {
  plot(x = df$Age, y = df[, i], pch = 16, col = col.[i], xlab = "Age", ylab = "Length")
}

Nested loops

Loops can be nested, with one for loop placed inside another. Nested loops are useful when dealing with matrices, where the outer loop moves through the rows and the inner loop moves through the columns. The example below sets each element of a matrix based on its position in the matrix.

mymat <- matrix(nrow = 10, ncol = 10)   # create a 10 x 10 matrix (10 rows and 10 columns)

for (i in 1:dim(mymat)[1]) {     # for each row
  for (j in 1:dim(mymat)[2]) {   # for each column
    mymat[i, j] <- i * j         # assign values based on position: product of the two indexes
  }
}
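
To check that a nested loop like this fills the matrix as expected, the sketch below builds a smaller 3 x 4 version and compares it with base R's outer(), which computes the same table of products directly. The names `loop_mat` and `outer_mat` are hypothetical.

```r
loop_mat <- matrix(nrow = 3, ncol = 4)
for (i in seq_len(nrow(loop_mat))) {     # for each row
  for (j in seq_len(ncol(loop_mat))) {   # for each column
    loop_mat[i, j] <- i * j
  }
}

# outer() multiplies every element of 1:3 with every element of 1:4
outer_mat <- outer(1:3, 1:4)
```

The element in row 2, column 3 of either matrix is 2 * 3 = 6.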

Alternatives to for loops

An alternative to for loops is the apply family of functions. Each one applies a function to every element of an object in a single call, which is often easier to write than a for loop and can be faster. There are four main functions:

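  • apply(): works on the rows or columns of a matrix
  • lapply(): works across lists and vectors and returns a list
  • sapply(): like lapply(), but simplifies the output into a vector or matrix when possible
  • mapply(): applies a function over several vectors or lists in parallel, passing one element from each as arguments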
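
A short sketch of lapply(), sapply(), and mapply() side by side. The list `lengths_in` and the helper `to_cm()` are hypothetical, made up for this example.

```r
lengths_in <- list(a = 3, b = 5.83, c = 8.67)   # hypothetical lengths in inches

to_cm <- function(x) x * 2.54                   # hypothetical helper: inches to centimeters

as_list   <- lapply(lengths_in, to_cm)   # returns a list with one converted value per element
as_vector <- sapply(lengths_in, to_cm)   # same values, simplified to a named numeric vector

# mapply() walks through several vectors in parallel: 1 + 4, 2 + 5, 3 + 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```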

The apply function takes three arguments: the object to be manipulated, the margin over which to apply the function (1 for rows, 2 for columns), and the function you want to apply. The function can be a built-in one or one that you write. If the function takes additional arguments, put them after the function. The structure looks like this: apply(mymat, 2, sum, na.rm = TRUE). This calculates the sum of each column in the matrix, ignoring any NA values.
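
A small worked example of the margin argument, using a made-up 2 x 3 matrix `m` that contains one NA.

```r
# 2 rows, 3 columns, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1   NA    5
# [2,]    2    4    6
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)

col_sums <- apply(m, 2, sum, na.rm = TRUE)   # margin 2: one sum per column
row_sums <- apply(m, 1, sum, na.rm = TRUE)   # margin 1: one sum per row
```

Here `col_sums` is 3, 4, 11 and `row_sums` is 6, 12. Without na.rm = TRUE, any row or column containing the NA would return NA.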
