Loops and Pizzas

October 19, 2018
By

(This article was first published on Marcelo S. Perlin, and kindly contributed to R-bloggers)

An Introduction to Loops in R –

Loops in R

First, if you are new to programming, you should know that loops are a
way to tell the computer that you want to repeat some operation for a
number of times. This is a very common task that can be found in many
programming languages. For example, let’s say you invited five friends
for dinner at your home and the whole cost of four pizzas will be split
evenly. Assume now that you must give instructions to a computer on
calculating how much each one will pay at the end of dinner. For that,
you need to sum up the individual tabs and divide by the number of
people. Your instructions to the computer could be: start with a value
of x=zero, take each individual pizza cost and sum it to x until all
costs are processed, dividing the result by the number of friends at the
end
.

The great thing about loops is that the length of it is dynamically
set. Using the previous example, if we had 500 friends (and a large
dinner table!), we could use the same instructions for calculating the
individual tabs. That means we can encapsulate a generic procedure for
processing any given number of friends at dinner. With it, you have at
your reach a tool for the execution of any sequential process. In other
words, you are the boss of your computer and, as long as you can write
it down clearly, you can set it to do any kind of repeated task for you.

Now, about the code, we could write the solution to the pizza problem
in R as:

pizza.costs <- c(50, 80, 30, 60) # each cost of pizza
n.friends <- 5 # number of friends

x <- 0 # set first cost to zero
for (i.cost in pizza.costs) {
  x <- x + i.cost # sum it up
}

x <- x/n.friends # divide for average per friend
print(x)

## [1] 44

Don’t worry if you didn’t understand the code. We’ll get to the
structure of a loop soon.

Back to our case, each friend would pay 44 for the meal. We can check
the result against function sum:

x == sum(pizza.costs)/n.friends

## [1] TRUE

The output TRUE shows that the results are equal.

The Structure of a Loop

Knowing how to use loops can be a powerful ally in a complex data
related problem. Let’s talk more about how loops are defined in R. The
structure of a loop in R follows:

for (i in i.vec){
  ...
}

In the previous code, command for indicates the beginning of a loop.
Object i in (i in i.vec) is the iterator of the loop. This
iterator will change its value in each iteration, taking each individual
value contained in i.vec. Note the loop is encapsulated by curly
braces ({}). These are important, as they define where the loop
starts and where it ends. The indentation (use of bigger margins) is
also important for visual cues, but not necessary. Consider the
following practical example:

# set seq
my.seq <- seq(-5,5)

# do loop
for (i in my.seq){
  cat(paste('\nThe value of i is',i))
}

## 
## The value of i is -5
## The value of i is -4
## The value of i is -3
## The value of i is -2
## The value of i is -1
## The value of i is 0
## The value of i is 1
## The value of i is 2
## The value of i is 3
## The value of i is 4
## The value of i is 5

In the code, we created a sequence from -5 to 5 and presented a text for
each element with the cat function. Notice how we also broke the
prompt line with '\n'. The loop starts with i=-5, execute command
cat(paste('\nThe value of i is', -5)), proceed to the next iteration
by setting i=-4, rerun the cat command, and so on. At its final
iteration, the value of i is 5.

The iterated sequence in the loop is not exclusive to numerical
vectors. Any type of vector or list may be used. See next:

# set char vec
my.char.vec <- letters[1:5]

# loop it!
for (i.char in my.char.vec){
  cat(paste('\nThe value of i.char is', i.char))
}

## 
## The value of i.char is a
## The value of i.char is b
## The value of i.char is c
## The value of i.char is d
## The value of i.char is e

The same goes for lists:

# set list
my.l <- list(x = 1:5, 
             y = c('abc','dfg'), 
             z = factor('A','B','C','D'))

# loop list
for (i.l in my.l){
  
  cat(paste0('\nThe class of i.l is ', class(i.l), '. '))
  cat(paste0('The number of elements is ', length(i.l), '.'))
  
}

## 
## The class of i.l is integer. The number of elements is 5.
## The class of i.l is character. The number of elements is 2.
## The class of i.l is factor. The number of elements is 1.

In the definition of loops, the iterator does not have to be the only
object incremented in each iteration. We can create other objects and
increment them using a simple sum operation. See next:

# set vec and iterators
my.vec <- seq(1:5)
my.x <- 5
my.z <- 10

for (i in my.vec){
  # iterate "manually"
  my.x <- my.x + 1
  my.z <- my.z + 2
  
  cat('\nValue of i = ', i, 
      ' | Value of my.x = ', my.x, 
      ' | Value of my.z = ', my.z)
}

## 
## Value of i =  1  | Value of my.x =  6  | Value of my.z =  12
## Value of i =  2  | Value of my.x =  7  | Value of my.z =  14
## Value of i =  3  | Value of my.x =  8  | Value of my.z =  16
## Value of i =  4  | Value of my.x =  9  | Value of my.z =  18
## Value of i =  5  | Value of my.x =  10  | Value of my.z =  20

Using nested loops, that is, a loop inside of another loop is also
possible. See the following example, where we present all the elements
of a matrix:

# set matrix
my.mat <- matrix(1:9, nrow = 3)

# loop all values of matrix
for (i in seq(1,nrow(my.mat))){
  for (j in seq(1,ncol(my.mat))){
    cat(paste0('\nElement [', i, ', ', j, '] = ', my.mat[i,j]))
  }
}

## 
## Element [1, 1] = 1
## Element [1, 2] = 4
## Element [1, 3] = 7
## Element [2, 1] = 2
## Element [2, 2] = 5
## Element [2, 3] = 8
## Element [3, 1] = 3
## Element [3, 2] = 6
## Element [3, 3] = 9

A Real World Example

Now, the computational needs of the real world is far more complex than
dividing a dinner expense. A practical example of using loops is
processing data according to groups. Using an example from Finance, if
we have a return dataset for several stocks and we want to calculate the
average return of each stock, we can use a loop for that. In this
example, we will use Yahoo Finance data from three stocks: FB, GE and
AA. The first step is downloading it with package BatchGetSymbols.

library(BatchGetSymbols)

## Loading required package: rvest

## Loading required package: xml2

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## 

my.tickers <-  c('FB', 'GE', 'AA')

df.stocks <- BatchGetSymbols(tickers = my.tickers, 
                             first.date = '2012-01-01', 
                             freq.data = 'yearly')[[2]]

## 
## Running BatchGetSymbols for:
##    tickers = FB, GE, AA
##    Downloading data for benchmark ticker | Found cache file
## FB | yahoo (1|3) | Found cache file - Good job!
## GE | yahoo (2|3) | Found cache file - Nice!
## AA | yahoo (3|3) | Found cache file - You got it!

It worked fine. Let’s check the contents of the dataframe:

dplyr::glimpse(df.stocks)

## Observations: 21
## Variables: 10
## $ ticker               "AA", "AA", "AA", "AA", "AA", "AA", "AA", ...
## $ ref.date             2012-01-03, 2013-01-02, 2014-01-02, 2015-...
## $ volume               2217410500, 2149575500, 2146821400, 268355...
## $ price.open           21.48282, 21.33864, 25.30359, 38.13561, 22...
## $ price.high           25.85628, 25.68807, 42.29280, 41.01921, 32...
## $ price.low            19.27206, 18.50310, 24.27030, 18.79146, 16...
## $ price.close          22.17969, 21.60297, 25.30359, 38.15964, 23...
## $ price.adjusted       20.89342, 20.62187, 24.48568, 37.24207, 23...
## $ ret.adjusted.prices  NA, -0.01299715, 0.18736494, 0.52097326, -...
## $ ret.closing.prices   NA, -0.02600212, 0.17130149, 0.50807215, -...

All financial data is there. Notice that the return series is available
at column ret.adjusted.prices.

Now we will use a loop to build a table with the mean return of each
stock:

# find unique tickers in column ticker
unique.tickers <- unique(df.stocks$ticker)

# create empty df
tab.out <- data.frame()

# loop tickers
for (i.ticker in unique.tickers){
  
  # create temp df with ticker i.ticker
  temp <- df.stocks[df.stocks$ticker==i.ticker, ]
  
  # row bind i.ticker and mean.ret
  tab.out <- rbind(tab.out, 
                   data.frame(ticker = i.ticker,
                              mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE)))
  
}

# print result
print(tab.out)

##   ticker   mean.ret
## 1     AA 0.24663684
## 2     FB 0.35315566
## 3     GE 0.06784693

In the code, we used function unique to find out the names of all the
tickers in the dataset. Soon after, we create an empty dataframe to
save the results and a loop to filter the data of each stock
sequentially and average its returns. At the end of the loop, we use
function rbind to paste the results of each stock with the results of
the main table. As you can see, we can use the data to perform group
calculations with loop.

By now, I must be forward in saying that the previous loop is by no
means the best way of performing the data operation. What we just did by
loops is called a split-apply-combine procedure. There are base
function in R such as tapply, split and lapply/sapply that can
do the same job but with a more intuitive and functional approach. Going
further, functions from package tidyverse can do the same procuedure
with an even more intuitive approach. In a future post I shall discuss
this possibilities further.

I hope you guys liked the post. Got a question? Just drop it at the
comment section.

To leave a comment for the author, please follow the link and comment on their blog: Marcelo S. Perlin.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)