# Loops and Pizzas

[This article was first published on **R on msperlin**, and kindly contributed to R-bloggers.]


# Loops in R

First, if you are new to programming, you should know that loops are a way to tell the computer to repeat an operation a number of times. This is a very common task, found in many programming languages. For example, let’s say you invited five friends for dinner at your home and the total cost of four pizzas will be split evenly among them. Assume now that you **must** give instructions to a computer on calculating how much each one will pay at the end of dinner. For that, you need to sum up the individual pizza costs and divide by the number of friends. Your instructions to the computer could be: *start with a value of x = zero, take each individual pizza cost and add it to x until all costs are processed, then divide the result by the number of friends*.

The great thing about *loops* is that their length is set dynamically. Using the previous example, if we had 500 friends (and a large dinner table!), we could use the same instructions for calculating the individual tabs. That means we can encapsulate a generic procedure for processing any given number of friends at dinner. With it, you have within reach a tool for executing any sequential process. In other words, you are the boss of your computer and, as long as you can write it down clearly, you can set it to do any kind of repeated task for you.

Now, about the code, we could write the solution to the *pizza problem* in R as:

    pizza.costs <- c(50, 80, 30, 60) # each cost of pizza
    n.friends <- 5 # number of friends
    x <- 0 # set first cost to zero
    for (i.cost in pizza.costs) {
      x <- x + i.cost # sum it up
    }
    x <- x/n.friends # divide for average per friend
    print(x)
    ## [1] 44

Don’t worry if you didn’t understand the code. We’ll get to the structure of a loop soon.

Back to our case, each friend would pay 44 for the meal. We can check the result against function `sum`:

    x == sum(pizza.costs)/n.friends
    ## [1] TRUE

The output `TRUE` shows that the results are equal.
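
The earlier point that the same instructions work for any number of friends can be sketched by wrapping the loop in a function. The name `tab.per.person` and the 400-pizza dinner are made up for illustration; they are not part of the original code.

```r
# Hypothetical helper: average cost per person, computed with a loop
tab.per.person <- function(costs, n.people) {
  total <- 0
  for (i.cost in costs) {
    total <- total + i.cost # sum it up
  }
  total / n.people
}

# the dinner from the text: four pizzas, five friends
tab.per.person(c(50, 80, 30, 60), 5)
## [1] 44

# a made-up large dinner: 400 pizzas at 40 each, 500 friends
tab.per.person(rep(40, 400), 500)
## [1] 32
```

The loop itself never changes; only the length of `costs` does.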

## The Structure of a Loop

Knowing how to use loops can be a powerful ally in a complex data-related problem. Let’s talk more about how *loops* are defined in R. The structure of a *loop* in R is as follows:

    for (i in i.vec){
      ...
    }

In the previous code, command `for` indicates the beginning of a *loop*. Object `i` in `(i in i.vec)` is the iterator of the *loop*. This iterator will change its value in each iteration, taking each individual value contained in `i.vec`. Note that the body of the *loop* is enclosed by curly braces (`{}`). These are important, as they define where the *loop* starts and where it ends. Indentation (the use of bigger margins) is not required, but it is an important visual cue. Consider the following practical example:

    # set seq
    my.seq <- seq(-5,5)
    # do loop
    for (i in my.seq){
      cat(paste('\nThe value of i is',i))
    }
    ##
    ## The value of i is -5
    ## The value of i is -4
    ## The value of i is -3
    ## The value of i is -2
    ## The value of i is -1
    ## The value of i is 0
    ## The value of i is 1
    ## The value of i is 2
    ## The value of i is 3
    ## The value of i is 4
    ## The value of i is 5

In the code, we created a sequence from -5 to 5 and printed a text for each element with the `cat` function. Notice how we also broke the output line with `'\n'`. The *loop* starts with `i=-5`, executes command `cat(paste('\nThe value of i is', -5))`, proceeds to the next iteration by setting `i=-4`, reruns the `cat` command, and so on. At its final iteration, the value of `i` is `5`.
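
As a small sketch of the mechanics just described, the code below counts the iterations and then inspects the iterator after the *loop* has finished. The counter `n.iterations` is a name introduced here for illustration.

```r
my.seq <- seq(-5, 5)

# count how many times the body of the loop runs
n.iterations <- 0
for (i in my.seq) {
  n.iterations <- n.iterations + 1
}

# one iteration per element of my.seq
print(n.iterations)
## [1] 11

# the iterator still exists after the loop and holds the last value it took
print(i)
## [1] 5
```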

The iterated sequence in the *loop* is not exclusive to numerical vectors. Any type of vector or list may be used. See next:

    # set char vec
    my.char.vec <- letters[1:5]
    # loop it!
    for (i.char in my.char.vec){
      cat(paste('\nThe value of i.char is', i.char))
    }
    ##
    ## The value of i.char is a
    ## The value of i.char is b
    ## The value of i.char is c
    ## The value of i.char is d
    ## The value of i.char is e

The same goes for `lists`:

    # set list
    my.l <- list(x = 1:5,
                 y = c('abc','dfg'),
                 z = factor(c('A','B','C','D'))) # wrap the values in c() so all four become elements
    # loop list
    for (i.l in my.l){
      cat(paste0('\nThe class of i.l is ', class(i.l), '. '))
      cat(paste0('The number of elements is ', length(i.l), '.'))
    }
    ##
    ## The class of i.l is integer. The number of elements is 5.
    ## The class of i.l is character. The number of elements is 2.
    ## The class of i.l is factor. The number of elements is 4.

In the definition of *loops*, the iterator does not have to be the only object incremented in each iteration. We can create other objects and increment them using a simple sum operation. See next:

    # set vec and iterators
    my.vec <- seq(1, 5)
    my.x <- 5
    my.z <- 10
    for (i in my.vec){
      # iterate "manually"
      my.x <- my.x + 1
      my.z <- my.z + 2
      cat('\nValue of i = ', i, ' | Value of my.x = ', my.x, ' | Value of my.z = ', my.z)
    }
    ##
    ## Value of i = 1 | Value of my.x = 6 | Value of my.z = 12
    ## Value of i = 2 | Value of my.x = 7 | Value of my.z = 14
    ## Value of i = 3 | Value of my.x = 8 | Value of my.z = 16
    ## Value of i = 4 | Value of my.x = 9 | Value of my.z = 18
    ## Value of i = 5 | Value of my.x = 10 | Value of my.z = 20
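
The same idea of updating an extra object in each iteration can be used to store intermediate results. The sketch below reuses the pizza costs from the beginning of the post and keeps the running total after each pizza; `running.total` is a name introduced here for illustration, and `seq_along` is used so the iterator is the position rather than the value.

```r
pizza.costs <- c(50, 80, 30, 60)

# pre-allocate a vector to hold the total after each pizza
running.total <- numeric(length(pizza.costs))
x <- 0
for (i in seq_along(pizza.costs)) {
  x <- x + pizza.costs[i]  # update the accumulator
  running.total[i] <- x    # save its current value at position i
}

print(running.total)
## [1]  50 130 160 220

# base function cumsum gives the same cumulative sums
all.equal(running.total, cumsum(pizza.costs))
## [1] TRUE
```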

Using nested *loops*, that is, a *loop* inside another *loop*, is also possible. See the following example, where we present all the elements of a matrix:

    # set matrix
    my.mat <- matrix(1:9, nrow = 3)
    # loop all values of matrix (outer loop over rows, inner loop over columns)
    for (i in seq_len(nrow(my.mat))){
      for (j in seq_len(ncol(my.mat))){
        cat(paste0('\nElement [', i, ', ', j, '] = ', my.mat[i,j]))
      }
    }
    ##
    ## Element [1, 1] = 1
    ## Element [1, 2] = 4
    ## Element [1, 3] = 7
    ## Element [2, 1] = 2
    ## Element [2, 2] = 5
    ## Element [2, 3] = 8
    ## Element [3, 1] = 3
    ## Element [3, 2] = 6
    ## Element [3, 3] = 9
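
One detail worth knowing when building index sequences for *loops* like the one above: `seq(1, n)` counts down when `n` is zero, while `seq_len(n)` returns an empty vector. The sketch below uses a hypothetical empty matrix, `empty.mat`, to show the difference.

```r
# hypothetical empty matrix: zero rows, three columns
empty.mat <- matrix(numeric(0), nrow = 0, ncol = 3)

# seq(1, 0) yields 1, 0, so the loop runs twice even though there are no rows
n.iter.seq <- 0
for (i in seq(1, nrow(empty.mat))) {
  n.iter.seq <- n.iter.seq + 1
}

# seq_len(0) is empty, so the body of the loop never runs
n.iter.seq.len <- 0
for (i in seq_len(nrow(empty.mat))) {
  n.iter.seq.len <- n.iter.seq.len + 1
}

print(n.iter.seq)
## [1] 2
print(n.iter.seq.len)
## [1] 0
```

For a matrix with at least one row and one column, both forms produce the same indices.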

## A Real World Example

Now, the computational needs of the real world are far more complex than dividing a dinner expense. A practical example of using *loops* is processing data according to groups. Using an example from Finance, if we have a return dataset for several stocks and we want to calculate the average return of each stock, we can use a *loop* for that. In this example, we will use *Yahoo Finance* data from three stocks: FB, GE and AA. The first step is downloading it with package `BatchGetSymbols`.

    library(BatchGetSymbols)
    ## Loading required package: rvest
    ## Loading required package: xml2
    ## Loading required package: dplyr
    ##
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ##
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ##
    ##     intersect, setdiff, setequal, union
    ##
    my.tickers <- c('FB', 'GE', 'AA')
    df.stocks <- BatchGetSymbols(tickers = my.tickers,
                                 first.date = '2012-01-01',
                                 freq.data = 'yearly')[[2]]
    ##
    ## Running BatchGetSymbols for:
    ##    tickers = FB, GE, AA
    ##    Downloading data for benchmark ticker | Not Cached
    ## FB | yahoo (1|3) | Not Cached - Well done!
    ## GE | yahoo (2|3) | Not Cached - Got it!
    ## AA | yahoo (3|3) | Not Cached - You got it!

It worked fine. Let’s check the contents of the dataframe:

    dplyr::glimpse(df.stocks)
    ## Observations: 21
    ## Variables: 10
    ## $ ticker              <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", ...
    ## $ ref.date            <date> 2012-01-03, 2013-01-02, 2014-01-02, 2015-...
    ## $ volume              <dbl> 2217410500, 2149575500, 2146821400, 268355...
    ## $ price.open          <dbl> 21.48282, 21.33864, 25.30359, 38.13561, 22...
    ## $ price.high          <dbl> 25.85628, 25.68807, 42.29280, 41.01921, 32...
    ## $ price.low           <dbl> 19.27206, 18.50310, 24.27030, 18.79146, 16...
    ## $ price.close         <dbl> 22.17969, 21.60297, 25.30359, 38.15964, 23...
    ## $ price.adjusted      <dbl> 20.89342, 20.62187, 24.48568, 37.24207, 23...
    ## $ ret.adjusted.prices <dbl> NA, -0.01299715, 0.18736494, 0.52097326, -...
    ## $ ret.closing.prices  <dbl> NA, -0.02600212, 0.17130149, 0.50807215, -...

All financial data is there. Notice that the return series is available in column `ret.adjusted.prices`.

Now we will use a loop to build a table with the mean return of each stock:

    # find unique tickers in column ticker
    unique.tickers <- unique(df.stocks$ticker)
    # create empty df
    tab.out <- data.frame()
    # loop tickers
    for (i.ticker in unique.tickers){
      # create temp df with ticker i.ticker
      temp <- df.stocks[df.stocks$ticker==i.ticker, ]
      # row bind i.ticker and mean.ret
      tab.out <- rbind(tab.out,
                       data.frame(ticker = i.ticker,
                                  mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE)))
    }
    # print result
    print(tab.out)
    ##   ticker   mean.ret
    ## 1     AA 0.24663684
    ## 2     FB 0.35315566
    ## 3     GE 0.06784693

In the code, we used function `unique` to find out the names of all the tickers in the dataset. Next, we created an empty *dataframe* to hold the results and a *loop* to filter the data of each stock sequentially and average its returns. At the end of each iteration, we use function `rbind` to append the result for that stock to the main table. As you can see, we can use a *loop* to perform group calculations on the data.
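
Growing a *dataframe* with `rbind` inside a *loop* copies the table at every iteration, which can become slow with many groups. A possible variation, sketched below, stores each piece in a pre-allocated list and binds everything once at the end. To keep it runnable without a download, it uses a small made-up table, `df.toy`, whose numbers are invented for illustration and are not Yahoo Finance data.

```r
# made-up returns for illustration only (not real market data)
df.toy <- data.frame(ticker = c('AA', 'AA', 'FB', 'FB', 'GE', 'GE'),
                     ret.adjusted.prices = c(NA, 0.10, 0.20, 0.40, -0.05, 0.15),
                     stringsAsFactors = FALSE)

unique.tickers <- unique(df.toy$ticker)

# pre-allocate one list slot per ticker
l.out <- vector('list', length(unique.tickers))

for (i in seq_along(unique.tickers)) {
  # filter the rows of ticker i
  temp <- df.toy[df.toy$ticker == unique.tickers[i], ]
  # save a one-row dataframe in slot i
  l.out[[i]] <- data.frame(ticker = unique.tickers[i],
                           mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE))
}

# bind all pieces in a single step
tab.out <- do.call(rbind, l.out)
print(tab.out)
```

The resulting table has the same two columns as before, `ticker` and `mean.ret`, with one row per stock.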

By now, I must be upfront in saying that the previous loop is by no means the best way of performing this data operation. What we just did with loops is called a *split-apply-combine* procedure. There are base functions in R, such as `tapply`, `split` and `lapply`/`sapply`, that can do the same job with a more intuitive and functional approach. Going further, functions from package `tidyverse` can do the same procedure with an even more intuitive approach. In a future post I shall discuss these possibilities further.
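
As a brief, hedged preview of the base R alternatives mentioned above, the sketch below computes the mean return per ticker with `tapply` and with `split` plus `sapply`. It runs on a small made-up table, `df.toy`, whose numbers are invented for illustration and are not Yahoo Finance data.

```r
# made-up returns for illustration only (not real market data)
df.toy <- data.frame(ticker = c('AA', 'AA', 'FB', 'FB', 'GE', 'GE'),
                     ret.adjusted.prices = c(NA, 0.10, 0.20, 0.40, -0.05, 0.15),
                     stringsAsFactors = FALSE)

# tapply: apply mean() to the returns within each ticker group
mean.ret.tapply <- tapply(df.toy$ret.adjusted.prices,
                          df.toy$ticker,
                          mean, na.rm = TRUE)

# split + sapply: split the returns by ticker, then average each piece
l.ret <- split(df.toy$ret.adjusted.prices, df.toy$ticker)
mean.ret.sapply <- sapply(l.ret, mean, na.rm = TRUE)

# both approaches give one mean per ticker, named by ticker
print(mean.ret.tapply)
print(mean.ret.sapply)
```

In both cases the split, the calculation and the recombination happen in a single call chain, with no explicit *loop* and no table to grow by hand.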

I hope you guys liked the post. Got a question? Just drop it at the comment section.
