do.call / lapply
Use case
The use of the do.call / lapply combination is a powerful way to leverage functional programming in R. In short, you write a function that performs some actions and apply it to a list of inputs; the results can then be fed into a function that combines everything into a single object.
Let us take an example, where we would like to calculate the ichimoku clouds for a portfolio of stocks, but also preserve the volume data, all in one tidy object.
Setup
We could set it up as per the below:
- tickers: a vector defining the stock symbols in our portfolio
- process: a function that generates a row in a data frame or matrix
    library(ichimoku)

    tickers <- c("C", "MS", "JPM", "GS")

    process <- function(x) {
      # Use the 'quantmod' package to download pricing data
      pxdata <- quantmod::getSymbols(x, from = "2020-04-15", to = "2021-05-27", auto.assign = FALSE)
      # Extract volume column
      volume <- pxdata[, grep("Volume", colnames(pxdata))]
      # Calculate the cloud by calling ichimoku() from the 'ichimoku' package
      cloud <- ichimoku(pxdata, ticker = x)
      # Return a list of ticker, ichimoku cloud object, volume data
      list(x, cloud, volume)
    }
We now want to apply our function to each element of ‘tickers’ in turn, and then for the results to be combined.
Loop
One way to achieve this would be to iterate over ‘tickers’ using a loop:
    # Define a list to contain the loop output, specifying the length in advance as good practice
    out <- vector(mode = "list", length = length(tickers))

    # Loop over each element in 'tickers' and save in pre-defined list
    for(i in seq_along(tickers)) out[[i]] <- process(tickers[i])

    # Create output matrix by calling rbind on each element of the list
    portfolio <- do.call(rbind, out)

    portfolio
         [,1]  [,2]        [,3]
    [1,] "C"   ichimoku,11 xts,282
    [2,] "MS"  ichimoku,11 xts,282
    [3,] "JPM" ichimoku,11 xts,282
    [4,] "GS"  ichimoku,11 xts,282
This approach takes 3 lines of code.
Furthermore, ‘i’ and ‘out’ remain as leftover objects in the global environment.
Somewhat messy.
do.call / lapply
Instead we can use a do.call / lapply combination to achieve the same result in one line:
    portfolio <- do.call(rbind, lapply(tickers, process))

    portfolio
         [,1]  [,2]        [,3]
    [1,] "C"   ichimoku,11 xts,282
    [2,] "MS"  ichimoku,11 xts,282
    [3,] "JPM" ichimoku,11 xts,282
    [4,] "GS"  ichimoku,11 xts,282
There are also no intermediate objects generated that clutter the global environment.
To explain:
First, lapply applies a function (‘process’) to each element of a list or vector (‘tickers’). lapply always returns a list.
This list is then passed to do.call, which calls a function (‘rbind’) with the elements of the list as its arguments (here the output of lapply, i.e. the lists returned by ‘process’).
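The two steps can be tried out without downloading any data. The following is a minimal sketch in which ‘describe’ and ‘words’ are hypothetical stand-ins for ‘process’ and ‘tickers’; it suggests that do.call(rbind, ...) amounts to writing out the rbind call by hand with each list element as a separate argument.

```r
# Hypothetical stand-in for 'process': returns a list of three items
describe <- function(x) list(x, nchar(x), toupper(x))

# Hypothetical stand-in for 'tickers'
words <- c("a", "bb", "ccc")

# Step 1: lapply returns a list with one element per input
res <- lapply(words, describe)

# Step 2: do.call passes the elements of 'res' as separate arguments to rbind
m1 <- do.call(rbind, res)

# The same call written out by hand
m2 <- rbind(res[[1]], res[[2]], res[[3]])

# Compare the two results and check the shape (one row per input)
identical(m1, m2)
dim(m1)
```

The hand-written version only works when the number of inputs is known in advance, whereas do.call handles a list of any length.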
The use of do.call / lapply provides for a far more succinct and distinctive coding style.
Tidy data output
    portfolio
         [,1]  [,2]        [,3]
    [1,] "C"   ichimoku,11 xts,282
    [2,] "MS"  ichimoku,11 xts,282
    [3,] "JPM" ichimoku,11 xts,282
    [4,] "GS"  ichimoku,11 xts,282
‘portfolio’ is a tidy matrix with a row for each ticker, and a column for each data type.
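If labelled rows and columns are preferred over numeric positions, one possible variation (not part of the original example) is to name the input vector before calling lapply, since rbind uses argument names as row names. The sketch below uses a hypothetical ‘describe’ function and ‘words’ vector in place of ‘process’ and ‘tickers’, and the column names are made up for illustration.

```r
# Hypothetical stand-in for 'process' (no data download needed)
describe <- function(x) list(x, nchar(x))

# Hypothetical stand-in for 'tickers'
words <- c("a", "bb", "ccc")

# Naming the input vector by itself makes lapply return a named list,
# and rbind then uses those names as row names
m <- do.call(rbind, lapply(setNames(nm = words), describe))

# Illustrative column labels
colnames(m) <- c("word", "length")

# Rows and columns can now be addressed by name
rownames(m)
m["bb", "length"][[1]]
```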
We can easily access any element of the matrix by specifying its index value, for example the ichimoku cloud for MS by [2,2]:
plot(portfolio[2,2][[1]])
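The extra [[1]] is needed because each cell of the matrix is itself a list element: single-bracket indexing returns a list of length one, and [[1]] then extracts the object inside it. A small sketch with a toy matrix (‘m’, built from made-up contents rather than real pricing data) illustrates the difference, and shows that double brackets on the matrix should retrieve the same object in one step.

```r
# Toy list-matrix built the same way as 'portfolio' (contents are illustrative)
m <- do.call(rbind, lapply(c("x", "y"), function(s) list(s, seq_len(3))))

class(m[2, 2])    # single-bracket indexing keeps the list wrapper
class(m[[2, 2]])  # double-bracket indexing returns the stored object

# Both forms retrieve the same integer vector
identical(m[2, 2][[1]], m[[2, 2]])
```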
Further examples: Youngju Nielsen of Sungkyunkwan University uses do.call / lapply to good effect in her course: https://www.coursera.org/learn/the-fundamental-of-data-driven-investment/