How to use lists in R

This article was first published on R for Public Health and kindly contributed to R-bloggers.

In the last post, I went over the basics of lists, including constructing, manipulating, and converting lists to other classes. Now that we know the basics, in this post we’ll use the apply() functions to see just how powerful working with lists can be. I’ve written two earlier posts on apply() for data frames and matrices, so give those a read if you need a refresher.

Intro to apply-based functions for lists

There are a variety of apply()-based functions, and which one to use depends on what you have and what you want back. The table below shows each function, what it takes as input, and what it returns:

Function   Input            Output
apply      matrix           vector or matrix
sapply     vector or list   vector or matrix
lapply     vector or list   list

For example, if you have a list and you want to produce a vector (of the same length), use sapply(). If you have a vector and want to produce a list of the same length, use lapply(). Note that sapply() falls back to returning a list when the results cannot be simplified to a vector or matrix.

Let’s try an example. The syntax of lapply() is:

lapply(INPUT, function(x) (Some function here))

where INPUT, as we see from the table above, must be a vector or a list, and function(x) is any function that takes each element of the INPUT and is applied to it. The function can be something that already exists in R, or it can be a new function that you’ve written yourself. For example, let’s construct a list of 3 vectors like so:
mylist<-list(x=c(1,5,7), y=c(4,2,6), z=c(0,3,4))
mylist
## $x
## [1] 1 5 7
## 
## $y
## [1] 4 2 6
## 
## $z
## [1] 0 3 4
and now we can use lapply() to find the mean of each element of the list (mean of each of the vectors x, y, and z), and output to a new list:
lapply(mylist, function(x) mean(x))
## $x
## [1] 4.333333
## 
## $y
## [1] 4
## 
## $z
## [1] 2.333333
But let’s say we wanted the result in a vector, not in a list, for whatever reason. Instead of doing the above and then converting the list into a vector (using unlist(), or ldply() from the plyr package, or whatever), we can do this directly using sapply() instead of lapply(). That’s because, as you can see in the table, sapply() can take a list as the input, and it will return a vector (or matrix). Let’s try it:
sapply(mylist, function(x) mean(x))
##        x        y        z 
## 4.333333 4.000000 2.333333
This is really great! Anytime you want to do the same thing over and over again, put all those things in a list and then use one of the apply functions. This replaces an explicit loop with a single call, which is usually shorter and easier to read. Let’s do another example where we write our own function this time:
#write function to find the span of numbers in a vector and check if it's at least 5
span.fun<-function(x) {(max(x)-min(x))>=5}

#apply that function to the list
sapply(mylist, span.fun)
##     x     y     z 
##  TRUE FALSE FALSE
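
The relationship between lapply() and sapply() described above can be checked directly. The sketch below rebuilds the same list and compares the two results. It also shows vapply(), which is not covered in this post: it is a base R relative of sapply() that makes you declare the type and length of each result. The variable names (means_list, means_vec, means_checked) are only for illustration.

```r
# Same list as in the post
mylist <- list(x = c(1, 5, 7), y = c(4, 2, 6), z = c(0, 3, 4))

# lapply() returns a list, sapply() simplifies to a named vector
means_list <- lapply(mylist, mean)
means_vec  <- sapply(mylist, mean)

# Flattening the list by hand gives the same named vector as sapply()
identical(unlist(means_list), means_vec)

# vapply() does the same job, but errors if any result
# is not a single numeric value (declared by numeric(1))
means_checked <- vapply(mylist, mean, numeric(1))
identical(means_checked, means_vec)
```

Both comparisons should come back TRUE. Passing mean directly instead of function(x) mean(x) is just a shorter way of writing the same call.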

Creating a list using lapply()

You don’t need to have a list already created to use lapply(). In fact, lapply() can be used to make a list, because it always returns a list of the same length as whatever you input. For example, let’s initialize a list to have 2 empty matrices of size 2x3. We’ll use lapply(): our input is just a vector containing 1 and 2, and the function we specify uses matrix() to construct a 2x3 matrix of NA values for each element of this vector, so it returns a list of two such matrices. If instead of empty matrices we wanted to fill these matrices with random numbers, we could do that too. Check out both possibilities below.
#initialize list to 2 empty matrices of 2 by 3
list2<-lapply(1:2, function(x) matrix(NA, nrow=2, ncol=3))
list2
## [[1]]
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
#initialize list to 2 matrices with random numbers from normal distribution
list2<-lapply(1:2, function(x) matrix(rnorm(6, 10, 1), nrow=2, ncol=3))
list2
## [[1]]
##           [,1]      [,2]     [,3]
## [1,]  9.467982  9.794397 10.52168
## [2,] 10.022561 10.179758 10.47954
## 
## [[2]]
##          [,1]     [,2]     [,3]
## [1,] 7.990455 10.95596 11.94031
## [2,] 8.952418 10.97080 11.24791
Again, we can use lapply() or sapply() on this newly created list to get the sum of each column of each matrix:
#input list, output column sums of each matrix into a new list
lapply(list2, colSums)
## [[1]]
## [1] 19.49054 19.97416 21.00121
## 
## [[2]]
## [1] 16.94287 21.92676 23.18822
#input list, output column sums into a **matrix** (sapply() puts each matrix's column sums in its own column)
sapply(list2, colSums)
##          [,1]     [,2]
## [1,] 19.49054 16.94287
## [2,] 19.97416 21.92676
## [3,] 21.00121 23.18822
#to get one row per matrix instead, use the transpose function t():
t(sapply(list2, colSums))
##          [,1]     [,2]     [,3]
## [1,] 19.49054 19.97416 21.00121
## [2,] 16.94287 21.92676 23.18822
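
To make the shapes in this section concrete, here is a small sketch that repeats the steps with a fixed seed and inspects the dimensions rather than the values. The seed and the variable names (mats, sums_list, sums_mat, sums_rbind) are arbitrary choices for illustration, and do.call(rbind, ...) is shown only as one possible base R alternative to t(sapply(...)).

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# lapply() over 1:2 gives a list with one 2x3 matrix per input element
mats <- lapply(1:2, function(i) matrix(rnorm(6, 10, 1), nrow = 2, ncol = 3))
length(mats)

# Column sums as a list (one numeric vector of length 3 per matrix)
sums_list <- lapply(mats, colSums)

# Column sums simplified by sapply(): 3 rows (matrix columns) by 2 columns (list elements)
sums_mat <- sapply(mats, colSums)
dim(sums_mat)

# Transposing gives one row per matrix: 2 rows by 3 columns
dim(t(sums_mat))

# Row-binding the list elements gives the same matrix as the transpose
sums_rbind <- do.call(rbind, sums_list)
all.equal(sums_rbind, t(sums_mat))
```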

Practical uses of lists using lapply()

Finally, what are lists good for? Often, I find that lists are great when I want to store multi-dimensional objects in one object, for example to group a bunch of data.frames into a list, or to store all my model results in one list. Here’s an example, where I run four linear models for four different outcomes. I want to store all my models in one object. There are two ways to do this:
  • Use a for() loop and insert the results of each iteration into the list
  • Use lapply()! Less code, and no need to initialize the list first
#create some data
set.seed(2000)
x=rbinom(1000,1,.6)
mydata<-data.frame(trt=x,
                   out1=x*3+rnorm(1000,0,3),
                   out2=x*5+rnorm(1000,0,3),
                   out3=rnorm(1000,5,3),
                   out4=x*1+rnorm(1000,0,8))

head(mydata)
##   trt      out1      out2      out3       out4
## 1   1  1.496148 5.2140842 7.8220283 12.7108382
## 2   0 -1.243485 0.5332667 2.8407921  4.6709677
## 3   1 11.070722 4.6477594 4.6725192  0.4216170
## 4   1  2.681000 1.8717883 0.3333281  0.4401036
## 5   0 -3.459300 0.8945582 3.1010555 -0.2620342
## 6   1 -2.266221 9.1754452 6.4914437  3.0443185
Now I want to regress each of the four outcomes on the trt variable using linear regression and save the results. I’ll do this first as a loop, then using lapply():
#1. Use a loop
#first, initialize the results list
results<-vector("list", 4) 

#now use a loop for each outcome
for(i in 1:4){
  results[[i]]<-lm(mydata[,i+1]~trt, data = mydata) 
}


#2. Or, use lapply in one statement!
results<-lapply(2:5, function(x) lm(mydata[,x]~trt, data = mydata))
In the second case, we are taking the vector c(2,3,4,5) and for each component of this vector, we’re running the model that we describe in the function. We can always name the components of the list as below, and I’ll print out the first two elements:
names(results)<-names(mydata)[2:5]
print(results, max=2)
## $out1
## 
## Call:
## lm(formula = mydata[, x] ~ trt, data = mydata)
## 
## Coefficients:
## (Intercept)          trt  
##      0.1905       2.7707  
## 
## 
## $out2
## 
## Call:
## lm(formula = mydata[, x] ~ trt, data = mydata)
## 
## Coefficients:
## (Intercept)          trt  
##    -0.01892      4.73405  
## 
## 
##  [ reached getOption("max.print") -- omitted 2 entries ]
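
One thing to notice in the printout above is that both calls read mydata[, x] ~ trt, so the stored formula does not say which outcome was modelled. A possible variant, sketched below, loops over the outcome names instead of the column positions and builds each formula with base R's reformulate(). The names results_by_name and results_by_pos are made up for this sketch, and this is just one way to do it, not the approach used in the rest of the post.

```r
# Same simulated data as in the post
set.seed(2000)
x <- rbinom(1000, 1, .6)
mydata <- data.frame(trt  = x,
                     out1 = x * 3 + rnorm(1000, 0, 3),
                     out2 = x * 5 + rnorm(1000, 0, 3),
                     out3 = rnorm(1000, 5, 3),
                     out4 = x * 1 + rnorm(1000, 0, 8))

outcomes <- names(mydata)[2:5]

# lapply() keeps the names of its input, so naming the input vector
# names the resulting list without a separate names()<- step
results_by_name <- lapply(setNames(outcomes, outcomes), function(y) {
  lm(reformulate("trt", response = y), data = mydata)
})
names(results_by_name)

# formula() on each fitted model now reports the outcome by name
sapply(results_by_name, function(m) deparse(formula(m)))

# Positional version from the post, for comparison: the coefficients agree
results_by_pos <- lapply(2:5, function(i) lm(mydata[, i] ~ trt, data = mydata))
all.equal(unname(sapply(results_by_name, coef)),
          unname(sapply(results_by_pos, coef)))
```

The printed Call of each model will still show the reformulate() expression, because lm() stores the call as it was written, but formula() recovers the readable version.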
Why is this a great way to store data? Well, we can keep using the apply() functions, for example to put together all of the treatment effects for each outcome into one matrix:
#extract coefficient and std error for each outcome and store in a matrix
sapply(results, function(x) summary(x)$coefficients[2,1:2])
##                 out1      out2       out3      out4
## Estimate   2.7707490 4.7340543 -0.1344969 1.3293520
## Std. Error 0.1915748 0.1876549  0.1912755 0.5324664
You can also easily use other functions like stargazer() (covered in a previous post) to create a quick table of results like so (as LaTeX code):
require(stargazer)
stargazer(results, 
          column.labels=names(results),
          keep.stat=c("rsq","n"),
          dep.var.labels="")
Or easily create a graph of the model estimates and 95% confidence intervals:
#extract coefficients from the list
coefs<-as.data.frame(t(sapply(results, function(x) summary(x)$coefficients[2,1:2])))
coefs
##        Estimate Std. Error
## out1  2.7707490  0.1915748
## out2  4.7340543  0.1876549
## out3 -0.1344969  0.1912755
## out4  1.3293520  0.5324664
#add outcome column and change name of SE column
coefs$Outcome<-rownames(coefs)
names(coefs)[2]<-"SE"

#use ggplot to plot all the estimates
require(ggplot2)
ggplot(coefs, aes(Outcome,Estimate)) +
  geom_point(size=4) + 
  theme(legend.position="none")+
  labs(title="Treatment effect on outcomes", x="", y="Estimate and 95% CI")+
  geom_errorbar(aes(ymin=Estimate-1.96*SE,ymax=Estimate+1.96*SE),width=0.1)+
  geom_hline(yintercept = 0, color="red")+
  coord_flip()
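
The error bars in the plot use the normal approximation, estimate ± 1.96 × SE. As a rough sketch of what those limits look like as numbers, the code below computes them explicitly and sets them next to the t-based intervals from base R's confint(). To keep it short and self-contained it uses a smaller simulated data set with only two outcomes. The data, the seed, and the names dat, fits, normal_ci and t_ci are assumptions made for this illustration.

```r
# Smaller simulated data in the same spirit as the post (illustrative only)
set.seed(2000)
trt <- rbinom(200, 1, 0.6)
dat <- data.frame(trt  = trt,
                  out1 = trt * 3 + rnorm(200, 0, 3),
                  out2 = trt * 5 + rnorm(200, 0, 3))

outcomes <- c("out1", "out2")
fits <- lapply(setNames(outcomes, outcomes), function(y) {
  lm(reformulate("trt", response = y), data = dat)
})

# Normal-approximation limits: estimate +/- 1.96 * SE.
# The coefficient row is picked by name ("trt") rather than by position.
normal_ci <- t(sapply(fits, function(m) {
  est <- summary(m)$coefficients["trt", c("Estimate", "Std. Error")]
  c(lower = est[["Estimate"]] - 1.96 * est[["Std. Error"]],
    upper = est[["Estimate"]] + 1.96 * est[["Std. Error"]])
}))

# t-based limits from confint(), one row per outcome
t_ci <- t(sapply(fits, function(m) confint(m, "trt", level = 0.95)[1, ]))
colnames(t_ci) <- c("lower", "upper")

normal_ci
t_ci
```

With a few hundred observations the two sets of limits are very close. The t-based ones are slightly wider because the t quantile is a little larger than 1.96.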
I hope that was useful! There are many great ways to use lists and the apply() functions to make your programming more efficient and less prone to errors. For another great resource on using the apply() functions with lists, definitely check out this StackOverflow page.
