**R for Public Health**, and kindly contributed to R-bloggers)

In the last post, I went over the basics of lists, including constructing, manipulating, and converting lists to other classes.

Now that we know the basics, in this post we’ll use the **apply()** functions to see just how powerful working with lists can be. I’ve done two posts on **apply()** for dataframes and matrices, here and here, so give those a read if you need a refresher.

#### Intro to apply-based functions for lists

There are a variety of **apply()**-based functions that can be used depending on what you want to do. The table below shows each function, the input it accepts, and the output it returns:

Function | Input | Output |
---|---|---|
apply | matrix | vector or matrix |
sapply | vector or list | vector or matrix |
lapply | vector or list | list |

For example, if you have a list and you want to produce a vector (of the same length), use **sapply()**. If you have a vector and want to produce a list of the same length, use **lapply()**. Let’s try an example.

The syntax of **lapply()** is:

**lapply(INPUT, function(x) (Some function here))**

where INPUT, as we see from the table above, must be a vector or a list, and function(x) is any function that is applied to **each element of the INPUT** in turn. The function can be one that already exists in R, or a new function that you’ve written yourself.

For example, let’s construct a list of 3 vectors like so:

```
mylist<-list(x=c(1,5,7), y=c(4,2,6), z=c(0,3,4))
mylist
```

```
## $x
## [1] 1 5 7
##
## $y
## [1] 4 2 6
##
## $z
## [1] 0 3 4
```

and now we can use **lapply()** to find the mean of each element of the list (mean of each of the vectors x, y, and z), and output to a new list:

`lapply(mylist, function(x) mean(x))`

```
## $x
## [1] 4.333333
##
## $y
## [1] 4
##
## $z
## [1] 2.333333
```
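Because `mean` already takes a vector as its first argument, the anonymous wrapper is optional. Here is a minimal sketch of the shorter form, and of how extra arguments are passed through **lapply()** to the function. The `mylist_na` list is a hypothetical variation on the example, introduced only for illustration:

```r
# same list as in the example above
mylist <- list(x = c(1, 5, 7), y = c(4, 2, 6), z = c(0, 3, 4))

# passing the function by name is equivalent to wrapping it in function(x)
means_short <- lapply(mylist, mean)
means_long  <- lapply(mylist, function(x) mean(x))
identical(means_short, means_long)

# hypothetical variation: one vector contains a missing value
mylist_na <- list(x = c(1, 5, NA), y = c(4, 2, 6))

# arguments given after the function are passed on to it for every element
means_na <- lapply(mylist_na, mean, na.rm = TRUE)
means_na
```

The anonymous-function form is still handy when the call needs more than one step, as in the custom function later in this post.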

But let’s say we wanted the result in a vector, not in a list, for whatever reason. Instead of doing the above and then converting the list into a vector (using unlist() or ldply() or whatever), we can do this directly using **sapply()** instead of **lapply()**. That’s because, as you can see in the table, **sapply()** can take in a list as the input, and it will return a vector (or matrix). Let’s try it:

`sapply(mylist, function(x) mean(x))`

```
## x y z
## 4.333333 4.000000 2.333333
```
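What **sapply()** returns depends on what the function returns for each element. The sketch below, which uses a hypothetical `ragged` list made up for illustration, shows the three cases: one value per element gives a vector, several values of equal length give a matrix, and results of unequal length come back as a list:

```r
mylist <- list(x = c(1, 5, 7), y = c(4, 2, 6), z = c(0, 3, 4))

# one value per element: simplified to a named numeric vector
one_each <- sapply(mylist, mean)

# two values per element: simplified to a matrix, one column per element
two_each <- sapply(mylist, range)

# hypothetical list whose results will have different lengths
ragged <- list(a = 1:3, b = 1:5)
# results of unequal length cannot be simplified, so a list is returned
uneven <- sapply(ragged, function(v) v[v > 1])

class(one_each)
dim(two_each)
is.list(uneven)
```

If your code depends on getting a vector back, it is worth checking that every element really does produce a single value.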

This is really great! Anytime you want to do the same thing over and over again, put all those things in a list and then use one of the apply functions. This avoids writing an explicit loop, which takes more code and can be slower if the results are grown one element at a time.

Let’s do another example where we write our own function this time:

```
#write function to find the span of numbers in a vector and check if it's at least 5
span.fun<-function(x) {(max(x)-min(x))>=5}
#apply that function to the list
sapply(mylist, span.fun)
```

```
## x y z
## TRUE FALSE FALSE
```
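A logical vector like this one can be used directly to subset the list. A short sketch, where the names `wide` and `wide_list` are chosen here only for illustration:

```r
mylist <- list(x = c(1, 5, 7), y = c(4, 2, 6), z = c(0, 3, 4))
span.fun <- function(x) {(max(x) - min(x)) >= 5}

# logical vector with one TRUE/FALSE per list element
wide <- sapply(mylist, span.fun)

# keep only the elements whose span is at least 5
wide_list <- mylist[wide]
names(wide_list)
```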

#### Creating a list using lapply()

You don’t need to have a list already created to use **lapply()** – in fact, **lapply()** can be used to *make* a list. This is because the key about **lapply()** is that it *returns* a list of the same length as whatever you input.

For example, let’s initialize a list to have 2 empty matrices that are size 2×3. We’ll use **lapply()**: our input is just a vector containing 1 and 2, and the function we specify uses the **matrix()** function to construct a 2×3 matrix of empty cells for each element of this vector, so it returns a list of two such matrices.

If instead of empty matrices we wanted to fill these matrices with random numbers, we could do that too. Check out both possibilities below.

```
#initialize list to 2 empty matrices of 2 by 3
list2<-lapply(1:2, function(x) matrix(NA, nrow=2, ncol=3))
list2
```

```
## [[1]]
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
##
## [[2]]
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
```

```
#initialize list to 2 matrices with random numbers from normal distribution
list2<-lapply(1:2, function(x) matrix(rnorm(6, 10, 1), nrow=2, ncol=3))
list2
```

```
## [[1]]
## [,1] [,2] [,3]
## [1,] 9.467982 9.794397 10.52168
## [2,] 10.022561 10.179758 10.47954
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 7.990455 10.95596 11.94031
## [2,] 8.952418 10.97080 11.24791
```
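In both cases above, the argument `x` of the anonymous function is never used; the input vector only determines how many list elements are created. Here is a sketch where the input value is actually used, so that each matrix gets a different number of columns. The `sizes` vector and the name `list3` are assumptions made up for this illustration:

```r
# the input vector now controls the number of columns of each matrix
sizes <- c(2, 4)   # hypothetical sizes chosen for illustration
list3 <- lapply(sizes, function(k) matrix(0, nrow = 2, ncol = k))

# one list element per input value, each with its own dimensions
sapply(list3, dim)
```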

Again, we can use **lapply()** or **sapply()** on this newly created list to get the sum of each column of each matrix:

```
#input list, output column sums of each matrix into a new list
lapply(list2, colSums)
```

```
## [[1]]
## [1] 19.49054 19.97416 21.00121
##
## [[2]]
## [1] 16.94287 21.92676 23.18822
```

```
#input list, output column sums simplified by sapply() into a matrix (one column per list element)
sapply(list2, colSums)
```

```
## [,1] [,2]
## [1,] 19.49054 16.94287
## [2,] 19.97416 21.92676
## [3,] 21.00121 23.18822
```

```
#to get one row per list element instead, transpose the result with t():
t(sapply(list2, colSums))
```

```
## [,1] [,2] [,3]
## [1,] 19.49054 19.97416 21.00121
## [2,] 16.94287 21.92676 23.18822
```
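Another way to get one row per list element is to row-bind the list that **lapply()** returns. The sketch below regenerates the matrices with an arbitrary seed (so the numbers will differ from those printed above) and checks that both routes give the same matrix:

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
list2 <- lapply(1:2, function(x) matrix(rnorm(6, 10, 1), nrow = 2, ncol = 3))

# transpose the matrix that sapply() builds ...
stacked_t <- t(sapply(list2, colSums))

# ... or row-bind the list that lapply() returns
stacked_rbind <- do.call(rbind, lapply(list2, colSums))

all.equal(stacked_t, stacked_rbind)
```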

#### Practical uses of lists using lapply()

Finally, what are lists good for? Often, I find lists are great when I want to store multi-dimensional objects in one object, for example to group a bunch of data.frames into a list, or to store all my model results in one list. Here’s an example, where I run four linear models for four different outcomes. I want to store all my models in one object.

There are two ways to do this:

- Use a for() loop and insert the results of each iteration into the list
- Use lapply! Less code, and no need to initialize the list first

```
#create some data
set.seed(2000)
x<-rbinom(1000,1,.6)
mydata<-data.frame(trt=x,
                   out1=x*3+rnorm(1000,0,3),
                   out2=x*5+rnorm(1000,0,3),
                   out3=rnorm(1000,5,3),
                   out4=x*1+rnorm(1000,0,8))
head(mydata)
```

```
## trt out1 out2 out3 out4
## 1 1 1.496148 5.2140842 7.8220283 12.7108382
## 2 0 -1.243485 0.5332667 2.8407921 4.6709677
## 3 1 11.070722 4.6477594 4.6725192 0.4216170
## 4 1 2.681000 1.8717883 0.3333281 0.4401036
## 5 0 -3.459300 0.8945582 3.1010555 -0.2620342
## 6 1 -2.266221 9.1754452 6.4914437 3.0443185
```

Now I want to regress each of the four outcomes on the trt variable using linear regression and save the results. I’ll do this first as a loop, then using **lapply()**:

```
#1. Use a loop
#first, initialize the results list
results<-vector("list", 4)
#now use a loop for each outcome
for(i in 1:4){
  results[[i]]<-lm(mydata[,i+1]~trt, data = mydata)
}

#2. Or, use lapply in one statement!
results<-lapply(2:5, function(x) lm(mydata[,x]~trt, data = mydata))

In the second case, we are taking the vector c(2,3,4,5) and for each component of this vector, we’re running the model that we describe in the function. We can always name the components of the list as below, and I’ll print out the first two elements:

```
names(results)<-names(mydata)[2:5]
print(results, max=2)
```

```
## $out1
##
## Call:
## lm(formula = mydata[, x] ~ trt, data = mydata)
##
## Coefficients:
## (Intercept) trt
## 0.1905 2.7707
##
##
## $out2
##
## Call:
## lm(formula = mydata[, x] ~ trt, data = mydata)
##
## Coefficients:
## (Intercept) trt
## -0.01892 4.73405
##
##
## [ reached getOption("max.print") -- omitted 2 entries ]
```
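An alternative is to loop over the outcome *names* rather than the column positions, so the list is named automatically and each model’s formula records which outcome it belongs to. This is only a sketch, not the approach used in the rest of this post: it uses a smaller simulated data set with two outcomes, and `outcomes` and `results_named` are names made up for the illustration. `setNames()` and `reformulate()` are base R functions.

```r
# small simulated data set in the same spirit as the one above
set.seed(2000)
x <- rbinom(200, 1, .6)
mydata <- data.frame(trt = x,
                     out1 = x * 3 + rnorm(200, 0, 3),
                     out2 = x * 5 + rnorm(200, 0, 3))

# named character vector of outcome columns; lapply() keeps these names
outcomes <- setNames(names(mydata)[-1], names(mydata)[-1])

# reformulate() builds the formula <outcome> ~ trt from the column name
results_named <- lapply(outcomes, function(y) {
  lm(reformulate("trt", response = y), data = mydata)
})

names(results_named)
formula(results_named$out2)
```

The fitted coefficients are the same as with the column-index version; only the bookkeeping changes.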

Why is this a great way to store data? Well, we can *keep* using the **apply()** functions, for example to put together all of the treatment effects for each outcome into one matrix:

```
#extract coefficient and std error for each outcome and store in a matrix
sapply(results, function(x) summary(x)$coefficients[2,1:2])
```

```
## out1 out2 out3 out4
## Estimate 2.7707490 4.7340543 -0.1344969 1.3293520
## Std. Error 0.1915748 0.1876549 0.1912755 0.5324664
```
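The same pattern works for any per-model summary. As a sketch, here is how the 95% confidence interval for the treatment effect could be pulled from each model with `confint()`. It uses a smaller simulated data set with two outcomes so that it runs on its own, and the name `ci` is chosen only for illustration. Note that `confint()` on an `lm` object uses t-based intervals, so the limits differ slightly from estimate ± 1.96 × SE.

```r
# small simulated data set and model list, built as in the examples above
set.seed(2000)
x <- rbinom(200, 1, .6)
mydata <- data.frame(trt = x,
                     out1 = x * 3 + rnorm(200, 0, 3),
                     out2 = x * 5 + rnorm(200, 0, 3))
results <- lapply(2:3, function(i) lm(mydata[, i] ~ trt, data = mydata))
names(results) <- names(mydata)[2:3]

# confint() returns a matrix with one row per coefficient; keep the trt row
ci <- sapply(results, function(m) confint(m)["trt", ])
ci
```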

You can also easily use other functions like **stargazer()** (previous post on this function here) to create a quick table of results like so (in LaTeX code):

```
require(stargazer)
stargazer(results,
column.labels=names(results),
keep.stat=c("rsq","n"),
dep.var.labels="")
```

Or easily create a graph of the model estimates and 95% confidence intervals:

```
#extract coefficients from the list
coefs<-as.data.frame(t(sapply(results, function(x) summary(x)$coefficients[2,1:2])))
coefs
```

```
## Estimate Std. Error
## out1 2.7707490 0.1915748
## out2 4.7340543 0.1876549
## out3 -0.1344969 0.1912755
## out4 1.3293520 0.5324664
```

```
#add outcome column and change name of SE column
coefs$Outcome<-rownames(coefs)
names(coefs)[2]<-"SE"
#use ggplot to plot all the estimates
require(ggplot2)
ggplot(coefs, aes(Outcome,Estimate)) +
  geom_point(size=4) +
  theme(legend.position="none")+
  labs(title="Treatment effect on outcomes", x="", y="Estimate and 95% CI")+
  geom_errorbar(aes(ymin=Estimate-1.96*SE,ymax=Estimate+1.96*SE),width=0.1)+
  geom_hline(yintercept = 0, color="red")+
  coord_flip()
```

I hope that was useful! There are many great ways to use lists and the **apply()** functions to make your programming more efficient and less prone to errors.

For another great resource on using the **apply()** functions with lists, definitely check out this StackOverflow page.
