Conquering R’s Apply Family: Your Guide to apply(), lapply(), sapply(), and tapply()

Posted on February 14, 2024 by Steven P. Sanderson II, MPH in R bloggers

This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.


Introduction

Welcome, fellow R warriors! Today, we delve into R’s “apply” family: apply(), lapply(), sapply(), and tapply(). These functions let you run a function over every row, column, element, or group of your data without writing an explicit loop. They are your secret weapons for concise, elegant code, so buckle up!

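But first, the “why”: Loops are great, but for repetitive tasks on data structures, the apply family is usually cleaner. Strictly speaking, these functions still iterate internally, so they are not guaranteed to be faster than a well-written for loop. What they give you is shorter code with less bookkeeping, which lets you focus on the “what” instead of the “how” of your analysis. Each member of the family offers a unique twist on applying functions to your data.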
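
To make the comparison concrete, here is a minimal sketch that computes the same means twice, once with an explicit for loop and once with sapply(). The samples list is made up for illustration and is not part of the original examples.

```r
# Hypothetical input: three small numeric vectors of different lengths.
samples <- list(a = c(2, 4, 6), b = c(1, 3), c = c(10, 20, 30, 40))

# Explicit loop: pre-allocate the result, then fill it one element at a time.
loop_means <- numeric(length(samples))
for (i in seq_along(samples)) {
  loop_means[i] <- mean(samples[[i]])
}
names(loop_means) <- names(samples)

# Apply-style: say what to compute and let sapply() handle the iteration.
apply_means <- sapply(samples, mean)

print(loop_means)
print(apply_means)
```

Both versions produce one mean per vector; the apply-style call simply removes the index handling and pre-allocation.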

Examples

Example 1. The Grandparent: apply()

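Think of apply() as the customizable grandparent. It takes three main arguments:

X: Your data (a matrix or array; a data frame is converted to a matrix first, so it works best when all columns are numeric).
MARGIN: Where to apply the function (rows = 1, columns = 2, every individual cell = c(1, 2)).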
FUN: The function to apply (e.g., mean, sum, your custom function).

Calculate the mean of each column in the iris dataset:

column_means <- apply(iris[, 1:4], 2, mean)
print(column_means)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333

Explanation: We apply mean (FUN) to each column (MARGIN = 2) of the first four columns (iris[, 1:4]) of the iris data frame, storing the results in column_means.
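
The other margin and a custom function work the same way. The sketch below uses a small made-up matrix called scores (an assumption for illustration, not data from the post) so the results are easy to check by hand.

```r
# Hypothetical 2 x 3 matrix, filled column by column.
scores <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
                 dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN = 1: one result per row.
row_sums <- apply(scores, 1, sum)
print(row_sums)    # r1 = 9, r2 = 12

# MARGIN = 2 with an anonymous function: max minus min of each column.
col_spread <- apply(scores, 2, function(x) max(x) - min(x))
print(col_spread)  # a = 1, b = 1, c = 1

# For plain column means, base R's colMeans() returns the same values.
print(all.equal(apply(scores, 2, mean), colMeans(scores)))
```

For simple sums and means, the dedicated base R helpers rowSums(), colSums(), rowMeans(), and colMeans() are usually the more direct choice; apply() earns its keep when the function is something they do not cover.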

Example 2. The Speedy Sibling: lapply()

lapply() is the speedy sibling: it applies a function to each element of a list or vector and always returns a list with one result per element.

Calculate the median of each vector of petal lengths in a list:

petal_lengths <- list(c(1.4, 1.5, 1.6), c(4.4, 4.5, 4.6))
petal_medians <- lapply(petal_lengths, median)
print(petal_medians)

[[1]]
[1] 1.5

[[2]]
[1] 4.5

Explanation: We apply median to each numeric vector in petal_lengths, returning a list (petal_medians) containing the medians.
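
Two details are worth a quick sketch: any arguments placed after FUN are passed along to it, and a data frame counts as a list of columns. The measurements list below is hypothetical and only there to show a missing value being handled.

```r
# Hypothetical named list; one vector contains a missing value.
measurements <- list(short = c(1.4, NA, 1.6), long = c(4.4, 4.5, 4.6))

# Arguments after FUN are forwarded, so na.rm = TRUE reaches median().
medians <- lapply(measurements, median, na.rm = TRUE)
print(medians)

# A data frame is a list of columns, so lapply() visits each column in turn.
column_classes <- lapply(iris, class)
print(column_classes$Species)  # "factor"
```

Because the input list is named, the output list keeps the same names, which makes results easier to look up than the [[1]], [[2]] positions above.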

Example 3. The Streamlined Cousin: sapply()

sapply() is like lapply(), but it tries to simplify the output. If every result has length one, it returns a vector instead of a list; if every result has the same length greater than one, it returns a matrix; otherwise it falls back to a list.

Find the minimum value in each row of a matrix:

# Named m rather than matrix to avoid confusion with the base matrix() function
m <- matrix(1:12, nrow = 3)
row_mins <- sapply(seq_len(nrow(m)), function(i) min(m[i, ]))
print(row_mins)

[1] 1 2 3

Explanation: We use an anonymous function to find the minimum in each row (m[i, ]) and apply it to each row number (seq_len(nrow(m))). sapply() simplifies the output to a vector of minimum values (row_mins). For this particular task, apply(m, 1, min) gives the same result more directly.
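
The way sapply() chooses its return type is easiest to see side by side. This is a small sketch with made-up inputs; it also shows vapply(), a stricter base R relative that is not covered in the post but is handy when you want the output type guaranteed.

```r
values <- list(a = 1:3, b = 4:6)

# Every result has length 1, so sapply() returns a named vector.
sums <- sapply(values, sum)
print(sums)

# Every result has length 2, so sapply() returns a matrix, one column per element.
ranges <- sapply(values, range)
print(ranges)

# Results of different lengths cannot be simplified, so a list comes back.
ragged <- sapply(list(a = 1:2, b = 1:5), function(x) x[x > 1])
print(class(ragged))

# vapply() requires a template for each result and errors if one does not match.
checked_sums <- vapply(values, sum, FUN.VALUE = integer(1))
print(checked_sums)
```

Since the shape of sapply()'s result depends on the data, vapply() is often the safer pick inside functions and scripts, while sapply() stays convenient for interactive work.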

Example 4. The Grouping Guru: tapply()

tapply() groups data based on another variable and applies a function to each group. Perfect for summarizing data by categories!

Calculate the average sepal length for each species in the iris dataset:

sepal_length_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(sepal_length_by_species)

    setosa versicolor  virginica 
     5.006      5.936      6.588

Explanation: We group the Sepal.Length column by the Species column (using iris$Species) and calculate the mean (mean) for each group. The results are stored in sepal_length_by_species.
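
tapply() also accepts more than one grouping variable if you pass them as a list. The sketch below uses small hypothetical vectors (height, species, site) rather than iris so that the group means can be verified by eye.

```r
# Hypothetical data: six plant heights with two grouping variables.
height  <- c(10, 12, 20, 22, 30, 34)
species <- factor(c("a", "a", "b", "b", "c", "c"))
site    <- factor(c("north", "south", "north", "south", "north", "south"))

# One grouping factor: one mean per species.
by_species <- tapply(height, species, mean)
print(by_species)  # a = 11, b = 21, c = 32

# Two grouping factors, passed as a list: a species-by-site table of means.
by_both <- tapply(height, list(species, site), mean)
print(by_both)
```

With two factors the result is a matrix whose rows and columns are the factor levels, so a single cell can be read off with by_both["c", "south"].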

Ready to Experiment?

These are just a taste of the apply family’s power. Now it’s your turn! Explore different functions, data structures, and margins to see how these tools can streamline your R workflow. Remember, practice makes perfect, so dive in, experiment, and conquer the apply family!

Bonus Tip: Check out the purrr package for even more apply-like functions with a functional programming twist!
