Bootstrap Function in R: Resampling with the lapply and sample Functions

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

Bootstrap resampling is a technique used in statistics and data analysis to estimate the uncertainty of a statistic by repeatedly sampling, with replacement, from the original data. In R, we can easily implement a bootstrap function using the lapply and sample functions. In this blog post, we will explore how to write a bootstrap function in R and work through an example using the “mpg” column from the popular “mtcars” dataset.

Bootstrap Function Implementation

To create a bootstrap function in R, we can follow these steps:

Step 1: Load the required dataset

Let’s begin by loading the “mtcars” dataset, which ships with R in the datasets package:

data(mtcars)

Step 2: Define the bootstrap function

We’ll define a function called bootstrap() that takes two arguments: data (the input data vector) and n (the number of bootstrap iterations).

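bootstrap <- function(data, n) {
  # seq_len(n) is used instead of 1:n so that n = 0 yields an empty list
  resampled_data <- lapply(seq_len(n), function(i) {
    # Draw length(data) values from data, with replacement
    resample <- sample(data, replace = TRUE)
    # Perform desired operations on the resampled data here, e.g., compute a
    # statistic. This version returns the resample itself.
    resample
  })
  return(resampled_data)
}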

bootstrapped_samples <- bootstrap(mtcars$mpg, 5)
bootstrapped_samples
[[1]]
 [1] 21.0 18.1 33.9 21.4 17.3 19.2 19.2 15.8 16.4 30.4 18.1 14.3 32.4 10.4 15.0
[16] 16.4 30.4 17.8 21.4 19.2 17.3 22.8 14.3 22.8 30.4 18.7 13.3 13.3 15.2 10.4
[31] 15.0 13.3

[[2]]
 [1] 18.7 32.4 21.0 10.4 15.0 14.7 24.4 10.4 32.4 10.4 21.0 19.7 21.4 10.4 30.4
[16] 17.3 10.4 22.8 15.2 15.2 21.4 15.8 21.4 33.9 24.4 15.2 18.1 19.2 21.0 24.4
[31] 15.5 21.0

[[3]]
 [1] 15.5 30.4 21.0 22.8 27.3 18.1 21.0 13.3 15.2 17.3 15.8 21.0 18.1 14.3 17.8
[16] 15.8 21.0 18.1 19.2 24.4 19.2 22.8 18.7 14.3 26.0 21.4 22.8 32.4 14.7 15.2
[31] 15.2 14.3

[[4]]
 [1] 13.3 21.0 13.3 15.0 19.2 18.1 18.1 19.2 22.8 18.7 26.0 21.4 14.7 14.3 17.8
[16] 22.8 19.7 21.4 30.4 30.4 18.7 17.3 16.4 21.5 18.1 21.0 17.8 21.4 14.3 19.7
[31] 32.4 18.7

[[5]]
 [1] 15.0 21.4 21.5 26.0 17.3 30.4 18.1 17.8 17.3 30.4 24.4 32.4 21.0 17.8 33.9
[16] 32.4 19.2 22.8 19.7 16.4 17.8 22.8 14.3 33.9 21.5 10.4 21.4 26.0 33.9 14.7
[31] 21.5 18.1

In the above code, we use lapply to generate a list of n resampled datasets. Inside the function passed to lapply, we use the sample function to randomly sample from the original data with replacement (replace = TRUE), so the same observation can be drawn more than once. Because no size argument is given, sample draws as many values as there are in the input, so each resampled dataset has the same length as the original dataset.
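
The behavior of sample described above can be checked directly. The following is a minimal sketch, not part of the original post; the seed value is an arbitrary choice that only makes the run repeatable.

```r
# Load the data and fix the random seed (123 is an arbitrary choice)
data(mtcars)
set.seed(123)

x <- mtcars$mpg
resample <- sample(x, replace = TRUE)

# Same length as the original, because size defaults to length(x)
length(resample) == length(x)

# Every resampled value comes from the original data
all(resample %in% x)

# Number of distinct values drawn; with replacement this is usually
# smaller than the number of distinct values in x
length(unique(resample))
```

One caveat from the sample documentation: when x is a single number of at least 1, sample(x) samples from 1:x rather than from x itself, so this shortcut is only safe for vectors with more than one element.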

Step 3: Perform desired operations on resampled data

Within the function passed to lapply, you can perform any desired operations on the resampled data. This could involve calculating statistics, fitting models, or conducting hypothesis tests. Customize that code to suit your specific needs.
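
One way to make that customization reusable, sketched here as an assumption rather than as the author's approach, is to pass the operation in as an argument. The function name bootstrap_stat and its statistic parameter are hypothetical names introduced for this illustration.

```r
# Hypothetical generalization of bootstrap(): 'statistic' is any function
# that takes a vector and returns a value. Both names are assumptions.
bootstrap_stat <- function(data, n, statistic = mean) {
  lapply(seq_len(n), function(i) {
    resample <- sample(data, replace = TRUE)
    statistic(resample)
  })
}

data(mtcars)
set.seed(42)  # arbitrary seed for repeatability

# The same resampling code, with two different statistics plugged in
boot_medians <- bootstrap_stat(mtcars$mpg, n = 200, statistic = median)
boot_sds     <- bootstrap_stat(mtcars$mpg, n = 200, statistic = sd)

length(boot_medians)           # one result per iteration
summary(unlist(boot_medians))  # spread of the bootstrapped medians
```

Keeping the resampling loop separate from the statistic means the loop is written once, and swapping mean for median or sd is a one-argument change.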

Example: Bootstrapping the “mpg” column in mtcars

Let’s illustrate the usage of our bootstrap function by resampling the “mpg” column from the “mtcars” dataset. We will calculate the mean of each resampled dataset.

# Step 1: Load the dataset
data(mtcars)

# Step 2: Define the bootstrap function
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    mean(resample)  # Calculate the mean of each resampled dataset
  })
  return(resampled_data)
}

# Step 3: Perform the bootstrap resampling
bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)

# Display the first few resampled means
head(bootstrapped_means)
[[1]]
[1] 20.21562

[[2]]
[1] 20.09375

[[3]]
[1] 19.59375

[[4]]
[1] 20.13437

[[5]]
[1] 21.17813

[[6]]
[1] 21.5375

In the above example, we resample the “mpg” column of the “mtcars” dataset 1000 times. The bootstrap() function calculates the mean of each resampled dataset and returns a list of resampled means. The head() function is then used to display the first few resampled means.
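
The resampled means become useful once they are summarized. The sketch below is an addition to the original post: it flattens the list into a numeric vector, then computes a bootstrap standard error and a 95% percentile interval for the mean. The compact bootstrap() shown here is equivalent in behavior to the one above, and the seed is an arbitrary choice.

```r
data(mtcars)
set.seed(2023)  # arbitrary seed for repeatability

# Compact version of the bootstrap() function from the example
bootstrap <- function(data, n) {
  lapply(seq_len(n), function(i) mean(sample(data, replace = TRUE)))
}

bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)

# lapply returns a list; flatten it to a numeric vector first
boot_means <- unlist(bootstrapped_means)

# Bootstrap estimate of the standard error of the mean
boot_se <- sd(boot_means)

# 95% percentile interval for the mean
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

boot_se
boot_ci
```

The percentile interval is only one of several bootstrap interval methods; it is used here because it needs nothing beyond quantile.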

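Of course, we do not have to compute a statistic inside the bootstrap function. We can instead return the raw bootstrap samples and compute statistics on them afterwards. The following example does this with the bootstrapped_samples data from above, pooling the five resamples into a single vector with unlist. Note that these summaries describe the pooled resampled values, not the sampling distribution of a statistic.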

quantile(unlist(bootstrapped_samples), 
         probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
  2.5%    25%    50%    75%  97.5% 
10.400 15.725 19.200 22.800 33.900 
mean(unlist(bootstrapped_samples))
[1] 20.06625
sd(unlist(bootstrapped_samples))
[1] 5.827239
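
An alternative to pooling is to compute one statistic per resample, which keeps the resamples separate. This is a minimal sketch under the assumption that five raw resamples are stored in a list, as in bootstrapped_samples; the seed is arbitrary, so the values will differ from the output printed above.

```r
data(mtcars)
set.seed(1)  # arbitrary seed for repeatability

# Five raw resamples, built the same way as bootstrapped_samples
bootstrapped_samples <- lapply(1:5, function(i) {
  sample(mtcars$mpg, replace = TRUE)
})

# One statistic per resample, instead of pooling all values together
per_sample_means   <- sapply(bootstrapped_samples, mean)
per_sample_medians <- sapply(bootstrapped_samples, median)

per_sample_means
per_sample_medians
```

With only five resamples these vectors are too short to say much about uncertainty; in practice the same sapply calls would be applied to a much larger list.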

Conclusion

In this blog post, we have learned how to write a bootstrap function in R using the lapply and sample functions. By employing these functions, we can easily generate resampled datasets to estimate the uncertainty of statistics or perform other desired operations. The example using the “mpg” column of the “mtcars” dataset demonstrated the usage of the bootstrap function to calculate resampled means. Feel free to customize the function to suit your specific needs and explore the power of bootstrap resampling in R.
