Bootstrap Function in R: Resampling with the lapply and sample Functions
Introduction
Bootstrap resampling is a powerful technique used in statistics and data analysis to estimate the uncertainty of a statistic by repeatedly sampling, with replacement, from the original data. In R, we can easily implement a bootstrap function using the lapply and sample functions. In this blog post, we will explore how to write a bootstrap function in R and provide an example using the “mpg” column from the popular “mtcars” dataset.
Bootstrap Function Implementation
To create a bootstrap function in R, we can follow these steps:
Step 1: Load the required dataset
Let’s begin by loading the “mtcars” dataset, which ships with R in the datasets package:
data(mtcars)
Step 2: Define the bootstrap function
We’ll define a function called bootstrap() that takes two arguments: data (the input data vector) and n (the number of bootstrap iterations).
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    # Perform desired operations on the resampled data, e.g., compute a statistic
    # and return the result
  })
  return(resampled_data)
}

bootstrapped_samples <- bootstrap(mtcars$mpg, 5)
bootstrapped_samples
[[1]]
 [1] 21.0 18.1 33.9 21.4 17.3 19.2 19.2 15.8 16.4 30.4 18.1 14.3 32.4 10.4 15.0
[16] 16.4 30.4 17.8 21.4 19.2 17.3 22.8 14.3 22.8 30.4 18.7 13.3 13.3 15.2 10.4
[31] 15.0 13.3

[[2]]
 [1] 18.7 32.4 21.0 10.4 15.0 14.7 24.4 10.4 32.4 10.4 21.0 19.7 21.4 10.4 30.4
[16] 17.3 10.4 22.8 15.2 15.2 21.4 15.8 21.4 33.9 24.4 15.2 18.1 19.2 21.0 24.4
[31] 15.5 21.0

[[3]]
 [1] 15.5 30.4 21.0 22.8 27.3 18.1 21.0 13.3 15.2 17.3 15.8 21.0 18.1 14.3 17.8
[16] 15.8 21.0 18.1 19.2 24.4 19.2 22.8 18.7 14.3 26.0 21.4 22.8 32.4 14.7 15.2
[31] 15.2 14.3

[[4]]
 [1] 13.3 21.0 13.3 15.0 19.2 18.1 18.1 19.2 22.8 18.7 26.0 21.4 14.7 14.3 17.8
[16] 22.8 19.7 21.4 30.4 30.4 18.7 17.3 16.4 21.5 18.1 21.0 17.8 21.4 14.3 19.7
[31] 32.4 18.7

[[5]]
 [1] 15.0 21.4 21.5 26.0 17.3 30.4 18.1 17.8 17.3 30.4 24.4 32.4 21.0 17.8 33.9
[16] 32.4 19.2 22.8 19.7 16.4 17.8 22.8 14.3 33.9 21.5 10.4 21.4 26.0 33.9 14.7
[31] 21.5 18.1
In the above code, we use lapply to generate a list of n resampled datasets. Inside the function passed to lapply, we use the sample function to randomly sample from the original data with replacement (replace = TRUE). Because the size argument of sample is left at its default, each resampled dataset has the same length as the original dataset.
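The behaviour of sample described above can be checked directly. The following is a minimal sketch rather than part of the original function; the seed value is an arbitrary choice made here only so that the resample is reproducible.

```r
# Arbitrary seed (an assumption for illustration) so the resample is reproducible
set.seed(123)

x <- mtcars$mpg

# With size left at its default, sample() draws length(x) values
resample <- sample(x, replace = TRUE)
length(resample) == length(x)  # TRUE

# Every resampled value is one of the original observations
all(resample %in% x)           # TRUE

# Sampling row positions makes the repeats caused by replacement visible
idx <- sample(seq_along(x), replace = TRUE)
table(idx)
```

Calling set.seed() before bootstrap() has the same effect on the whole list of resamples, which is convenient when results need to be repeated exactly.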
Step 3: Perform desired operations on resampled data
Within the function passed to lapply, you can perform any desired operations on the resampled data. This could involve calculating statistics, fitting models, or conducting hypothesis tests. Customize the code within that function to suit your specific needs.
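One way to make this customization explicit is to pass the statistic in as an argument. The sketch below is an assumption-laden variation, not the function defined in this post: the name bootstrap_stat and its stat argument are hypothetical, and it uses vapply instead of lapply so that the result is a plain numeric vector.

```r
# Hypothetical generalisation: `stat` is any function mapping a vector to one number
bootstrap_stat <- function(data, n, stat = mean) {
  vapply(
    seq_len(n),
    function(i) stat(sample(data, replace = TRUE)),
    numeric(1)
  )
}

set.seed(42)  # arbitrary seed, for reproducibility only

# Swap in different statistics without touching the resampling code
boot_medians <- bootstrap_stat(mtcars$mpg, n = 1000, stat = median)
boot_sds     <- bootstrap_stat(mtcars$mpg, n = 1000, stat = sd)

length(boot_medians)  # 1000
summary(boot_medians)
summary(boot_sds)
```

Because vapply checks that each call returns exactly one number, a statistic that returns a longer result (a vector of regression coefficients, for example) would raise an error here; for those cases the list returned by lapply is the more flexible choice.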
Example: Bootstrapping the “mpg” column in mtcars
Let’s illustrate the usage of our bootstrap function by resampling the “mpg” column from the “mtcars” dataset. We will calculate the mean of each resampled dataset.
# Step 1: Load the dataset
data(mtcars)

# Step 2: Define the bootstrap function
bootstrap <- function(data, n) {
  resampled_data <- lapply(1:n, function(i) {
    resample <- sample(data, replace = TRUE)
    mean(resample) # Calculate the mean of each resampled dataset
  })
  return(resampled_data)
}

# Step 3: Perform the bootstrap resampling
bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)

# Display the first few resampled means
head(bootstrapped_means)
[[1]]
[1] 20.21562

[[2]]
[1] 20.09375

[[3]]
[1] 19.59375

[[4]]
[1] 20.13437

[[5]]
[1] 21.17813

[[6]]
[1] 21.5375
In the above example, we resample the “mpg” column of the “mtcars” dataset 1000 times. The bootstrap() function calculates the mean of each resampled dataset and returns a list of resampled means. The head() function is then used to display the first few resampled means.
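To turn the list of resampled means into an estimate of uncertainty, one common approach is to take their standard deviation as a bootstrap standard error and their 2.5% and 97.5% quantiles as a percentile interval. The following is a minimal sketch of that idea; the seed and the names means_vec, boot_se, and boot_ci are choices made here for illustration.

```r
# Same bootstrap function as in the example, written compactly
bootstrap <- function(data, n) {
  lapply(seq_len(n), function(i) mean(sample(data, replace = TRUE)))
}

set.seed(2023)  # arbitrary seed, for reproducibility only
bootstrapped_means <- bootstrap(mtcars$mpg, n = 1000)

# Flatten the list into a numeric vector before summarising
means_vec <- unlist(bootstrapped_means)

# Bootstrap standard error of the mean
boot_se <- sd(means_vec)

# 95% percentile interval for the mean
boot_ci <- quantile(means_vec, probs = c(0.025, 0.975))

boot_se
boot_ci
```

The exact numbers depend on the seed and on the number of iterations, so they will differ slightly from run to run unless the seed is fixed.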
Of course, we do not have to compute a statistic inside the bootstrap function. We can instead return the raw bootstrap samples and summarize them afterwards. The following example pools the five resamples stored in bootstrapped_samples above and summarizes the pooled values.
quantile(unlist(bootstrapped_samples), probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
  2.5%    25%    50%    75%  97.5% 
10.400 15.725 19.200 22.800 33.900 
mean(unlist(bootstrapped_samples))
[1] 20.06625
sd(unlist(bootstrapped_samples))
[1] 5.827239
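Pooling with unlist treats all resampled values as one large sample. If the goal is instead one value of the statistic per resample, the stored samples can be summarized individually. The sketch below assumes a freshly generated list of five resamples (built inline with an arbitrary seed, so the values will not match the output printed above).

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Five stored resamples of mpg, built the same way bootstrap() builds them
bootstrapped_samples <- lapply(1:5, function(i) sample(mtcars$mpg, replace = TRUE))

# One statistic per resample, rather than one statistic for the pooled values
sample_means   <- sapply(bootstrapped_samples, mean)
sample_medians <- sapply(bootstrapped_samples, median)

sample_means
sample_medians

# Spread of the per-resample means (very rough with only five resamples)
sd(sample_means)
```

With only five resamples this spread is a crude estimate; in practice the number of iterations would be much larger, as in the 1000-iteration example.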
Conclusion
In this blog post, we have learned how to write a bootstrap function in R using the lapply and sample functions. By employing these functions, we can easily generate resampled datasets to estimate the uncertainty of statistics or perform other desired operations. The example using the “mpg” column of the “mtcars” dataset demonstrated the usage of the bootstrap function to calculate resampled means. Feel free to customize the function to suit your specific needs and explore the power of bootstrap resampling in R.