# Pipe-friendly bootstrapping with list-variables in #rstats

**R – Strenge Jacke!**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A few days ago, my package **sjstats** was updated on CRAN. Most functions of this package are convenient functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping-methods, namely `bootstrap()`

, `boot_ci()`

, `boot_se()`

and `boot_p()`

. In this post, I just wanted to give a quick example of these functions, used within a pipeline-workflow.

First, load the required libraries:

library(dplyr) library(sjstats)

Now, init the sample data and fit a regular model. The model estimates how the dependency (*e42dep*) of an older person is related to the burden of care (*neg_c_7*) of a person who provides care to the frail older people:

data(efc) fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

I demonstrate the `boot_ci()`

-function, so the confidence intervals are of interest here:

confint(fit) > 2.5 % 97.5 % > (Intercept) 5.3374378 7.7794162 > e42dep 1.2929451 1.7964296 > c161sex -0.1193198 0.9871336

Now let’s see to obtain bootstrapped confidence intervals for this model. First, the `bootstrap()`

-function generates bootstrap replicates and returns a data-frame with just one column, `$strap`

, which is a *list-variable* with bootstrap samples:

bootstrap(efc, 1000)

This is how the list-variable looks like:

# A tibble: 1,000 x 1 strap <list> 1 <data.frame [908 x 26]> 2 <data.frame [908 x 26]> 3 <data.frame [908 x 26]> 4 <data.frame [908 x 26]> 5 <data.frame [908 x 26]> 6 <data.frame [908 x 26]> 7 <data.frame [908 x 26]> 8 <data.frame [908 x 26]> 9 <data.frame [908 x 26]> 10 <data.frame [908 x 26]> # ... with 990 more rows

Since all data frames are saved in a list, you can use `lapply()`

to easily run the same linear model (used above) over all bootstrap samples and save these fitted model objects in another list-variable (named *models* in the example below). Then, using `lapply()`

again, we can extract the coefficient of interest (here, the second coefficient, which is the estimated *e42dep*) for each „bootstrap“ model and save these coefficients in another variable (named *dependency* in the example below). Finally, we use the `boot_ci()`

-function to calculate confidence intervals of the bootstrapped coefficients.

The complete code looks like this:

efc %>% # generate bootstrape replicates, saved in # the list-variable 'strap' bootstrap(1000) %>% # run linear model on all bootstrap samples mutate(models = lapply(.$strap, function(x) { lm(neg_c_7 ~ e42dep + c161sex, data = x) })) %>% # extract coefficient for "e42dep" (dependency) variable mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>% # compute boostrapped confidence intervals boot_ci(dependency)

And the result (depending on your `seed()`

) is:

conf.low conf.high 1.303847 1.790724

The complete code, put together:

library(dplyr) library(sjstats) data(efc) fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc) confint(fit) efc %>% bootstrap(1000) %>% mutate(models = lapply(.$strap, function(x) { lm(neg_c_7 ~ e42dep + c161sex, data = x) })) %>% mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>% boot_ci(dependency)

Tagged: bootstrap, pipe, R, rstats

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Strenge Jacke!**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.