(This article was first published on **R – Strenge Jacke!**, and kindly contributed to R-bloggers)

A few days ago, my package **sjstats** was updated on CRAN. Most functions in this package are convenience functions for common statistical computations, especially for (mixed) regression models. The latest update introduces some pipe-friendly bootstrapping methods, namely `bootstrap()`, `boot_ci()`, `boot_se()` and `boot_p()`. In this post, I want to give a quick example of these functions used within a pipeline workflow.

First, load the required libraries:

    library(dplyr)
    library(sjstats)

Now, load the sample data and fit a regular linear model. The model estimates how the dependency (*e42dep*) of an older person is related to the burden of care (*neg_c_7*) experienced by the person who provides care to that frail older person:

    data(efc)
    fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

I will demonstrate the `boot_ci()` function, so the confidence intervals are of interest here:

    confint(fit)
    >                  2.5 %    97.5 %
    > (Intercept)  5.3374378 7.7794162
    > e42dep       1.2929451 1.7964296
    > c161sex     -0.1193198 0.9871336
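For comparison with the bootstrapped version later on, it may help to recall where these classical intervals come from: for a linear model, `confint()` returns the estimate plus or minus a t-quantile times the standard error. The following is a minimal sketch that reproduces this by hand. It uses the built-in `mtcars` data and an arbitrary model (`fit_demo` is just an illustrative name) so that it runs without any extra packages; it is not the `efc` model from this post.

```r
# Illustrative model on a built-in dataset (NOT the efc data used in this post)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

est    <- coef(fit_demo)                         # point estimates
se     <- sqrt(diag(vcov(fit_demo)))             # standard errors
t_crit <- qt(0.975, df = df.residual(fit_demo))  # t-quantile for a 95% interval

# estimate +/- t-quantile * standard error
manual_ci <- cbind(est - t_crit * se, est + t_crit * se)
colnames(manual_ci) <- c("2.5 %", "97.5 %")

# compare the hand-computed intervals with confint()
all.equal(manual_ci, confint(fit_demo), check.attributes = FALSE)
```

These intervals rely on the usual model assumptions, whereas the bootstrap approach below derives the interval from the variation of the estimate across resampled data sets.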

Now let’s see how to obtain bootstrapped confidence intervals for this model. First, the `bootstrap()` function generates bootstrap replicates and returns a data frame with just one column, `$strap`, which is a *list-variable* containing the bootstrap samples:

    bootstrap(efc, 1000)

This is what the list-variable looks like:

    # A tibble: 1,000 x 1
       strap
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    # ... with 990 more rows
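To make the idea of a list-variable of bootstrap samples more concrete, here is a rough base-R sketch of the same structure. It is only an approximation of the concept, not how `bootstrap()` is implemented: the names `boot_df` and `n_rep` are made up for this example, the built-in `mtcars` data stands in for `efc`, and each list element here is a full resampled data frame.

```r
set.seed(123)  # arbitrary seed, only for reproducibility of this sketch

n_rep <- 5     # small number of replicates, for illustration

# each replicate: rows of the data drawn with replacement, same size as original
straps <- lapply(seq_len(n_rep), function(i) {
  mtcars[sample(nrow(mtcars), replace = TRUE), , drop = FALSE]
})

# store the list of resampled data frames as a list-column, one row per replicate
boot_df <- data.frame(id = seq_len(n_rep))
boot_df$strap <- straps

# every element of the list-column is a complete data frame
sapply(boot_df$strap, nrow)
```

Because every row of `boot_df` holds a complete data set, any function that accepts a data frame can be applied to the column with `lapply()`.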

Since all data frames are saved in a list, you can use `lapply()` to easily run the same linear model (used above) over all bootstrap samples and save the fitted model objects in another list-variable (named *models* in the example below). Then, using `lapply()` again, we can extract the coefficient of interest (here, the second coefficient, which is the estimate for *e42dep*) from each "bootstrap" model and save these coefficients in another variable (named *dependency* in the example below). Finally, we use the `boot_ci()` function to calculate confidence intervals from the bootstrapped coefficients.

The complete code looks like this:

    efc %>%
      # generate bootstrap replicates, saved in
      # the list-variable 'strap'
      bootstrap(1000) %>%
      # run linear model on all bootstrap samples
      mutate(models = lapply(.$strap, function(x) {
        lm(neg_c_7 ~ e42dep + c161sex, data = x)
      })) %>%
      # extract coefficient for "e42dep" (dependency) variable
      mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
      # compute bootstrapped confidence intervals
      boot_ci(dependency)

And the result (depending on the seed you set with `set.seed()`) is:

    conf.low conf.high
    1.303847  1.790724
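For readers who want to see the mechanics without the pipeline, the same steps (resample, refit, extract one coefficient, summarise) can be sketched in base R. This is an illustration under assumptions, not a description of what `boot_ci()` does internally: it again uses `mtcars` with an arbitrary model, and it shows two common ways of turning bootstrapped coefficients into an interval, since this post does not say which one `boot_ci()` applies.

```r
set.seed(42)   # arbitrary seed; results change with a different seed

n_rep <- 1000

# resample rows with replacement, refit the model, keep the coefficient for 'wt'
boot_coefs <- vapply(seq_len(n_rep), function(i) {
  resampled <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  coef(lm(mpg ~ wt + hp, data = resampled))[["wt"]]
}, numeric(1))

# option 1: percentile interval (2.5% and 97.5% quantiles of the estimates)
percentile_ci <- quantile(boot_coefs, probs = c(0.025, 0.975))

# option 2: normal approximation using the bootstrap standard error
normal_ci <- mean(boot_coefs) + c(-1, 1) * qnorm(0.975) * sd(boot_coefs)

percentile_ci
normal_ci
```

The two intervals are usually close when the bootstrap distribution is roughly symmetric, and they can differ noticeably when it is skewed.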

The complete code, put together:

    library(dplyr)
    library(sjstats)

    data(efc)
    fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

    confint(fit)

    efc %>%
      bootstrap(1000) %>%
      mutate(models = lapply(.$strap, function(x) {
        lm(neg_c_7 ~ e42dep + c161sex, data = x)
      })) %>%
      mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
      boot_ci(dependency)
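The other two functions mentioned at the beginning, `boot_se()` and `boot_p()`, are not demonstrated in this post. As a rough idea of what a bootstrap standard error and a bootstrap-based p-value can look like, here is a base-R sketch. The formulas are one common normal-theory choice and an assumption on my part, not a statement of what sjstats computes; `boot_se_demo` and `boot_p_demo` are hypothetical names, and `mtcars` again stands in for `efc`.

```r
set.seed(42)   # arbitrary seed, only for reproducibility of this sketch

# bootstrapped coefficients for 'wt' from an illustrative model
boot_coefs <- vapply(seq_len(1000), function(i) {
  resampled <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  coef(lm(mpg ~ wt + hp, data = resampled))[["wt"]]
}, numeric(1))

# bootstrap standard error: standard deviation of the bootstrapped estimates
boot_se_demo <- sd(boot_coefs)

# assumed approach for a p-value: t-statistic from bootstrap mean and SE,
# two-sided test against zero
t_stat <- mean(boot_coefs) / boot_se_demo
boot_p_demo <- 2 * pt(abs(t_stat), df = length(boot_coefs) - 1, lower.tail = FALSE)

boot_se_demo
boot_p_demo
```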
