[This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers].

A few days ago, my package sjstats was updated on CRAN. Most functions in this package are convenience functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping methods, namely `bootstrap()`, `boot_ci()`, `boot_se()` and `boot_p()`. In this post, I want to give a quick example of these functions used within a pipeline workflow.

First, load the required libraries:

```
library(dplyr)
library(sjstats)
```

Now, load the sample data and fit a regular linear model. The model estimates how the dependency (e42dep) of an older person is related to the burden of care (neg_c_7) of the person who provides care to that frail older person, with c161sex included as a second predictor:

```
data(efc)
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
```

Since I want to demonstrate the `boot_ci()` function, the confidence intervals are of interest here:

```
confint(fit)

#>                  2.5 %    97.5 %
#> (Intercept)  5.3374378 7.7794162
#> e42dep       1.2929451 1.7964296
#> c161sex     -0.1193198 0.9871336
```

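For comparison with what follows: `confint()` on an `lm` object returns the classical intervals, that is, the estimate plus or minus a t-quantile times the standard error. Here is a minimal sketch that reproduces them by hand. It uses the built-in `mtcars` data and the name `fit_demo` as stand-ins (assumptions made so the snippet runs without extra packages), not the `efc` data from this post:

```r
# Stand-in model on built-in data (not the efc data used in this post)
fit_demo <- lm(mpg ~ wt + am, data = mtcars)

est <- coef(fit_demo)
se <- sqrt(diag(vcov(fit_demo)))
# a two-sided 95% interval uses the 97.5% t-quantile on the residual df
t_crit <- qt(0.975, df = df.residual(fit_demo))

manual_ci <- cbind(est - t_crit * se, est + t_crit * se)
colnames(manual_ci) <- c("2.5 %", "97.5 %")

# compare the hand-computed intervals with confint()
all.equal(manual_ci, confint(fit_demo))
```

These intervals rest on the model's distributional assumptions; the bootstrap approach replaces that step with resampling.
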
Now let’s see how to obtain bootstrapped confidence intervals for this model. First, the `bootstrap()` function generates bootstrap replicates and returns a data frame with just one column, `$strap`, which is a list-variable containing the bootstrap samples:

`bootstrap(efc, 1000)`

This is what the list-variable looks like:

```
# A tibble: 1,000 x 1
   strap
   <list>
 1 <data.frame [908 x 26]>
 2 <data.frame [908 x 26]>
 3 <data.frame [908 x 26]>
 4 <data.frame [908 x 26]>
 5 <data.frame [908 x 26]>
 6 <data.frame [908 x 26]>
 7 <data.frame [908 x 26]>
 8 <data.frame [908 x 26]>
 9 <data.frame [908 x 26]>
10 <data.frame [908 x 26]>
# ... with 990 more rows
```
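
Conceptually, each bootstrap replicate is the original data with rows drawn at random, with replacement, until the original number of rows is reached. The following base-R sketch mimics that idea. It is not the implementation of `bootstrap()`; `make_straps()` is a hypothetical helper name introduced only for illustration, and the built-in `mtcars` data stands in for `efc`:

```r
# Hypothetical helper: draw `n` bootstrap samples from a data frame
make_straps <- function(data, n) {
  lapply(seq_len(n), function(i) {
    # sample row indices with replacement, same size as the original
    idx <- sample(nrow(data), size = nrow(data), replace = TRUE)
    data[idx, , drop = FALSE]
  })
}

set.seed(123)  # arbitrary seed, only to make the draw repeatable
straps <- make_straps(mtcars, 5)

length(straps)        # number of replicates
sapply(straps, nrow)  # each replicate has as many rows as mtcars
```

Because rows are drawn with replacement, some observations appear several times in a replicate and others not at all, which is what makes the refitted models differ from one another.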

Since all data frames are saved in a list, you can use `lapply()` to easily run the same linear model (used above) over all bootstrap samples and save the fitted model objects in another list-variable (named `models` in the example below). Then, using `lapply()` again, we can extract the coefficient of interest (here, the second coefficient, which is the estimate for e42dep) from each "bootstrap" model and save these coefficients in another variable (named `dependency` in the example below). Finally, we use the `boot_ci()` function to calculate confidence intervals from the bootstrapped coefficients.

The complete code looks like this:

```
efc %>%
  # generate bootstrap replicates, saved in
  # the list-variable 'strap'
  bootstrap(1000) %>%
  # run linear model on all bootstrap samples
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # extract coefficient for "e42dep" (dependency) variable
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  # compute bootstrapped confidence intervals
  boot_ci(dependency)
```

And the result (which depends on the random seed, see `set.seed()`) is:

```
conf.low conf.high
1.303847  1.790724
```
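
The numbers change between runs because the resampling is random; calling `set.seed()` before the pipeline makes them repeatable. This post does not spell out how `boot_ci()` turns the bootstrapped coefficients into an interval, so the sketch below shows two common choices in base R, a percentile interval and a normal-approximation interval, without claiming that either matches the package. The `mtcars` data and the slope of `wt` are stand-ins, and the seed value is arbitrary:

```r
set.seed(2017)  # arbitrary value; fixes the random resampling

# refit the model on 1000 resampled data sets, keep the slope of `wt`
boot_coefs <- replicate(1000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt + am, data = mtcars[idx, ]))[["wt"]]
})

# percentile interval: empirical 2.5% and 97.5% quantiles
ci_percentile <- quantile(boot_coefs, probs = c(0.025, 0.975))

# normal approximation: mean +/- z * bootstrap standard error
ci_normal <- mean(boot_coefs) + c(-1, 1) * qnorm(0.975) * sd(boot_coefs)

ci_percentile
ci_normal
```

The two intervals are usually close when the bootstrap distribution is roughly symmetric, and they drift apart when it is skewed.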

The complete code, put together:

```
library(dplyr)
library(sjstats)
data(efc)

fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)

efc %>%
  bootstrap(1000) %>%
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  boot_ci(dependency)
```

Tagged: bootstrap, pipe, R, rstats
