
A few days ago, my package sjstats was updated on CRAN. Most functions in this package are convenience functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping methods, namely `bootstrap()`, `boot_ci()`, `boot_se()` and `boot_p()`. In this post, I want to give a quick example of these functions, used within a pipeline workflow.

```r
library(dplyr)
library(sjstats)
```

Now, load the sample data and fit a regular linear model. The model estimates how the dependency (`e42dep`) of an older person is related to the burden of care (`neg_c_7`) of the person who provides care to that frail older person:

```r
data(efc)
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
```

Since I want to demonstrate the `boot_ci()` function, the confidence intervals are of interest here:

```r
confint(fit)

#>                  2.5 %    97.5 %
#> (Intercept)  5.3374378 7.7794162
#> e42dep       1.2929451 1.7964296
#> c161sex     -0.1193198 0.9871336
```

Now let's see how to obtain bootstrapped confidence intervals for this model. First, the `bootstrap()` function generates bootstrap replicates and returns a data frame with just one column, `$strap`, which is a list-variable holding the bootstrap samples:

`bootstrap(efc, 1000)`

This is what the list-variable looks like (each of the 1,000 rows holds one bootstrap sample):

```
# A tibble: 1,000 x 1
strap

1
2
3
4
5
6
7
8
9
10
# ... with 990 more rows
```
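
To make the idea of a bootstrap replicate more concrete, here is a minimal base-R sketch of the resampling step. It is only a conceptual illustration and makes no claim about how `bootstrap()` works internally. The built-in `mtcars` data is a stand-in for `efc`, and the names `n_rep`, `n_obs` and `straps` are made up for this example:

```r
# Conceptual sketch only: base-R resampling, not the sjstats implementation.
# `mtcars` is a stand-in data set; all object names here are hypothetical.
set.seed(123)  # fix the random seed so the resamples are reproducible

n_rep <- 1000          # number of bootstrap replicates
n_obs <- nrow(mtcars)  # each replicate has as many rows as the original data

# each replicate draws n_obs row indices with replacement
straps <- lapply(seq_len(n_rep), function(i) {
  mtcars[sample.int(n_obs, size = n_obs, replace = TRUE), , drop = FALSE]
})

length(straps)    # number of replicates
dim(straps[[1]])  # same dimensions as the original data
```

Because rows are drawn with replacement, each replicate usually contains some observations several times and omits others, which is what produces the variation between the bootstrapped models.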

Since all bootstrap samples are saved in a list, you can use `lapply()` to easily run the same linear model (used above) over all bootstrap samples and save the fitted model objects in another list-variable (named `models` in the example below). Then, using `lapply()` again, we can extract the coefficient of interest (here, the second coefficient, which is the estimate for `e42dep`) from each bootstrap model and save these coefficients in another variable (named `dependency` in the example below). Finally, we use the `boot_ci()` function to calculate confidence intervals of the bootstrapped coefficients.

The complete code looks like this:

```r
efc %>%
  # generate bootstrap replicates, saved in
  # the list-variable 'strap'
  bootstrap(1000) %>%
  # run linear model on all bootstrap samples
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # extract the second coefficient, i.e. the estimate
  # for the "e42dep" (dependency) variable
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  # compute bootstrapped confidence intervals
  boot_ci(dependency)
```

And the result (which varies with the random seed, see `set.seed()`) is:

```
conf.low conf.high
1.303847  1.790724
```
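
For readers who wonder how an interval can be derived from a vector of bootstrapped coefficients, the following base-R sketch shows two common approaches: a percentile interval and a normal approximation based on the bootstrap standard error. This is an illustration under my own assumptions, not a description of what `boot_ci()` or `boot_se()` compute internally. The `mtcars` data and the model `mpg ~ wt + hp` are stand-ins, and all object names are hypothetical:

```r
# Conceptual sketch only: two common ways to turn bootstrapped coefficients
# into a confidence interval. Not the sjstats implementation.
# `mtcars` and the model formula are stand-ins; object names are hypothetical.
set.seed(123)  # results depend on the seed

# refit the model on 1000 resamples and keep the coefficient for `wt`
boot_coefs <- vapply(seq_len(1000), function(i) {
  idx <- sample.int(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))[["wt"]]
}, numeric(1))

# 1) percentile interval: 2.5% and 97.5% quantiles of the bootstrap estimates
ci_quantile <- quantile(boot_coefs, probs = c(0.025, 0.975), names = FALSE)

# 2) normal approximation: mean of the estimates +/- z * bootstrap standard error
boot_se_val <- sd(boot_coefs)
ci_normal <- mean(boot_coefs) + c(-1, 1) * qnorm(0.975) * boot_se_val

ci_quantile
ci_normal
```

The two intervals are usually close when the bootstrap distribution is roughly symmetric; they diverge more for skewed distributions, where the percentile interval is often the safer choice.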

The complete code, put together:

```r
library(dplyr)
library(sjstats)
data(efc)

fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)

efc %>%
  bootstrap(1000) %>%
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # coef(x)[2] is the estimate for e42dep
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  boot_ci(dependency)
```
