Pipe-friendly bootstrapping with list-variables in #rstats

August 1, 2016

(This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers)

A few days ago, my package sjstats was updated on CRAN. Most functions in this package are convenience functions for common statistical computations, especially for (mixed) regression models. The latest update introduces some pipe-friendly bootstrapping methods, namely bootstrap(), boot_ci(), boot_se() and boot_p(). In this post, I want to give a quick example of these functions, used within a pipeline workflow.

First, load the required libraries:

library(dplyr)
library(sjstats)

Now, load the sample data and fit a regular linear model. The model estimates how the dependency (e42dep) of a frail older person is related to the burden of care (neg_c_7) of the person who provides care, with c161sex included as a second predictor:

data(efc)
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

The focus here is on the boot_ci() function, so the confidence intervals are what we are interested in:

confint(fit)

>                  2.5 %    97.5 %
> (Intercept)  5.3374378 7.7794162
> e42dep       1.2929451 1.7964296
> c161sex     -0.1193198 0.9871336
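
For reference, the interval that confint() reports for a linear model is the classical t-based one: the estimate plus or minus a t-quantile times the standard error. A minimal sketch of that calculation, using the built-in mtcars data as a stand-in for efc (the model and the object names below are purely illustrative):

```r
# Illustrative model on built-in data (mtcars stands in for efc)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit_demo)                          # point estimates
se  <- sqrt(diag(vcov(fit_demo)))              # standard errors
tq  <- qt(0.975, df = df.residual(fit_demo))   # t-quantile for a 95% interval

# lower and upper bounds, labelled the same way confint() labels them
manual_ci <- cbind(`2.5 %` = est - tq * se, `97.5 %` = est + tq * se)

# compare the manual result with confint()
all.equal(manual_ci, confint(fit_demo))
```

These intervals rely on the usual distributional assumptions of the linear model; the bootstrap approach estimates the interval from resampled data instead.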

Now let’s see how to obtain bootstrapped confidence intervals for this model. First, the bootstrap() function generates bootstrap replicates and returns a data frame with just one column, $strap, which is a list-variable containing the bootstrap samples:

bootstrap(efc, 1000)

This is what the list-variable looks like:

# A tibble: 1,000 x 1
                     strap
                    <list>
1  
2  
3  
4  
5  
6  
7  
8  
9  
10 
# ... with 990 more rows
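
Conceptually, each bootstrap sample is the original data frame with its rows resampled with replacement. The following is a rough base-R sketch of that idea, not the sjstats implementation; the helper name simple_bootstrap() is made up for illustration, and mtcars stands in for efc:

```r
# Hypothetical helper: draw n bootstrap samples of the rows of `data`
simple_bootstrap <- function(data, n) {
  lapply(seq_len(n), function(i) {
    rows <- sample(nrow(data), replace = TRUE)  # resample row indices
    data[rows, , drop = FALSE]
  })
}

set.seed(123)  # make the resampling reproducible
straps <- simple_bootstrap(mtcars, 5)

length(straps)         # one list element per replicate
sapply(straps, nrow)   # each replicate has as many rows as the original data
```

Storing the replicates in a list is what makes the lapply() approach in the next step possible.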

Since all data frames are saved in a list, you can use lapply() to run the same linear model (used above) on each bootstrap sample and save the fitted model objects in another list-variable (named models in the example below). Then, using lapply() again, we can extract the coefficient of interest (here the second coefficient, which is the estimate for e42dep; the first is the intercept) from each "bootstrap" model and save these coefficients in another variable (named dependency in the example below). Finally, we use the boot_ci() function to calculate confidence intervals from the bootstrapped coefficients.

The complete code looks like this:

efc %>%
  # generate bootstrap replicates, saved in
  # the list-variable 'strap'
  bootstrap(1000) %>%
  # run linear model on all bootstrap samples
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # extract coefficient for "e42dep" (dependency) variable
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  # compute bootstrapped confidence intervals
  boot_ci(dependency)

And the result (which varies with the random seed, see set.seed()) is:

conf.low conf.high 
1.303847  1.790724
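
Because the samples are drawn at random, the interval changes slightly from run to run; calling set.seed() beforehand makes it reproducible. The following base-R sketch retraces the same steps on the built-in mtcars data (a stand-in for efc) and uses a simple percentile interval. That is one common way to summarise bootstrapped estimates and not necessarily the method boot_ci() applies:

```r
set.seed(42)  # fix the random number generator so results can be repeated

# 1. draw 1000 bootstrap samples (rows resampled with replacement)
straps <- lapply(1:1000, function(i) {
  mtcars[sample(nrow(mtcars), replace = TRUE), ]
})

# 2. fit the same model to every sample
models <- lapply(straps, function(d) lm(mpg ~ wt + hp, data = d))

# 3. pull out the coefficient of interest by name
wt_coef <- vapply(models, function(m) coef(m)[["wt"]], numeric(1))

# 4. percentile interval: 2.5% and 97.5% quantiles of the estimates
ci <- quantile(wt_coef, probs = c(0.025, 0.975))
ci
```

Extracting the coefficient by name rather than by position keeps the sketch robust if the order of the predictors in the formula changes.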

The complete code, put together:

library(dplyr)
library(sjstats)
data(efc)

fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)

efc %>% 
  bootstrap(1000) %>% 
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  boot_ci(dependency)

Tagged: bootstrap, pipe, R, rstats
