Changing names in the tidyverse: An example for many regressions

March 8, 2017

[This article was first published on R on Abhijit Dasgupta, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata based on p-value levels.

This seems straighforward, right? Let’s set up a toy example:


dat <- as_tibble(expand.grid(letters[1:4], 1:5))
d <- vector('list', nrow(dat))
for(i in 1:nrow(dat)){
  x <- rnorm(100)
  d[[i]] <- tibble(x = x, y1 = 3 - 2*x + rnorm(100), y2 = -4+5*x+rnorm(100))
dat <- as_tibble(bind_cols(dat, tibble(dat=d))) %>% unnest()
knitr::kable(head(dat), digits=3, format='html')
Var1 Var2 x y1 y2
a 1 0.181 4.260 -3.005
a 1 0.785 0.002 -2.105
a 1 -1.353 3.171 -9.157
a 1 1.983 -0.714 5.966
a 1 1.238 0.352 2.131
a 1 1.201 0.627 1.752

Now we’re going to perform two regressions, one using y1 and one using y2 as the dependent variables, for each stratum defined by Var1 and Var2.

out <- dat %>%
  nest(-Var1, -Var2) %>%
  mutate(model1 = map(data, ~lm(y1~x, data=.)),
         model2 = map(data, ~lm(y2~x, data=.)))

Now conceptually, all we do is tidy up the output for the models using the broom package, filter on the rows containg the slope information, and extract the p-values, right? Not quite….

out_problem <- out %>% mutate(output1 = map(model1, ~tidy(.)),
                              output2 = map(model2, ~tidy(.))) %>%
  select(-data, -model1, -model2) %>%
##  [1] "Var1"       "Var2"       "term"       "estimate"   "std.error" 
##  [6] "statistic"  "p.value"    "term1"      "estimate1"  "std.error1"
## [11] "statistic1" "p.value1"

We’ve got two sets of output, but with the same column names!!! This is a problem! An easy solution would be to preface the column names with the name of the response variable. I struggled with this today until I discovered the secret function.

out_nice <- out %>% mutate(output1 = map(model1, ~tidy(.)),
                           output2 = map(model2, ~tidy(.)),
                           output1 = map(output1, ~setNames(., paste('y1', names(.), sep='_'))),
                           output2 = map(output2, ~setNames(., paste('y2', names(.), sep='_')))) %>%
  select(-data, -model1, -model2) %>%

This is a compact representation of the results of both regressions by strata, and we can extract the information we would like very easily. For example, to extract the stratum-specific slope estimates:

out_nice %>% filter(y1_term=='x') %>%
  select(Var1, Var2, ends_with('estimate')) %>%
  knitr::kable(digits=3, format='html')
Var1 Var2 y1_estimate y2_estimate
a 1 -1.897 5.036
b 1 -2.000 5.022
c 1 -1.988 4.888
d 1 -2.089 5.089
a 2 -2.052 5.015
b 2 -1.922 5.004
c 2 -1.936 4.969
d 2 -1.961 4.959
a 3 -2.043 5.017
b 3 -2.045 4.860
c 3 -1.996 5.009
d 3 -1.922 4.894
a 4 -2.000 4.942
b 4 -2.000 4.932
c 4 -2.033 5.042
d 4 -2.165 5.049
a 5 -2.094 5.010
b 5 -1.961 5.122
c 5 -2.106 5.153
d 5 -1.974 5.009

To leave a comment for the author, please follow the link and comment on their blog: R on Abhijit Dasgupta. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)