Changing names in the tidyverse: An example for many regressions
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A collaborator posed an interesting R question to me today. She wanted to do
several regressions using different outcomes, with models being computed on
different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then
filter the strata based on p-value levels.
This seems straighforward, right? Let’s set up a toy example:
library(tidyverse) dat <- as_tibble(expand.grid(letters[1:4], 1:5)) d <- vector('list', nrow(dat)) set.seed(102) for(i in 1:nrow(dat)){ x <- rnorm(100) d[[i]] <- tibble(x = x, y1 = 3 - 2*x + rnorm(100), y2 = -4+5*x+rnorm(100)) } dat <- as_tibble(bind_cols(dat, tibble(dat=d))) %>% unnest() knitr::kable(head(dat), format='html')
Var1 | Var2 | x | y1 | y2 |
---|---|---|---|---|
a | 1 | 0.1805229 | 4.2598245 | -3.004535 |
a | 1 | 0.7847340 | 0.0023338 | -2.104949 |
a | 1 | -1.3531646 | 3.1711898 | -9.156758 |
a | 1 | 1.9832982 | -0.7140910 | 5.966377 |
a | 1 | 1.2384717 | 0.3523034 | 2.131004 |
a | 1 | 1.2006174 | 0.6267716 | 1.752106 |
Now we’re going to perform two regressions, one using y1
and one using y2
as the dependent variables, for each stratum defined by Var1
and Var2
.
out % nest(-Var1, -Var2) %>% mutate(model1 = map(data, ~lm(y1~x, data=.)), model2 = map(data, ~lm(y2~x, data=.)))
Now conceptually, all we do is tidy up the output for the models using the broom
package, filter on the rows containg the slope information, and extract the p-values, right? Not quite….
library(broom) out_problem % mutate(output1 = map(model1, ~tidy(.)), output2 = map(model2, ~tidy(.))) %>% select(-data, -model1, -model2) %>% unnest() names(out_problem)
[1] “Var1” “Var2” “term” “estimate” “std.error”
[6] “statistic” “p.value” “term” “estimate” “std.error”
[11] “statistic” “p.value”
We've got two sets of output, but with the same column names!!! This is a problem! An easy solution would be to preface the column names with the name of the response variable. I struggled with this today until I discovered the secret function.
out_nice % mutate(output1 = map(model1, ~tidy(.)), output2 = map(model2, ~tidy(.)), output1 = map(output1, ~setNames(., paste('y1', names(.), sep='_'))), output2 = map(output2, ~setNames(., paste('y2', names(.), sep='_')))) %>% select(-data, -model1, -model2) %>% unnest()
This is a compact representation of the results of both regressions by strata, and we can extract the information we would like very easily. For example, to extract the stratum-specific slope estimates:
out_nice %>% filter(y1_term=='x') %>% select(Var1, Var2, ends_with('estimate')) %>% knitr::kable(digits=3, format='html')
Var1 | Var2 | y1_estimate | y2_estimate |
---|---|---|---|
a | 1 | -1.897 | 5.036 |
b | 1 | -2.000 | 5.022 |
c | 1 | -1.988 | 4.888 |
d | 1 | -2.089 | 5.089 |
a | 2 | -2.052 | 5.015 |
b | 2 | -1.922 | 5.004 |
c | 2 | -1.936 | 4.969 |
d | 2 | -1.961 | 4.959 |
a | 3 | -2.043 | 5.017 |
b | 3 | -2.045 | 4.860 |
c | 3 | -1.996 | 5.009 |
d | 3 | -1.922 | 4.894 |
a | 4 | -2.000 | 4.942 |
b | 4 | -2.000 | 4.932 |
c | 4 | -2.033 | 5.042 |
d | 4 | -2.165 | 5.049 |
a | 5 | -2.094 | 5.010 |
b | 5 | -1.961 | 5.122 |
c | 5 | -2.106 | 5.153 |
d | 5 | -1.974 | 5.009 |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.