(This article was first published on **R – Stat Bandit**, and kindly contributed to R-bloggers)

A collaborator posed an interesting R question to me today. She wanted to run several regressions using different outcomes, with the models fitted on different strata defined by a combination of experimental design variables. She then wanted to extract the p-values for the slopes from each model and filter the strata based on p-value levels.

This seems straightforward, right? Let’s set up a toy example:

    library(tidyverse)
    # Build a 4 x 5 grid of strata, then simulate 100 observations per stratum
    dat <- as_tibble(expand.grid(letters[1:4], 1:5))
    d <- vector('list', nrow(dat))
    set.seed(102)
    for(i in 1:nrow(dat)){
      x <- rnorm(100)
      d[[i]] <- tibble(x = x, y1 = 3 - 2*x + rnorm(100), y2 = -4+5*x+rnorm(100))
    }
    # Attach the simulated data as a list-column and expand it into rows
    dat <- as_tibble(bind_cols(dat, tibble(dat=d))) %>% unnest()
    knitr::kable(head(dat), format='html')

Var1 | Var2 | x | y1 | y2 |
---|---|---|---|---|
a | 1 | 0.1805229 | 4.2598245 | -3.004535 |
a | 1 | 0.7847340 | 0.0023338 | -2.104949 |
a | 1 | -1.3531646 | 3.1711898 | -9.156758 |
a | 1 | 1.9832982 | -0.7140910 | 5.966377 |
a | 1 | 1.2384717 | 0.3523034 | 2.131004 |
a | 1 | 1.2006174 | 0.6267716 | 1.752106 |

Now we’re going to perform two regressions, one using `y1` and one using `y2` as the dependent variable, for each stratum defined by `Var1` and `Var2`.

    # Nest the data within each Var1 x Var2 stratum, then fit one model per outcome
    out <- dat %>%
      nest(-Var1, -Var2) %>%
      mutate(model1 = map(data, ~lm(y1~x, data=.)),
             model2 = map(data, ~lm(y2~x, data=.)))
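To make the nesting step less opaque, here is a minimal base R sketch of the same idea: split the data by stratum, fit one model per piece, and pull out the slope. It uses a smaller, hypothetical toy data set (`toy`, 2 x 2 strata) rather than `dat`, and `split()`/`lapply()` in place of `nest()`/`map()`, so it illustrates the concept rather than reproducing the code in this post.

```r
# Hypothetical toy data: 2 x 2 strata, 50 observations each
set.seed(1)
toy <- expand.grid(Var1 = letters[1:2], Var2 = 1:2, rep = 1:50)
toy$x  <- rnorm(nrow(toy))
toy$y1 <- 3 - 2 * toy$x + rnorm(nrow(toy))

# One data frame per stratum (the base R analogue of nest())
pieces <- split(toy, list(toy$Var1, toy$Var2))

# One fitted model per stratum (the analogue of map(data, ~lm(...)))
fits <- lapply(pieces, function(d) lm(y1 ~ x, data = d))

# Slope estimate for each stratum, as a named numeric vector
slopes <- sapply(fits, function(f) coef(f)[["x"]])
round(slopes, 3)
```

Each element of `fits` is an ordinary `lm` object, just like the entries of the `model1` and `model2` list-columns.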

Now conceptually, all we do is tidy up the output for the models using the `broom` package, filter on the rows containing the slope information, and extract the p-values, right? Not quite….

    library(broom)
    # Tidy both sets of models and unnest the results side by side
    out_problem <- out %>%
      mutate(output1 = map(model1, ~tidy(.)),
             output2 = map(model2, ~tidy(.))) %>%
      select(-data, -model1, -model2) %>%
      unnest()
    names(out_problem)

     [1] "Var1"      "Var2"      "term"      "estimate"  "std.error"
     [6] "statistic" "p.value"   "term"      "estimate"  "std.error"
    [11] "statistic" "p.value"

We've got two sets of output, but with the same column names! This is a problem, because we can no longer tell which columns belong to which model. An easy solution would be to prefix the column names with the name of the response variable. I struggled with this today until I discovered the *secret function*, `setNames`.

    # Prefix each tidy() result's column names with its response variable before unnesting
    out_nice <- out %>%
      mutate(output1 = map(model1, ~tidy(.)),
             output2 = map(model2, ~tidy(.)),
             output1 = map(output1, ~setNames(., paste('y1', names(.), sep='_'))),
             output2 = map(output2, ~setNames(., paste('y2', names(.), sep='_')))) %>%
      select(-data, -model1, -model2) %>%
      unnest()
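To see the renaming trick in isolation, the sketch below applies it to two small coefficient tables built by hand. The data frames `res1` and `res2` are hypothetical stand-ins for `tidy()` output (only the column names matter here); `setNames()` and `paste()` are base R.

```r
# Hypothetical stand-ins for two tidy() outputs with identical column names
res1 <- data.frame(term = c("(Intercept)", "x"), estimate = c(3.0, -2.0))
res2 <- data.frame(term = c("(Intercept)", "x"), estimate = c(-4.0, 5.0))

# Binding them side by side leaves duplicated column names
names(cbind(res1, res2))
# [1] "term"     "estimate" "term"     "estimate"

# Prefix each table's names with its response variable before binding
res1 <- setNames(res1, paste("y1", names(res1), sep = "_"))
res2 <- setNames(res2, paste("y2", names(res2), sep = "_"))
combined <- cbind(res1, res2)
names(combined)
# [1] "y1_term"     "y1_estimate" "y2_term"     "y2_estimate"
```

Wrapped in `map()`, the same call renames every nested `tidy()` result, which is how `out_nice` ends up with distinct `y1_` and `y2_` columns.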

This is a compact representation of the results of both regressions by strata, and we can extract the information we would like very easily. For example, to extract the stratum-specific slope estimates:

    out_nice %>%
      filter(y1_term=='x') %>%
      select(Var1, Var2, ends_with('estimate')) %>%
      knitr::kable(digits=3, format='html')

Var1 | Var2 | y1_estimate | y2_estimate |
---|---|---|---|
a | 1 | -1.897 | 5.036 |
b | 1 | -2.000 | 5.022 |
c | 1 | -1.988 | 4.888 |
d | 1 | -2.089 | 5.089 |
a | 2 | -2.052 | 5.015 |
b | 2 | -1.922 | 5.004 |
c | 2 | -1.936 | 4.969 |
d | 2 | -1.961 | 4.959 |
a | 3 | -2.043 | 5.017 |
b | 3 | -2.045 | 4.860 |
c | 3 | -1.996 | 5.009 |
d | 3 | -1.922 | 4.894 |
a | 4 | -2.000 | 4.942 |
b | 4 | -2.000 | 4.932 |
c | 4 | -2.033 | 5.042 |
d | 4 | -2.165 | 5.049 |
a | 5 | -2.094 | 5.010 |
b | 5 | -1.961 | 5.122 |
c | 5 | -2.106 | 5.153 |
d | 5 | -1.974 | 5.009 |
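The original goal was to filter strata on the slope p-values, and the prefixed layout supports that directly. The sketch below is self-contained, so it rebuilds a smaller hypothetical data set (`toy`, 2 x 2 strata) instead of reusing `dat`. The 0.05 cutoff is an arbitrary assumption, and the explicit `unnest(c(output1, output2))` form assumes tidyr 1.0 or later.

```r
library(tidyverse)
library(broom)

# Hypothetical toy data: 2 x 2 strata, 100 observations each.
# Stratum 1 (a, 1) gets a zero slope for y1 so the filter has a candidate to drop.
set.seed(102)
strata <- expand.grid(Var1 = letters[1:2], Var2 = 1:2, stringsAsFactors = FALSE)
toy <- map_dfr(seq_len(nrow(strata)), function(i) {
  x <- rnorm(100)
  slope1 <- if (i == 1) 0 else -2
  tibble(Var1 = strata$Var1[i], Var2 = strata$Var2[i],
         x = x,
         y1 = 3 + slope1 * x + rnorm(100),
         y2 = -4 + 5 * x + rnorm(100))
})

# Same renaming approach as out_nice, keeping only the slope rows
slopes <- toy %>%
  group_by(Var1, Var2) %>%
  nest() %>%
  ungroup() %>%
  mutate(output1 = map(data, ~ tidy(lm(y1 ~ x, data = .x))),
         output2 = map(data, ~ tidy(lm(y2 ~ x, data = .x))),
         output1 = map(output1, ~ setNames(.x, paste('y1', names(.x), sep = '_'))),
         output2 = map(output2, ~ setNames(.x, paste('y2', names(.x), sep = '_')))) %>%
  select(-data) %>%
  unnest(c(output1, output2)) %>%
  filter(y1_term == 'x')

# Keep strata where both slopes fall below the (assumed) 0.05 threshold
alpha <- 0.05
significant <- slopes %>%
  filter(y1_p.value < alpha, y2_p.value < alpha) %>%
  select(Var1, Var2, ends_with('p.value'))
significant
```

Because the prefixes keep the two models' columns apart, filtering on one outcome only (say `y1_p.value`) is a one-line change.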
