# (Bootstrapping) Follow-Up Contrasts for Within-Subject ANOVAs

**R on I Should Be Writing: Now Sugar Free!**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So you’ve run a repeated-measures ANOVA and found that your residuals are neither normally distributed, nor homogeneous, or that you are in violation of any other assumptions. Naturally you want to run some a-parametric analysis… but how?

In this post I will demonstrate how to run a permutation test ANOVA (easy!) and how to run bootstrap follow-up analysis (a bit more challenging) in a mixed design (both within- and between-subject factors) ANOVA. I’ve chosen to use a mixed design model in this demonstration for two reasons:

- I’ve never seen this done before.
- You can easily modify this code (change / skip some of these steps) to accommodate purely within- or purely between-subject designs.

## Permutation ANOVA

Running a permutation test for your ANOVA in R is as easy as… running an ANOVA in R, but substituting `aov`

with `aovperm`

from the `permuco`

package.

library(permuco) data(obk.long, package = "afex") # data from the afex package # permutation anova fit_mixed_p <- aovperm(value ~ treatment * gender * phase * hour + Error(id / (phase * hour)), data = obk.long) ## Warning in checkBalancedData(fixed_formula = formula_f, data = cbind(y, : ## The data are not balanced, the results may not be exact. fit_mixed_p

term | SSn | dfn | SSd | dfd | MSEn | MSEd | F | parametric P(>F) | permutation P(>F) |
---|---|---|---|---|---|---|---|---|---|

treatment | 179.73 | 2 | 228.06 | 10 | 89.87 | 22.81 | 3.94 | 0.055 | 0.055 |

gender | 83.45 | 1 | 228.06 | 10 | 83.45 | 22.81 | 3.66 | 0.085 | 0.082 |

treatment:gender | 130.24 | 2 | 228.06 | 10 | 65.12 | 22.81 | 2.86 | 0.104 | 0.104 |

phase | 129.51 | 2 | 80.28 | 20 | 64.76 | 4.01 | 16.13 | <0.001 | <0.001 |

treatment:phase | 77.89 | 4 | 80.28 | 20 | 19.47 | 4.01 | 4.85 | 0.007 | 0.009 |

gender:phase | 2.27 | 2 | 80.28 | 20 | 1.14 | 4.01 | 0.28 | 0.757 | 0.765 |

treatment:gender:phase | 10.22 | 4 | 80.28 | 20 | 2.56 | 4.01 | 0.64 | 0.642 | 0.641 |

hour | 104.29 | 4 | 62.50 | 40 | 26.07 | 1.56 | 16.69 | <0.001 | <0.001 |

treatment:hour | 1.17 | 8 | 62.50 | 40 | 0.15 | 1.56 | 0.09 | 0.999 | 1.000 |

gender:hour | 2.81 | 4 | 62.50 | 40 | 0.70 | 1.56 | 0.45 | 0.772 | 0.772 |

treatment:gender:hour | 7.76 | 8 | 62.50 | 40 | 0.97 | 1.56 | 0.62 | 0.755 | 0.755 |

phase:hour | 11.35 | 8 | 96.17 | 80 | 1.42 | 1.20 | 1.18 | 0.322 | 0.319 |

treatment:phase:hour | 6.64 | 16 | 96.17 | 80 | 0.42 | 1.20 | 0.35 | 0.990 | 0.990 |

gender:phase:hour | 8.96 | 8 | 96.17 | 80 | 1.12 | 1.20 | 0.93 | 0.496 | 0.498 |

treatment:gender:phase:hour | 14.15 | 16 | 96.17 | 80 | 0.88 | 1.20 | 0.74 | 0.750 | 0.753 |

The results of the permutation test suggest an interaction between Treatment (a between subject factor) and Phase (a within-subject factor). To fully understand this interaction, we would like to conduct some sort of follow-up analysis (planned comparisons or post hoc tests). Usually I would pass the model to `emmeans`

for any follow-ups, but here, due to our assumption violations, we need some sort of bootstrapping method.

## Bootstrapping with `car`

and `emmeans`

For bootstrapping we will be using the `Boot`

function from the `car`

package. In the case of within-subject factors, this function requires that they be specified in a multivariate data structure. Let’s see how this is done.

### 1. Make your data WIIIIIIIIIIDEEEEEEEE

library(dplyr) library(tidyr) obk_mixed_wide <- obk.long %>% unite("cond", phase, hour) %>% spread(cond, value) head(obk_mixed_wide) ## id treatment gender age fup_1 fup_2 fup_3 fup_4 fup_5 post_1 post_2 ## 1 1 control M -4.75 2 3 2 4 4 3 2 ## 2 2 control M -2.75 4 5 6 4 1 2 2 ## 3 3 control M 1.25 7 6 9 7 6 4 5 ## 4 4 control F 7.25 4 4 5 3 4 2 2 ## 5 5 control F -5.75 4 3 6 4 3 6 7 ## 6 6 A M 7.25 9 10 11 9 6 9 9 ## post_3 post_4 post_5 pre_1 pre_2 pre_3 pre_4 pre_5 ## 1 5 3 2 1 2 4 2 1 ## 2 3 5 3 4 4 5 3 4 ## 3 7 5 4 5 6 5 7 7 ## 4 3 5 3 5 4 7 5 4 ## 5 8 6 3 3 4 6 4 3 ## 6 10 8 9 7 8 7 9 9

This is not enough, as we *also* need our new columns (representing the different levels of the within subject factors) to be in a matrix column.

obk_mixed_matrixDV <- obk_mixed_wide %>% select(id, age, treatment, gender) obk_mixed_matrixDV$M <- obk_mixed_wide %>% select(-id, -age, -treatment, -gender) %>% as.matrix() glimpse(obk_mixed_matrixDV) ## Observations: 16 ## Variables: 5 ## $ id1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 ## $ age -4.75, -2.75, 1.25, 7.25, -5.75, 7.25, 8.25, 2.25, 2... ## $ treatment control, control, control, control, control, A, A, A... ## $ gender M, M, M, F, F, M, M, F, F, M, M, M, F, F, F, F ## $ M

### 2. Fit your *regular* model

fit_mixed <- aov(M ~ treatment * gender, obk_mixed_matrixDV)

Note that the left-hand-side of the formula (the `M`

) is a matrix representing all the within-subject factors and their levels, and the right-hand-side of the formula (`treatment * gender`

) includes only the between-subject factors.

### 3. Define the contrast(s) of interest

For this step we will be using `emmeans`

. But first, we need to create a list of the within-subject factors and their levels (I did say this was difficult - bear with me!). This list needs to correspond to the order of the multi-variate column in our data, such that if there is more than one factor, the combinations of factor levels are used in expand.grid order. In our case:

colnames(obk_mixed_matrixDV$M) ## [1] "fup_1" "fup_2" "fup_3" "fup_4" "fup_5" "post_1" "post_2" ## [8] "post_3" "post_4" "post_5" "pre_1" "pre_2" "pre_3" "pre_4" ## [15] "pre_5" rm_levels <- list(hour = c("1", "2", "3", "4", "5"), phase = c("fup", "post", "pre"))

Make sure you get the order of the variables and their levels correct! This will affect your results!

Let’s use `emmeans`

to get the estimates of the pairwise differences between the treatment groups within each phase of the study:

library(emmeans) # get the correct reference grid with the correct ultivariate levels! rg <- ref_grid(fit_mixed, mult.levs = rm_levels) rg ## 'emmGrid' object with variables: ## treatment = control, A, B ## gender = F, M ## hour = multivariate response levels: 1, 2, 3, 4, 5 ## phase = multivariate response levels: fup, post, pre # get the expected means: em_ <- emmeans(rg, ~ phase * treatment) em_ ## phase treatment emmean SE df lower.CL upper.CL ## fup control 4.33 0.551 10 3.11 5.56 ## post control 4.08 0.628 10 2.68 5.48 ## pre control 4.25 0.766 10 2.54 5.96 ## fup A 7.25 0.604 10 5.90 8.60 ## post A 6.50 0.688 10 4.97 8.03 ## pre A 5.00 0.839 10 3.13 6.87 ## fup B 7.29 0.461 10 6.26 8.32 ## post B 6.62 0.525 10 5.45 7.80 ## pre B 4.17 0.641 10 2.74 5.59 ## ## Results are averaged over the levels of: gender, hour ## Confidence level used: 0.95 # run pairwise tests between the treatment groups within each phase c_ <- contrast(em_, "pairwise", by = 'phase') c_ ## phase = fup: ## contrast estimate SE df t.ratio p.value ## control - A -2.9167 0.818 10 -3.568 0.0129 ## control - B -2.9583 0.719 10 -4.116 0.0054 ## A - B -0.0417 0.760 10 -0.055 0.9983 ## ## phase = post: ## contrast estimate SE df t.ratio p.value ## control - A -2.4167 0.931 10 -2.595 0.0634 ## control - B -2.5417 0.819 10 -3.105 0.0275 ## A - B -0.1250 0.865 10 -0.144 0.9886 ## ## phase = pre: ## contrast estimate SE df t.ratio p.value ## control - A -0.7500 1.136 10 -0.660 0.7911 ## control - B 0.0833 0.999 10 0.083 0.9962 ## A - B 0.8333 1.056 10 0.789 0.7177 ## ## Results are averaged over the levels of: gender, hour ## P value adjustment: tukey method for comparing a family of 3 estimates # extract the estimates est_names <- c("fup: control - A", "fup: control - B", "fup: A - B", "post: control - A", "post: control - B", "post: A - B", "pre: control - A", "pre: control - B", "pre: A - B") est_values <- summary(c_)$estimate names(est_values) <- est_names est_values ## fup: control - A fup: control - B fup: A - B post: control - A ## -2.91666667 -2.95833333 -0.04166667 -2.41666667 ## post: control - B post: A - B pre: control - A pre: control - B ## -2.54166667 -0.12500000 -0.75000000 0.08333333 ## pre: A - B ## 0.83333333

### 4. Run the bootstrap

Now let’s wrap this all in a function that accepts the fitted model as an argument:

treatment_phase_contrasts <- function(mod){ rg <- ref_grid(mod, mult.levs = rm_levels) # get the expected means: em_ <- emmeans(rg, ~ phase * treatment) # run pairwise tests between the treatment groups within each phase c_ <- contrast(em_, "pairwise", by = 'phase') # extract the estimates est_names <- c("fup: control - A", "fup: control - B", "fup: A - B", "post: control - A", "post: control - B", "post: A - B", "pre: control - A", "pre: control - B", "pre: A - B") est_values <- summary(c_)$estimate names(est_values) <- est_names est_values } # test it treatment_phase_contrasts(fit_mixed) ## fup: control - A fup: control - B fup: A - B post: control - A ## -2.91666667 -2.95833333 -0.04166667 -2.41666667 ## post: control - B post: A - B pre: control - A pre: control - B ## -2.54166667 -0.12500000 -0.75000000 0.08333333 ## pre: A - B ## 0.83333333

Finally, we will use `car::Boot`

to get the bootstrapped estimates!

library(car) treatment_phase_results <- Boot(fit_mixed, treatment_phase_contrasts, R = 50) # R = 599 at least ## Loading required namespace: boot summary(treatment_phase_results) # original vs. bootstrapped estimate (bootMed) ## ## Number of bootstrap replications R = 31 ## original bootBias bootSE bootMed ## fup: control - A -2.916667 0.044892 0.65137 -2.8333e+00 ## fup: control - B -2.958333 -0.026805 0.82950 -3.0000e+00 ## fup: A - B -0.041667 -0.071697 0.40960 -1.6667e-01 ## post: control - A -2.416667 -0.011444 0.74882 -2.5000e+00 ## post: control - B -2.541667 0.048310 0.94075 -2.4167e+00 ## post: A - B -0.125000 0.059754 0.64484 4.3374e-15 ## pre: control - A -0.750000 -0.129339 0.63190 -7.0000e-01 ## pre: control - B 0.083333 -0.099923 1.01857 9.1667e-02 ## pre: A - B 0.833333 0.029416 0.89102 8.3333e-01 confint(treatment_phase_results, type = "perc") # does include zero? ## Bootstrap percent confidence intervals ## ## 2.5 % 97.5 % ## fup: control - A -4.0000000 -1.8000000 ## fup: control - B -4.3571429 -1.4000000 ## fup: A - B -0.8571429 0.7083333 ## post: control - A -4.0000000 -1.3000000 ## post: control - B -4.0000000 -0.7500000 ## post: A - B -1.3809524 1.0000000 ## pre: control - A -2.0000000 0.7500000 ## pre: control - B -2.2500000 2.0416667 ## pre: A - B -0.9666667 2.2083333

Results indicate that the Control group is lower than both treatment groups in the post and fup (follow -up) phases.

If we wanted p-values, we could use this little function (based on this demo):

boot_pvalues <- function(x, side = c(0, -1, 1)) { # Based on: # https://blogs.sas.com/content/iml/2011/11/02/how-to-compute-p-values-for-a-bootstrap-distribution.html side <- side[1] x <- as.data.frame(x$t) ps <- sapply(x, function(.x) { s <- na.omit(.x) s0 <- 0 N <- length(s) if (side == 0) { min((1 + sum(s >= s0)) / (N + 1), (1 + sum(s <= s0)) / (N + 1)) * 2 } else if (side < 0) { (1 + sum(s <= s0)) / (N + 1) } else if (side > 0) { (1 + sum(s >= s0)) / (N + 1) } }) setNames(ps,colnames(x)) } boot_pvalues(treatment_phase_results) ## fup: control - A fup: control - B fup: A - B post: control - A ## 0.0625 0.0625 0.6875 0.0625 ## post: control - B post: A - B pre: control - A pre: control - B ## 0.0625 0.9375 0.1250 0.9375 ## pre: A - B ## 0.3750

These p-values can then be passed to `p.adjust()`

for the p-value adjustment method of your choosing.

## Summary

I’ve demonstrated how to run permutation tests on main effects / interactions, with follow-up analysis using the bootstrap method. Using this code as a basis for any analysis you might have in mind gives you all the flexibility of `emmeans`

, which supports many (many) models!

**leave a comment**for the author, please follow the link and comment on their blog:

**R on I Should Be Writing: Now Sugar Free!**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.