**R on I Should Be Writing: Now Sugar Free!**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So you’ve run a repeated-measures ANOVA and found that your residuals are neither normally distributed, nor homogeneous, or that you are in violation of any other assumptions. Naturally you want to run some a-parametric analysis… but how?

In this post I will demonstrate how to run a permutation test ANOVA (easy!) and how to run bootstrap follow-up analysis (a bit more challenging) in a mixed design (both within- and between-subject factors) ANOVA. I’ve chosen to use a mixed design model in this demonstration for two reasons:

- I’ve never seen this done before.
- You can easily modify this code (change / skip some of these steps) to accommodate purely within- or purely between-subject designs.

## Permutation ANOVA

Running a permutation test for your ANOVA in R is as easy as… running an ANOVA in R, but substituting `aov`

with `aovperm`

from the `permuco`

package.

```
library(permuco)
data(obk.long, package = "afex") # data from the afex package
# permutation anova
fit_mixed_p <-
aovperm(value ~ treatment * gender * phase * hour + Error(id / (phase * hour)),
data = obk.long)
```

```
## Warning in checkBalancedData(fixed_formula = formula_f, data = cbind(y, :
## The data are not balanced, the results may not be exact.
```

`fit_mixed_p`

term | SSn | dfn | SSd | dfd | MSEn | MSEd | F | parametric P(>F) | permutation P(>F) |
---|---|---|---|---|---|---|---|---|---|

treatment | 179.73 | 2 | 228.06 | 10 | 89.87 | 22.81 | 3.94 | 0.055 | 0.055 |

gender | 83.45 | 1 | 228.06 | 10 | 83.45 | 22.81 | 3.66 | 0.085 | 0.082 |

treatment:gender | 130.24 | 2 | 228.06 | 10 | 65.12 | 22.81 | 2.86 | 0.104 | 0.104 |

phase | 129.51 | 2 | 80.28 | 20 | 64.76 | 4.01 | 16.13 | <0.001 | <0.001 |

treatment:phase | 77.89 | 4 | 80.28 | 20 | 19.47 | 4.01 | 4.85 | 0.007 | 0.009 |

gender:phase | 2.27 | 2 | 80.28 | 20 | 1.14 | 4.01 | 0.28 | 0.757 | 0.765 |

treatment:gender:phase | 10.22 | 4 | 80.28 | 20 | 2.56 | 4.01 | 0.64 | 0.642 | 0.641 |

hour | 104.29 | 4 | 62.50 | 40 | 26.07 | 1.56 | 16.69 | <0.001 | <0.001 |

treatment:hour | 1.17 | 8 | 62.50 | 40 | 0.15 | 1.56 | 0.09 | 0.999 | 1.000 |

gender:hour | 2.81 | 4 | 62.50 | 40 | 0.70 | 1.56 | 0.45 | 0.772 | 0.772 |

treatment:gender:hour | 7.76 | 8 | 62.50 | 40 | 0.97 | 1.56 | 0.62 | 0.755 | 0.755 |

phase:hour | 11.35 | 8 | 96.17 | 80 | 1.42 | 1.20 | 1.18 | 0.322 | 0.319 |

treatment:phase:hour | 6.64 | 16 | 96.17 | 80 | 0.42 | 1.20 | 0.35 | 0.990 | 0.990 |

gender:phase:hour | 8.96 | 8 | 96.17 | 80 | 1.12 | 1.20 | 0.93 | 0.496 | 0.498 |

treatment:gender:phase:hour | 14.15 | 16 | 96.17 | 80 | 0.88 | 1.20 | 0.74 | 0.750 | 0.753 |

The results of the permutation test suggest an interaction between Treatment (a between subject factor) and Phase (a within-subject factor). To fully understand this interaction, we would like to conduct some sort of follow-up analysis (planned comparisons or post hoc tests). Usually I would pass the model to `emmeans`

for any follow-ups, but here, due to our assumption violations, we need some sort of bootstrapping method.

## Bootstrapping with `car`

and `emmeans`

For bootstrapping we will be using the `Boot`

function from the `car`

package. In the case of within-subject factors, this function requires that they be specified in a multivariate data structure. Let’s see how this is done.

### 1. Make your data WIIIIIIIIIIDEEEEEEEE

```
library(dplyr)
library(tidyr)
obk_mixed_wide <- obk.long %>%
unite("cond", phase, hour) %>%
spread(cond, value)
head(obk_mixed_wide)
```

```
## id treatment gender age fup_1 fup_2 fup_3 fup_4 fup_5 post_1 post_2
## 1 1 control M -4.75 2 3 2 4 4 3 2
## 2 2 control M -2.75 4 5 6 4 1 2 2
## 3 3 control M 1.25 7 6 9 7 6 4 5
## 4 4 control F 7.25 4 4 5 3 4 2 2
## 5 5 control F -5.75 4 3 6 4 3 6 7
## 6 6 A M 7.25 9 10 11 9 6 9 9
## post_3 post_4 post_5 pre_1 pre_2 pre_3 pre_4 pre_5
## 1 5 3 2 1 2 4 2 1
## 2 3 5 3 4 4 5 3 4
## 3 7 5 4 5 6 5 7 7
## 4 3 5 3 5 4 7 5 4
## 5 8 6 3 3 4 6 4 3
## 6 10 8 9 7 8 7 9 9
```

This is not enough, as we *also* need our new columns (representing the different levels of the within subject factors) to be in a matrix column.

```
obk_mixed_matrixDV <- obk_mixed_wide %>%
select(id, age, treatment, gender)
obk_mixed_matrixDV$M <- obk_mixed_wide %>%
select(-id, -age, -treatment, -gender) %>%
as.matrix()
glimpse(obk_mixed_matrixDV)
```

```
## Observations: 16
## Variables: 5
## $ id
``` 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ age -4.75, -2.75, 1.25, 7.25, -5.75, 7.25, 8.25, 2.25, 2...
## $ treatment control, control, control, control, control, A, A, A...
## $ gender M, M, M, F, F, M, M, F, F, M, M, M, F, F, F, F
## $ M

### 2. Fit your *regular* model

`fit_mixed <- aov(M ~ treatment * gender, obk_mixed_matrixDV)`

Note that the left-hand-side of the formula (the `M`

) is a matrix representing all the within-subject factors and their levels, and the right-hand-side of the formula (`treatment * gender`

) includes only the between-subject factors.

### 3. Define the contrast(s) of interest

For this step we will be using `emmeans`

. But first, we need to create a list of the within-subject factors and their levels (I did say this was difficult – bear with me!). This list needs to correspond to the order of the multi-variate column in our data, such that if there is more than one factor, the combinations of factor levels are used in expand.grid order. In our case:

`colnames(obk_mixed_matrixDV$M)`

```
## [1] "fup_1" "fup_2" "fup_3" "fup_4" "fup_5" "post_1" "post_2"
## [8] "post_3" "post_4" "post_5" "pre_1" "pre_2" "pre_3" "pre_4"
## [15] "pre_5"
```

```
rm_levels <- list(hour = c("1", "2", "3", "4", "5"),
phase = c("fup", "post", "pre"))
```

Make sure you get the order of the variables and their levels correct! This will affect your results!

Let’s use `emmeans`

to get the estimates of the pairwise differences between the treatment groups within each phase of the study:

```
library(emmeans)
# get the correct reference grid with the correct ultivariate levels!
rg <- ref_grid(fit_mixed, mult.levs = rm_levels)
rg
```

```
## 'emmGrid' object with variables:
## treatment = control, A, B
## gender = F, M
## hour = multivariate response levels: 1, 2, 3, 4, 5
## phase = multivariate response levels: fup, post, pre
```

```
# get the expected means:
em_ <- emmeans(rg, ~ phase * treatment)
em_
```

```
## phase treatment emmean SE df lower.CL upper.CL
## fup control 4.33 0.551 10 3.11 5.56
## post control 4.08 0.628 10 2.68 5.48
## pre control 4.25 0.766 10 2.54 5.96
## fup A 7.25 0.604 10 5.90 8.60
## post A 6.50 0.688 10 4.97 8.03
## pre A 5.00 0.839 10 3.13 6.87
## fup B 7.29 0.461 10 6.26 8.32
## post B 6.62 0.525 10 5.45 7.80
## pre B 4.17 0.641 10 2.74 5.59
##
## Results are averaged over the levels of: gender, hour
## Confidence level used: 0.95
```

```
# run pairwise tests between the treatment groups within each phase
c_ <- contrast(em_, "pairwise", by = 'phase')
c_
```

```
## phase = fup:
## contrast estimate SE df t.ratio p.value
## control - A -2.9167 0.818 10 -3.568 0.0129
## control - B -2.9583 0.719 10 -4.116 0.0054
## A - B -0.0417 0.760 10 -0.055 0.9983
##
## phase = post:
## contrast estimate SE df t.ratio p.value
## control - A -2.4167 0.931 10 -2.595 0.0634
## control - B -2.5417 0.819 10 -3.105 0.0275
## A - B -0.1250 0.865 10 -0.144 0.9886
##
## phase = pre:
## contrast estimate SE df t.ratio p.value
## control - A -0.7500 1.136 10 -0.660 0.7911
## control - B 0.0833 0.999 10 0.083 0.9962
## A - B 0.8333 1.056 10 0.789 0.7177
##
## Results are averaged over the levels of: gender, hour
## P value adjustment: tukey method for comparing a family of 3 estimates
```

```
# extract the estimates
est_names <- c("fup: control - A", "fup: control - B", "fup: A - B",
"post: control - A", "post: control - B", "post: A - B",
"pre: control - A", "pre: control - B", "pre: A - B")
est_values <- summary(c_)$estimate
names(est_values) <- est_names
est_values
```

```
## fup: control - A fup: control - B fup: A - B post: control - A
## -2.91666667 -2.95833333 -0.04166667 -2.41666667
## post: control - B post: A - B pre: control - A pre: control - B
## -2.54166667 -0.12500000 -0.75000000 0.08333333
## pre: A - B
## 0.83333333
```

### 4. Run the bootstrap

Now let’s wrap this all in a function that accepts the fitted model as an argument:

```
treatment_phase_contrasts <- function(mod){
rg <- ref_grid(mod, mult.levs = rm_levels)
# get the expected means:
em_ <- emmeans(rg, ~ phase * treatment)
# run pairwise tests between the treatment groups within each phase
c_ <- contrast(em_, "pairwise", by = 'phase')
# extract the estimates
est_names <- c("fup: control - A", "fup: control - B", "fup: A - B",
"post: control - A", "post: control - B", "post: A - B",
"pre: control - A", "pre: control - B", "pre: A - B")
est_values <- summary(c_)$estimate
names(est_values) <- est_names
est_values
}
# test it
treatment_phase_contrasts(fit_mixed)
```

```
## fup: control - A fup: control - B fup: A - B post: control - A
## -2.91666667 -2.95833333 -0.04166667 -2.41666667
## post: control - B post: A - B pre: control - A pre: control - B
## -2.54166667 -0.12500000 -0.75000000 0.08333333
## pre: A - B
## 0.83333333
```

Finally, we will use `car::Boot`

to get the bootstrapped estimates!

```
library(car)
treatment_phase_results <-
Boot(fit_mixed, treatment_phase_contrasts, R = 50) # R = 599 at least
```

`## Loading required namespace: boot`

`summary(treatment_phase_results) # original vs. bootstrapped estimate (bootMed)`

```
##
## Number of bootstrap replications R = 31
## original bootBias bootSE bootMed
## fup: control - A -2.916667 0.044892 0.65137 -2.8333e+00
## fup: control - B -2.958333 -0.026805 0.82950 -3.0000e+00
## fup: A - B -0.041667 -0.071697 0.40960 -1.6667e-01
## post: control - A -2.416667 -0.011444 0.74882 -2.5000e+00
## post: control - B -2.541667 0.048310 0.94075 -2.4167e+00
## post: A - B -0.125000 0.059754 0.64484 4.3374e-15
## pre: control - A -0.750000 -0.129339 0.63190 -7.0000e-01
## pre: control - B 0.083333 -0.099923 1.01857 9.1667e-02
## pre: A - B 0.833333 0.029416 0.89102 8.3333e-01
```

`confint(treatment_phase_results, type = "perc") # does include zero?`

```
## Bootstrap percent confidence intervals
##
## 2.5 % 97.5 %
## fup: control - A -4.0000000 -1.8000000
## fup: control - B -4.3571429 -1.4000000
## fup: A - B -0.8571429 0.7083333
## post: control - A -4.0000000 -1.3000000
## post: control - B -4.0000000 -0.7500000
## post: A - B -1.3809524 1.0000000
## pre: control - A -2.0000000 0.7500000
## pre: control - B -2.2500000 2.0416667
## pre: A - B -0.9666667 2.2083333
```

Results indicate that the Control group is lower than both treatment groups in the post and fup (follow -up) phases.

If we wanted p-values, we could use this little function (based on this demo):

```
boot_pvalues <- function(x, side = c(0, -1, 1)) {
# Based on:
# https://blogs.sas.com/content/iml/2011/11/02/how-to-compute-p-values-for-a-bootstrap-distribution.html
side <- side[1]
x <- as.data.frame(x$t)
ps <- sapply(x, function(.x) {
s <- na.omit(.x)
s0 <- 0
N <- length(s)
if (side == 0) {
min((1 + sum(s >= s0)) / (N + 1),
(1 + sum(s <= s0)) / (N + 1)) * 2
} else if (side < 0) {
(1 + sum(s <= s0)) / (N + 1)
} else if (side > 0) {
(1 + sum(s >= s0)) / (N + 1)
}
})
setNames(ps,colnames(x))
}
boot_pvalues(treatment_phase_results)
```

```
## fup: control - A fup: control - B fup: A - B post: control - A
## 0.0625 0.0625 0.6875 0.0625
## post: control - B post: A - B pre: control - A pre: control - B
## 0.0625 0.9375 0.1250 0.9375
## pre: A - B
## 0.3750
```

These p-values can then be passed to `p.adjust()`

for the p-value adjustment method of your choosing.

## Summary

I’ve demonstrated how to run permutation tests on main effects / interactions, with follow-up analysis using the bootstrap method. Using this code as a basis for any analysis you might have in mind gives you all the flexibility of `emmeans`

, which supports many (many) models!

**leave a comment**for the author, please follow the link and comment on their blog:

**R on I Should Be Writing: Now Sugar Free!**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.