ANOVA vs Multiple Comparisons

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

When we run an ANOVA, we analyze the differences among group means in a sample. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

ANOVA Null and Alternative Hypotheses

The null hypothesis in ANOVA is that there is no difference between means and the alternative is that the means are not all equal.

\(H_0: \mu_1 = \mu_2 = \dots = \mu_K\)
\(H_1: \text{the } \mu_k \text{ are not all equal}\)

This means that the ANOVA on its own does not compare the groups pairwise. It only tells us whether the group means can all be considered equal or not; when it rejects, it does not tell us which groups differ.
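To make the test concrete, the F statistic behind this hypothesis can be computed by hand and compared with R's `anova()`. The following is a minimal sketch on made-up data; the three toy groups, their means and the names `y`, `g`, `ss_between` and `ss_within` are assumptions for illustration and are not part of the example used later in this post.

```r
# Toy data (illustrative only): three groups of 20 observations each
set.seed(1)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = rep(c(10, 10, 12), each = 20), sd = 2)

K <- nlevels(g)   # number of groups
N <- length(y)    # total number of observations

# Between-group sum of squares: spread of the group means around the grand mean
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)
ss_between  <- sum(group_sizes * (group_means - mean(y))^2)

# Within-group sum of squares: spread of observations around their own group mean
ss_within <- sum((y - ave(y, g))^2)

# F statistic and its p-value under H0
f_stat  <- (ss_between / (K - 1)) / (ss_within / (N - K))
p_value <- pf(f_stat, df1 = K - 1, df2 = N - K, lower.tail = FALSE)

# The same quantities as reported by anova()
tab <- anova(lm(y ~ g))
c(by_hand = f_stat, anova = tab[["F value"]][1])
```

A single F statistic summarises all groups at once, which is why a rejection says nothing about which particular pair of means differs.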


Tukey’s HSD

What if we want to compare all the groups pairwise? In this case we can apply Tukey’s HSD (honestly significant difference) test, a single-step multiple comparison procedure. It is used to find the means that are significantly different from each other.
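As a minimal sketch, base R's `TukeyHSD()` applied to an `aov()` fit returns one row per pair of groups, with the estimated difference, a simultaneous confidence interval and an adjusted p-value. The toy data and the names `y`, `g`, `fit` and `tukey` are assumptions for illustration only.

```r
# Toy data (illustrative only): three groups of 20 observations each
set.seed(1)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = rep(c(10, 10, 12), each = 20), sd = 2)

fit   <- aov(y ~ g)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of groups: difference, lower bound, upper bound, adjusted p-value
tukey$g

# With K groups there are K * (K - 1) / 2 pairwise comparisons
nrow(tukey$g) == choose(nlevels(g), 2)
```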


Example of ANOVA vs Tukey’s HSD

Let’s assume that we are dealing with the following 4 groups:

  • Group “a”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
  • Group “b”: 100 observations from the Normal Distribution with mean 10 and standard deviation 5
  • Group “c”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6
  • Group “d”: 100 observations from the Normal Distribution with mean 11 and standard deviation 6

We would like the ANOVA to reject the null hypothesis, and we would also like to see that Group a and Group b are not statistically different from each other, and likewise Group c and Group d.
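How reliably the ANOVA rejects in this setup can be checked with a small simulation. The sketch below repeats the experiment 200 times and records how often the p-value falls below 0.05; the helper `simulate_anova_p()`, the number of repetitions and the seed are assumptions for illustration.

```r
# Simulate the four groups once and return the ANOVA p-value
# (hypothetical helper, not part of the original post)
simulate_anova_p <- function(n = 100) {
  sim <- data.frame(
    Var   = factor(rep(c("a", "b", "c", "d"), each = n)),
    Value = c(rnorm(n, 10, 5), rnorm(n, 10, 5),
              rnorm(n, 11, 6), rnorm(n, 11, 6))
  )
  anova(lm(Value ~ Var, data = sim))[["Pr(>F)"]][1]
}

set.seed(123)
p_values <- replicate(200, simulate_anova_p())

# Share of simulated data sets in which H0 is rejected at the 5% level
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```

The share of rejections estimates the power of the test for these group means, standard deviations and sample sizes. A value well below 1 would mean that a single simulated data set may or may not lead to a rejection.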

Let’s work in R:

library(multcomp)
library(tidyverse)

# Create the four groups
set.seed(10) 
df1 <- data.frame(Var="a", Value=rnorm(100,10,5))
df2 <- data.frame(Var="b", Value=rnorm(100,10,5))
df3 <- data.frame(Var="c", Value=rnorm(100,11,6))
df4 <- data.frame(Var="d", Value=rnorm(100,11,6))

# merge them into one data frame
df <- rbind(df1, df2, df3, df4)

# convert Var to a factor
df$Var <- as.factor(df$Var)

# overlaid density plot of Value for each group
df %>% ggplot(aes(x = Value, fill = Var)) + geom_density(alpha = 0.5)
 
[Figure: overlaid density plots of Value for groups a, b, c and d]

ANOVA

# ANOVA
model1<-lm(Value~Var, data=df)
anova(model1)
 

Output:

Analysis of Variance Table

Response: Value
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Var         3   565.7 188.565   6.351 0.0003257 ***
Residuals 396 11757.5  29.691                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We reject the null hypothesis at the 5% significance level, since the p-value is 0.0003257.
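The classical F-test assumes equal variances across groups, whereas the groups here were simulated with standard deviations of 5 and 6. As a hedged cross-check, base R's `oneway.test()` with `var.equal = FALSE` runs Welch's version of the test, which drops that assumption. The sketch rebuilds the data so that it runs on its own; the compact construction of `df` is an assumption meant to mirror the code above.

```r
# Rebuild the four groups (same distributions and seed as in the example)
set.seed(10)
df <- data.frame(
  Var   = factor(rep(c("a", "b", "c", "d"), each = 100)),
  Value = c(rnorm(100, 10, 5), rnorm(100, 10, 5),
            rnorm(100, 11, 6), rnorm(100, 11, 6))
)

# Classical one-way ANOVA (equal variances) vs Welch's version (unequal variances)
classic <- oneway.test(Value ~ Var, data = df, var.equal = TRUE)
welch   <- oneway.test(Value ~ Var, data = df, var.equal = FALSE)

c(classic = classic$p.value, welch = welch$p.value)
```

If the two p-values lead to the same decision, the conclusion does not hinge on the equal-variance assumption.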

Tukey’s HSD

Let’s apply Tukey’s HSD test to compare all pairs of group means.

# Tukey multiple comparisons
summary(glht(model1, mcp(Var="Tukey")))
 

Output:

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Value ~ Var, data = df)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
b - a == 0   0.2079     0.7706   0.270  0.99312   
c - a == 0   1.8553     0.7706   2.408  0.07727 . 
d - a == 0   2.8758     0.7706   3.732  0.00129 **
c - b == 0   1.6473     0.7706   2.138  0.14298   
d - b == 0   2.6678     0.7706   3.462  0.00329 **
d - c == 0   1.0205     0.7706   1.324  0.54795   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
 

As we can see from the output above, the differences c vs a and c vs b are not found to be statistically significant at the 5% level (adjusted p-values of 0.07727 and 0.14298), although the groups come from different distributions. The reason is the adjustment for multiple comparisons: the p-values are adjusted for all six pairwise tests at once. Let’s compare the same groups with individual t-tests.

t-test a vs c

# Welch t-test of a vs c; pull() with no argument extracts the last column (Value)
t.test(df%>%filter(Var=="a")%>%pull(), df%>%filter(Var=="c")%>%pull())

Output:

	Welch Two Sample t-test

data:  df %>% filter(Var == "a") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.4743, df = 189.47, p-value = 0.01423
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3343125 -0.3761991
sample estimates:
mean of x mean of y 
 9.317255 11.172511 
 

t-test b vs c

# Welch t-test of b vs c; pull() with no argument extracts the last column (Value)
t.test(df%>%filter(Var=="b")%>%pull(), df%>%filter(Var=="c")%>%pull())
 

Output:

	Welch Two Sample t-test

data:  df %>% filter(Var == "b") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.1711, df = 191.53, p-value = 0.03115
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1439117 -0.1507362
sample estimates:
mean of x mean of y 
 9.525187 11.172511 
 

As we can see from above, in both cases the difference in means is found to be statistically significant at the 5% level (p-values of 0.01423 and 0.03115) if we ignore the multiple comparisons.
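Why ignoring the multiple comparisons matters can be illustrated with a short calculation. With 4 groups there are 6 pairwise comparisons. If each were tested at the 5% level and the tests were independent (an assumption made here only to keep the arithmetic simple; the pairwise tests share groups and are in fact correlated), the chance of at least one false positive would be well above 5%.

```r
k_groups      <- 4
alpha         <- 0.05
n_comparisons <- choose(k_groups, 2)   # 6 pairwise comparisons

# Probability of at least one false positive across all comparisons,
# assuming independent tests (simplifying assumption)
fwer <- 1 - (1 - alpha)^n_comparisons
round(fwer, 3)   # 0.265
```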

Discussion

When we are dealing with multiple groups and want pairwise comparisons, Tukey’s HSD is a good option. Another approach is to apply p-value adjustments to the individual tests.
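As a minimal sketch of such an adjustment, base R's `p.adjust()` can be applied to the two unadjusted p-values from the t-tests above. Treating them as part of a family of 6 pairwise comparisons (`n = 6`) and using the Bonferroni method are assumptions made for illustration; other methods such as `"holm"` are available.

```r
# Unadjusted p-values reported by the two Welch t-tests above
p_raw <- c(a_vs_c = 0.01423, b_vs_c = 0.03115)

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = 6)
p_bonf          # 0.08538 and 0.18690

# Which comparisons remain significant at the 5% level?
p_bonf < 0.05   # neither
```

After the adjustment neither p-value is below 0.05, which is in line with the Tukey results above.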

You can also have a look at how to account for multiple comparisons in A/B/n Testing.
