Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Analysis of variance: ANOVA, for multiple comparisons

The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).
Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:
Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the Bartlett test.

First we create the 4 vectors:

a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)

Now we combine the 4 vectors in a single vector:

dati = c(a, b, c, d)

Now, on this vector in which are stored all the data, we create the 4 levels:

groups = factor(rep(letters[1:4], each = 6))

We can observe the contents of the vector groups simply by typing groups + [enter].

At this point we start the Bartlett test:

bartlett.test(dati, groups)

Bartlett test of homogeneity of variances

data:  dati and groups
Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

The function gave us the value of the statistical tests (K squared), and the p-value. Can be argued that the variances are homogeneous since p-value > 0.05. Alternatively, we can compare the Bartlett’s K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the qchisq function:

qchisq(0.950, 3)
 7.814728

Chi-squared > Bartlett’s K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the Fligner-Killeen test.
The syntax is quite similar, and then proceed quickly.

a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)

dati = c(a, b, c, d)

groups = factor(rep(letters[1:4], each = 6))

fligner.test(dati, groups)

Fligner-Killeen test of homogeneity of variances

data:  dati and groups
Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the ANOVA model.

First organize the values, fitting the model:

fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:

anova (fit)

Analysis of Variance Table

Response: dati
Df  Sum Sq Mean Sq F value Pr(>F)
groups     3    8.46    2.82  0.0327 0.9918
Residuals 20 1726.50   86.33  

The output of the function is a classical ANOVA table with the following data:
Df = degree of freedom
Sum Sq = deviance (within groups, and residual)
Mean Sq = variance (within groups, and residual)
F value = the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)
Pr(>F) = p-value

Since p-value > 0.05, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

qf(0.950, 20, 3)
 8.66019

Tabulated F-value > computed F-value: we accept the null hyptohesis.