Analysis of variance: ANOVA, for multiple comparisons
The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).
Proceed with the following example:
The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56
To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.
We begin with the Bartlett test.
First we create the 4 vectors:
a = c(65, 48, 66, 75, 70, 55) b = c(64, 44, 70, 70, 68, 59) c = c(60, 50, 65, 69, 69, 57) d = c(62, 46, 68, 72, 67, 56)
Now we combine the 4 vectors in a single vector:
dati = c(a, b, c, d)
Now, on this vector in which are stored all the data, we create the 4 levels:
groups = factor(rep(letters[1:4], each = 6))
We can observe the contents of the vector
groups simply by typing
groups + [enter].
At this point we start the Bartlett test:
bartlett.test(dati, groups) Bartlett test of homogeneity of variances data: dati and groups Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228
The function gave us the value of the statistical tests (K squared), and the p-value. Can be argued that the variances are homogeneous since p-value > 0.05. Alternatively, we can compare the Bartlett’s K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the
qchisq(0.950, 3)  7.814728
Chi-squared > Bartlett’s K-squared: we accept the null hypothesis H0 (variances homogeneity)
We try now to check the homoskedasticity, with the Fligner-Killeen test.
The syntax is quite similar, and then proceed quickly.
a = c(65, 48, 66, 75, 70, 55) b = c(64, 44, 70, 70, 68, 59) c = c(60, 50, 65, 69, 69, 57) d = c(62, 46, 68, 72, 67, 56) dati = c(a, b, c, d) groups = factor(rep(letters[1:4], each = 6)) fligner.test(dati, groups) Fligner-Killeen test of homogeneity of variances data: dati and groups Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878
The conclusions are similar to those for the test of Bartlett.
Having verified the homoskedasticity of the 4 groups, we can proceed with the ANOVA model.
First organize the values, fitting the model:
fit = lm(formula = dati ~ groups)
Then we analyze the ANOVA model:
anova (fit) Analysis of Variance Table Response: dati Df Sum Sq Mean Sq F value Pr(>F) groups 3 8.46 2.82 0.0327 0.9918 Residuals 20 1726.50 86.33
The output of the function is a classical ANOVA table with the following data:
Df = degree of freedom
Sum Sq = deviance (within groups, and residual)
Mean Sq = variance (within groups, and residual)
F value = the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)
Pr(>F) = p-value
Since p-value > 0.05, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:
qf(0.950, 20, 3)  8.66019
Tabulated F-value > computed F-value: we accept the null hyptohesis.