# Analysis of variance: ANOVA, for multiple comparisons

July 30, 2009
By

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

Analysis of variance: ANOVA, for multiple comparisons

The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).
Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:

Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the Bartlett test.

First we create the 4 vectors:

`a = c(65, 48, 66, 75, 70, 55)b = c(64, 44, 70, 70, 68, 59)c = c(60, 50, 65, 69, 69, 57)d = c(62, 46, 68, 72, 67, 56)`

Now we combine the 4 vectors in a single vector:

`dati = c(a, b, c, d)`

Now, on this vector in which are stored all the data, we create the 4 levels:

`groups = factor(rep(letters[1:4], each = 6))`

We can observe the contents of the vector `groups` simply by typing `groups + [enter]`.

At this point we start the Bartlett test:

`bartlett.test(dati, groups)        Bartlett test of homogeneity of variancesdata:  dati and groups Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228`

The function gave us the value of the statistical tests (K squared), and the p-value. Can be argued that the variances are homogeneous since p-value > 0.05. Alternatively, we can compare the Bartlett’s K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the `qchisq` function:

`qchisq(0.950, 3)[1] 7.814728`

Chi-squared > Bartlett’s K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the Fligner-Killeen test.
The syntax is quite similar, and then proceed quickly.

`a = c(65, 48, 66, 75, 70, 55)b = c(64, 44, 70, 70, 68, 59)c = c(60, 50, 65, 69, 69, 57)d = c(62, 46, 68, 72, 67, 56)dati = c(a, b, c, d)groups = factor(rep(letters[1:4], each = 6))fligner.test(dati, groups)        Fligner-Killeen test of homogeneity of variancesdata:  dati and groups Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878`

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the ANOVA model.

First organize the values, fitting the model:

`fit = lm(formula = dati ~ groups)`

Then we analyze the ANOVA model:

`anova (fit)Analysis of Variance TableResponse: dati          Df  Sum Sq Mean Sq F value Pr(>F)groups     3    8.46    2.82  0.0327 0.9918Residuals 20 1726.50   86.33  `

The output of the function is a classical ANOVA table with the following data:
`Df` = degree of freedom
`Sum Sq` = deviance (within groups, and residual)
`Mean Sq` = variance (within groups, and residual)
`F value` = the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)
`Pr(>F)` = p-value

Since p-value > 0.05, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

`qf(0.950, 20, 3)[1] 8.66019`

Tabulated F-value > computed F-value: we accept the null hyptohesis.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...