# Analysis of variance: ANOVA, for multiple comparisons

**Statistic on aiR**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Analysis of variance: ANOVA, for multiple comparisons

The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).

Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:

Store B: 64, 44, 70, 70, 68, 59

Store C: 60, 50, 65, 69, 69, 57

Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the **Bartlett test**.

First we create the 4 vectors:

a = c(65, 48, 66, 75, 70, 55) b = c(64, 44, 70, 70, 68, 59) c = c(60, 50, 65, 69, 69, 57) d = c(62, 46, 68, 72, 67, 56)

Now we combine the 4 vectors in a single vector:

dati = c(a, b, c, d)

Now, on this vector in which are stored all the data, we create the 4 levels:

groups = factor(rep(letters[1:4], each = 6))

We can observe the contents of the vector `groups`

simply by typing `groups + [enter]`

.

At this point we start the Bartlett test:

bartlett.test(dati, groups) Bartlett test of homogeneity of variances data: dati and groups Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

The function gave us the value of the statistical tests (**K squared**), and the p-value. Can be argued that the variances are homogeneous since *p-value > 0.05*. Alternatively, we can compare the Bartlett’s K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the `qchisq`

function:

qchisq(0.950, 3) [1] 7.814728

Chi-squared > Bartlett’s K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the **Fligner-Killeen test**.

The syntax is quite similar, and then proceed quickly.

a = c(65, 48, 66, 75, 70, 55) b = c(64, 44, 70, 70, 68, 59) c = c(60, 50, 65, 69, 69, 57) d = c(62, 46, 68, 72, 67, 56) dati = c(a, b, c, d) groups = factor(rep(letters[1:4], each = 6)) fligner.test(dati, groups) Fligner-Killeen test of homogeneity of variances data: dati and groups Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the **ANOVA model**.

First organize the values, fitting the model:

fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:

anova (fit) Analysis of Variance Table Response: dati Df Sum Sq Mean Sq F value Pr(>F) groups 3 8.46 2.82 0.0327 0.9918 Residuals 20 1726.50 86.33

The output of the function is a classical ANOVA table with the following data:`Df`

= degree of freedom`Sum Sq`

= deviance (within groups, and residual)`Mean Sq`

= variance (within groups, and residual)`F value`

= the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)`Pr(>F)`

= p-value

Since *p-value > 0.05*, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

qf(0.950, 20, 3) [1] 8.66019

Tabulated F-value > computed F-value: we accept the null hyptohesis.

**leave a comment**for the author, please follow the link and comment on their blog:

**Statistic on aiR**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.