**Statistic on aiR**, and kindly contributed to R-bloggers)

Analysis of variance: ANOVA, for multiple comparisons

The ANOVA model can be used to compare the mean of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution).

Proceed with the following example:

The manager of a supermarket chain wants to see if the consumption in kilowatts of 4 stores between them are equal. He collects data at the end of each month for 6 months. The results are:

Store B: 64, 44, 70, 70, 68, 59

Store C: 60, 50, 65, 69, 69, 57

Store D: 62, 46, 68, 72, 67, 56

To proceed with the verification ANOVA, we must first verify the homoskedasticity (ie test for homogeneity of variances). The software R provides two tests: the Bartlett test, and the Fligner-Killeen test.

We begin with the **Bartlett test**.

First we create the 4 vectors:

a = c(65, 48, 66, 75, 70, 55)

b = c(64, 44, 70, 70, 68, 59)

c = c(60, 50, 65, 69, 69, 57)

d = c(62, 46, 68, 72, 67, 56)

Now we combine the 4 vectors in a single vector:

dati = c(a, b, c, d)

Now, on this vector in which are stored all the data, we create the 4 levels:

groups = factor(rep(letters[1:4], each = 6))

We can observe the contents of the vector `groups`

simply by typing `groups + [enter]`

.

At this point we start the Bartlett test:

bartlett.test(dati, groups)

Bartlett test of homogeneity of variances

data: dati and groups

Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

The function gave us the value of the statistical tests (**K squared**), and the p-value. Can be argued that the variances are homogeneous since *p-value > 0.05*. Alternatively, we can compare the Bartlett’s K-squared with the value of chi-square-tables; we compute that value, assigning the value of alpha and degrees of freedom at the `qchisq`

function:

qchisq(0.950, 3)

[1] 7.814728

Chi-squared > Bartlett’s K-squared: we accept the null hypothesis H0 (variances homogeneity)

We try now to check the homoskedasticity, with the **Fligner-Killeen test**.

The syntax is quite similar, and then proceed quickly.

a = c(65, 48, 66, 75, 70, 55)

b = c(64, 44, 70, 70, 68, 59)

c = c(60, 50, 65, 69, 69, 57)

d = c(62, 46, 68, 72, 67, 56)

dati = c(a, b, c, d)

groups = factor(rep(letters[1:4], each = 6))

fligner.test(dati, groups)

Fligner-Killeen test of homogeneity of variances

data: dati and groups

Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

The conclusions are similar to those for the test of Bartlett.

Having verified the homoskedasticity of the 4 groups, we can proceed with the **ANOVA model**.

First organize the values, fitting the model:

fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:

anova (fit)

Analysis of Variance Table

Response: dati

Df Sum Sq Mean Sq F value Pr(>F)

groups 3 8.46 2.82 0.0327 0.9918

Residuals 20 1726.50 86.33

The output of the function is a classical ANOVA table with the following data:`Df`

= degree of freedom`Sum Sq`

= deviance (within groups, and residual)`Mean Sq`

= variance (within groups, and residual)`F value`

= the value of the Fisher statistic test, so computed (variance within groups) / (variance residual)`Pr(>F)`

= p-value

Since *p-value > 0.05*, we accept the null hypothesis H0: the four means are statistically equal. We can also compare the computed F-value with the tabulated F-value:

qf(0.950, 20, 3)

[1] 8.66019

Tabulated F-value > computed F-value: we accept the null hyptohesis.

**leave a comment**for the author, please follow the link and comment on their blog:

**Statistic on aiR**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...