**Statistic on aiR**, and kindly contributed to R-bloggers)

The **one-way analysis of variance** is a useful technique to verify if the means of more groups are equals. But this analysis may not be very useful for more complex problems. For example, it may be necessary to take into account two factors of variability to determine if the averages between the groups depend on the group classification ( “zone”) or the second variable that is to consider (“block”). In this case should be used the **two-way analysis of variance** (**two-way ANOVA**).

We begin immediately with an example so as to facilitate the understanding of this statistical method. The data collected are organized into ** double entry tables**.

The director of a company has collected revenue (thousand dollars) for 5 years and under per month. You want to see if the revenue depends on the year and/or month, or if they are independent of these two factors.

Conceptually, the problem may be solved by an *horizontal ANOVA* and a *vertical ANOVA*, in order to verify if the average revenues per year are the same, and if they are equal to the average revenue computed by month. This would require many calculations, and so we prefer to use the two-way ANOVA, which provides the result instantly.

This is the table of revenue classified by year and month:

$$\begin{tabular}{|c||ccccc||r|}\hline Months & Year 1 & Year 2 & Year 3 & Year 4 & Year 5\\\hline January&15&18&22&23&24\\ February&22&25&15&15&14\\ March&18&22&15&19&21\\ April&23&15&14&17&18\\ May&23&15&26&18&14\\ June&12&15&11&10&8\\ July&26&12&23&15&18\\ August&19&17&15&20&10\\ September&15&14&18&19&20\\ October&14&18&10&12&23\\ November&14&22&19&17&11\\ December&21&23&11&18&14\\ \hline \end{tabular}$$

As with the one-way ANOVA, even here the aim is to structure a **Fisher’s F-test** to assess the significance of the variable “month” and of the variable “year”, determine if the revenues are dependent on one (or both) the criteria for classification.*How to perform the two-way ANOVA in R*? First creates an array containing all the values tabulated, transcribed by rows:

revenue = c(15,18,22,23,24, 22,25,15,15,14, 18,22,15,19,21,

23,15,14,17,18, 23,15,26,18,14, 12,15,11,10,8, 26,12,23,15,18,

19,17,15,20,10, 15,14,18,19,20, 14,18,10,12,23, 14,22,19,17,11,

21,23,11,18,14)

According to the months, you create a factor of 12 levels (the number of rows) with 5 repetitions (the number columns) in this manner:

months = gl(12,5)

According to the years you create a factor with 5 levels (the number of column) and 1 recurrence, declaring the total number of observations (the length of the vector revenue):

years = gl(5, 1, length(entrate))

Now you can fit the linear model and produce the ANOVA table:

fit = aov(revenue ~ months + years)

anova(fit)

Analysis of Variance Table

Response: revenue

Df Sum Sq Mean Sq F value Pr(>F)

months 11 308.45 28.04 1.4998 0.1660

years 4 44.17 11.04 0.5906 0.6712

Residuals 44 822.63 18.70

Now interpret the results.

The significance of the difference between months is: *F = 1.4998*. This value is lower than the value tabulated and indeed *p-value > 0.05*. So we accept the null hypothesis: the means of revenue evaluated according to the months are equal, then the variable “months” has no effect on revenue.

The significance of the difference between years is: *F = 0.5906*. This value is lower than the value tabulated and indeed *p-value > 0.05*. So we accept the null hypothesis: the means of revenue evaluated according to the years are equal, then the variable “years” has no effect on revenue.

**leave a comment**for the author, please follow the link and comment on their blog:

**Statistic on aiR**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...