Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The one-way analysis of variance is a useful technique to verify if the means of more groups are equals. But this analysis may not be very useful for more complex problems. For example, it may be necessary to take into account two factors of variability to determine if the averages between the groups depend on the group classification ( “zone”) or the second variable that is to consider (“block”). In this case should be used the two-way analysis of variance (two-way ANOVA). We begin immediately with an example so as to facilitate the understanding of this statistical method. The data collected are organized into double entry tables. The director of a company has collected revenue (thousand dollars) for 5 years and under per month. You want to see if the revenue depends on the year and/or month, or if they are independent of these two factors. Conceptually, the problem may be solved by an horizontal ANOVA and a vertical ANOVA, in order to verify if the average revenues per year are the same, and if they are equal to the average revenue computed by month. This would require many calculations, and so we prefer to use the two-way ANOVA, which provides the result instantly. This is the table of revenue classified by year and month: $$\begin{tabular}{|c||ccccc||r|}\hline Months & Year 1 & Year 2 & Year 3 & Year 4 & Year 5\\\hline January&15&18&22&23&24\\ February&22&25&15&15&14\\ March&18&22&15&19&21\\ April&23&15&14&17&18\\ May&23&15&26&18&14\\ June&12&15&11&10&8\\ July&26&12&23&15&18\\ August&19&17&15&20&10\\ September&15&14&18&19&20\\ October&14&18&10&12&23\\ November&14&22&19&17&11\\ December&21&23&11&18&14\\ \hline \end{tabular}$$ As with the one-way ANOVA, even here the aim is to structure a Fisher’s F-test to assess the significance of the variable “month” and of the variable “year”, determine if the revenues are dependent on one (or both) the criteria for classification. How to perform the two-way ANOVA in R? First creates an array containing all the values tabulated, transcribed by rows:
revenue = c(15,18,22,23,24, 22,25,15,15,14, 18,22,15,19,21,
23,15,14,17,18, 23,15,26,18,14, 12,15,11,10,8, 26,12,23,15,18,
19,17,15,20,10, 15,14,18,19,20, 14,18,10,12,23, 14,22,19,17,11,
21,23,11,18,14)
According to the months, you create a factor of 12 levels (the number of rows) with 5 repetitions (the number columns) in this manner:
months = gl(12,5)
According to the years you create a factor with 5 levels (the number of column) and 1 recurrence, declaring the total number of observations (the length of the vector revenue):
years = gl(5, 1, length(entrate))
Now you can fit the linear model and produce the ANOVA table:
fit = aov(revenue ~ months + years)

anova(fit)

Analysis of Variance Table

Response: revenue
Df Sum Sq Mean Sq F value Pr(>F)
months    11 308.45   28.04  1.4998 0.1660
years      4  44.17   11.04  0.5906 0.6712
Residuals 44 822.63   18.70
Now interpret the results. The significance of the difference between months is: F = 1.4998. This value is lower than the value tabulated and indeed p-value > 0.05. So we cannot reject the null hypothesis: the means of revenue evaluated according to the months are not proven to be not equal, hence we remain in our belief that the variable “months” has no effect on revenue. The significance of the difference between years is: F = 0.5906. This value is lower than the value tabulated and indeed p-value > 0.05. So we fail to reject the null hypothesis: the means of revenue evaluated according to the years are not found to be un-equal, then the variable “years” has no effect on revenue.