January 6, 2010

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

The Latin square design is used when the researcher wants to control the variation in an experiment that is related to rows and columns in the field.
Remember that:
* Treatments are assigned at random within rows and columns, with each treatment appearing once per row and once per column.
* There are equal numbers of rows, columns, and treatments.
* The design is useful where the experimenter wants to control variation in two different directions.
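
As an illustration of the first two points, here is a minimal sketch of one way to lay out a randomized Latin square in base R. The helper `random_latin_square` is hypothetical (it is not part of the original analysis), and shuffling a cyclic square does not sample uniformly from all possible Latin squares; it only guarantees that each treatment appears once per row and once per column.

```r
# Hypothetical helper, not from the original article: build an r x r Latin
# square by shuffling the rows, columns and labels of a cyclic square.
random_latin_square <- function(r) {
  # Cyclic square: entry (i, j) is ((i + j) mod r) + 1, so each value
  # occurs exactly once in every row and every column
  cyclic <- outer(0:(r - 1), 0:(r - 1), function(i, j) (i + j) %% r) + 1
  # Permuting rows and columns keeps the Latin property
  shuffled <- cyclic[sample(r), sample(r)]
  # Assign treatment letters to the values in random order
  labels <- sample(LETTERS[1:r])
  matrix(labels[shuffled], nrow = r, ncol = r)
}

set.seed(1)  # arbitrary seed, used only to make the layout reproducible
sq <- random_latin_square(5)
sq
```

Each row and each column of `sq` should then contain the letters A to E exactly once.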

The formulas used for this kind of three-way ANOVA are:

| Source of variation | Degrees of freedom [a] | Sums of squares (SSQ) | Mean square (MS) | F |
|---|---|---|---|---|
| Rows (R) | r-1 | SSQR | SSQR/(r-1) | MSR/MSE |
| Columns (C) | r-1 | SSQC | SSQC/(r-1) | MSC/MSE |
| Treatments (Tr) | r-1 | SSQTr | SSQTr/(r-1) | MSTr/MSE |
| Error (E) | (r-1)(r-2) | SSQE | SSQE/((r-1)(r-2)) | |
| Total (Tot) | r^2-1 | SSQTot | | |

[a] where r = number of treatments = number of rows = number of columns.
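
As a quick check of the degrees-of-freedom column, the following sketch evaluates the formulas for r = 5, the size of the example that follows. The vector name `df_tab` is only illustrative.

```r
r <- 5  # number of rows = columns = treatments

# Degrees of freedom for each source of variation, from the table formulas
df_tab <- c(rows       = r - 1,
            columns    = r - 1,
            treatments = r - 1,
            error      = (r - 1) * (r - 2),
            total      = r^2 - 1)
df_tab

# The four components should add up to the total
sum(df_tab[c("rows", "columns", "treatments", "error")]) == df_tab[["total"]]
```

For r = 5 this gives 4 degrees of freedom for each of rows, columns and treatments, 12 for the error and 24 in total.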

Suppose you want to analyse the productivity of 5 kinds of fertilizer, 5 kinds of tillage, and 5 kinds of seed. The data are organized in a Latin square design, as follows:

```
             treatA  treatB  treatC  treatD  treatE
fertilizer1  "A42"   "C47"   "B55"   "D51"   "E44"
fertilizer2  "E45"   "B54"   "C52"   "A44"   "D50"
fertilizer3  "C41"   "A46"   "D57"   "E47"   "B48"
fertilizer4  "B56"   "D52"   "E49"   "C50"   "A43"
fertilizer5  "D47"   "E49"   "A45"   "B54"   "C46"
```

The three factors are: fertilizer (fertilizer1 to fertilizer5, the rows), tillage (treatA to treatE, the columns), and seed (A to E, the letter in each cell). The numbers are the productivity in cwt / year.

Now create a data frame in R with these data:

```r
# Fertilizer (rows of the table): the 5 levels cycle within each tillage block
fertil <- rep(paste0("fertil", 1:5), times = 5)
# Tillage (columns of the table): each level repeated for the 5 fertilizers
treat <- rep(paste0("treat", LETTERS[1:5]), each = 5)
# Seed and productivity, read down the columns of the table
seed <- c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A", "D","A","E","C","B", "E","D","B","A","C")
freq <- c(42,45,41,56,47, 47,54,46,52,49, 55,52,57,49,45, 51,44,47,50,54, 44,50,48,43,46)

# stringsAsFactors = TRUE stores the three design variables as factors,
# which was the default when this article was written
mydata <- data.frame(treat, fertil, seed, freq, stringsAsFactors = TRUE)
mydata
```

```
    treat  fertil seed freq
1  treatA fertil1    A   42
2  treatA fertil2    E   45
3  treatA fertil3    C   41
4  treatA fertil4    B   56
5  treatA fertil5    D   47
6  treatB fertil1    C   47
7  treatB fertil2    B   54
8  treatB fertil3    A   46
9  treatB fertil4    D   52
10 treatB fertil5    E   49
11 treatC fertil1    B   55
12 treatC fertil2    C   52
13 treatC fertil3    D   57
14 treatC fertil4    E   49
15 treatC fertil5    A   45
16 treatD fertil1    D   51
17 treatD fertil2    A   44
18 treatD fertil3    E   47
19 treatD fertil4    C   50
20 treatD fertil5    B   54
21 treatE fertil1    E   44
22 treatE fertil2    D   50
23 treatE fertil3    B   48
24 treatE fertil4    A   43
25 treatE fertil5    C   46
```

We can re-create the original table, using the matrix function:

```r
matrix(mydata$seed, 5, 5)
matrix(mydata$freq, 5, 5)
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,] "A"  "C"  "B"  "D"  "E" 
[2,] "E"  "B"  "C"  "A"  "D" 
[3,] "C"  "A"  "D"  "E"  "B" 
[4,] "B"  "D"  "E"  "C"  "A" 
[5,] "D"  "E"  "A"  "B"  "C" 

     [,1] [,2] [,3] [,4] [,5]
[1,]   42   47   55   51   44
[2,]   45   54   52   44   50
[3,]   41   46   57   47   48
[4,]   56   52   49   50   43
[5,]   47   49   45   54   46
```
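
It can also be worth checking that the layout really is a Latin square, that is, that every pair of factors is fully crossed with exactly one observation per combination. A minimal sketch using cross-tabulation follows; the three vectors are rebuilt here so that the block runs on its own.

```r
# Design vectors from the example, rebuilt so this block is self-contained
fertil <- rep(paste0("fertil", 1:5), times = 5)
treat  <- rep(paste0("treat", LETTERS[1:5]), each = 5)
seed   <- c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A",
            "D","A","E","C","B", "E","D","B","A","C")

# In a Latin square every cell of each two-way table holds exactly one count
all(table(fertil, seed) == 1)   # each seed once per fertilizer (row)
all(table(treat, seed) == 1)    # each seed once per tillage (column)
all(table(fertil, treat) == 1)  # one plot per fertilizer x tillage cell
```

All three checks should return `TRUE`; a `FALSE` would point to a typing error in one of the vectors.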

Before proceeding with the analysis of variance of this Latin square design, it is worth drawing boxplots to get an idea of what to expect:

```r
par(mfrow = c(2, 2))
plot(freq ~ fertil + treat + seed, mydata)
```

Note that the differences between fertilizers are small, those between tillage types are moderate, and those between seeds are large.
Now confirm these graphical observations with the ANOVA table:

```r
myfit <- lm(freq ~ fertil + treat + seed, mydata)
anova(myfit)
```

```
Analysis of Variance Table

Response: freq
          Df  Sum Sq Mean Sq F value   Pr(>F)    
fertil     4  17.760   4.440  0.7967 0.549839    
treat      4 109.360  27.340  4.9055 0.014105 *  
seed       4 286.160  71.540 12.8361 0.000271 ***
Residuals 12  66.880   5.573                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
```
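
The sums of squares in this table can be reproduced by hand from the level means, which may help connect the output to the formulas given earlier. The sketch below is only an illustration: the helper `ssq` and the other names are hypothetical, and the data are rebuilt so that the block runs on its own.

```r
# Data from the example, rebuilt so this block is self-contained
fertil <- factor(rep(paste0("fertil", 1:5), times = 5))
treat  <- factor(rep(paste0("treat", LETTERS[1:5]), each = 5))
seed   <- factor(c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A",
                   "D","A","E","C","B", "E","D","B","A","C"))
freq   <- c(42,45,41,56,47, 47,54,46,52,49, 55,52,57,49,45,
            51,44,47,50,54, 44,50,48,43,46)

# Hypothetical helper: sum of squares of one factor in a balanced design,
# i.e. (observations per level) * sum((level mean - grand mean)^2)
ssq <- function(f) {
  n_per_level <- length(freq) / nlevels(f)
  level_means <- tapply(freq, f, mean)
  n_per_level * sum((level_means - mean(freq))^2)
}

ssq_parts <- c(fertil = ssq(fertil), treat = ssq(treat), seed = ssq(seed))
ssq_total <- sum((freq - mean(freq))^2)
ssq_error <- ssq_total - sum(ssq_parts)  # what is left after the three factors

ssq_parts
ssq_error

# Mean squares and F ratios: each factor has r - 1 = 4 degrees of freedom,
# the error has (r - 1)(r - 2) = 12
f_values <- (ssq_parts / 4) / (ssq_error / 12)
f_values
```

The values should agree with the `Sum Sq` and `F value` columns printed by `anova(myfit)`.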

Well, the boxplots were useful. Look at the significance of the F-tests.
– The difference between groups for fertilizer is not significant (p-value > 0.1);
– The difference between groups for tillage is significant (p-value < 0.05);
– The difference between groups for seed is highly significant (p-value < 0.001).
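
If you prefer to read the p-values programmatically rather than from the printed table, they can be pulled out of the object returned by `anova()`, which is a data frame with a `Pr(>F)` column. This is a small sketch with the data rebuilt so it runs on its own; the names `fit`, `tab` and `p` are only illustrative.

```r
# Data from the example, rebuilt so this block is self-contained
fertil <- factor(rep(paste0("fertil", 1:5), times = 5))
treat  <- factor(rep(paste0("treat", LETTERS[1:5]), each = 5))
seed   <- factor(c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A",
                   "D","A","E","C","B", "E","D","B","A","C"))
freq   <- c(42,45,41,56,47, 47,54,46,52,49, 55,52,57,49,45,
            51,44,47,50,54, 44,50,48,43,46)

fit <- lm(freq ~ fertil + treat + seed)
tab <- anova(fit)

# Keep the p-values of the three factors (the Residuals row has none)
p <- setNames(tab[["Pr(>F)"]][1:3], rownames(tab)[1:3])
round(p, 6)

# Which factors are significant at the 5% level?
p < 0.05
```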
