[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Observations are grouped

When we have recorded two traits in different subjects, we can be interested in describing their joint variability, by using the Pearson’s correlation coefficient. That’s ok, altough we have to respect some basic assumptions (e.g. linearity) that have been detailed elsewhere (see here). Problems may arise when we need to test the hypothesis that the correlation coefficient is equal to 0. In this case, we need to make sure that all the couples of observations are taken on independent subjects.

Unfortunately, this is most often false whenever measures are taken from designed field experiments. In this case, observations may be grouped by one or more treatment/blocking factors. This has been clearly described by Bland and Altman (1994); we would like to give an example that is more closely related to plant/crop science. Think about a genotype experiment, where we compare the behaviour of several genotypes in a randomised blocks design. Usually, we do not only measure yield. We also measure other traits, such as crop height. At the end of the experiment, we might be interested in reporting the correlation between yield and height. How should we proceed? It would seem an easy task, but it is not.

Let’s assume that we have a randomised blocks design, with 27 genotypes and 3 replicates. For each plot, we recorded two traits, i.e. yield and the weight of thousand kernels (TKW). In the end, we have 81 plots and just as many couples of measures in all. We will use the dataset ‘WheatQuality.csv’, that is available on ‘gitHub’.

```library(Hmisc)
library(knitr)
library(plyr)

```##     Genotype Block Height  TKW Yield Whectol
## 1 arcobaleno     1     90 44.5 64.40    83.2
## 2 arcobaleno     2     90 42.8 60.58    82.2
## 3 arcobaleno     3     88 42.7 59.42    83.1
## 4       baio     1     80 40.6 51.93    81.8
## 5       baio     2     75 42.7 51.34    81.3
## 6       baio     3     76 41.1 47.78    81.1```

# How many correlations?

It may be tempting to consider the whole lot of measures and calculate the correlation coefficient between yield and TKW. This is the result:

```ra <- with(dataset, rcorr(Yield, TKW) )
ra\$r[1,2] #Correlation coefficient```

`##  0.540957`

`ra\$P[1,2] #P-level`

`##  1.850931e-07`

We observe a positive correlation, and \(r\) seems to be significantly different from 0. Therefore, we would be encouraged to conclude that plots with a high value on yield tend to have a high value on TKW, as well. Unfortunately, such a conclusion is not supported by the data.

Indeed, the test of significance is clearly invalid, as the 81 plots are not independent; they are grouped by block and genotype and we are totally neglecting these two effects. Are we sure that the same correlation holds for all genotypes/blocks? Let’s check this.

```wCor <- function(x) cor(x\$Yield, x\$TKW)
wgCors <- ddply(dataset, ~Genotype, wCor)
wgCors```

```##       Genotype         V1
## 1   arcobaleno  0.9847228
## 2         baio  0.1611952
## 3      claudio -0.9993872
## 5     colosseo  0.4564855
## 6        creso -0.5910193
## 7       duilio -0.9882330
## 8       gianni -0.7603802
## 9       giotto  0.9520211
## 10      grazia  0.4980828
## 11       iride  0.7563338
## 12    meridano  0.1174342
## 13      neodur  0.4805871
## 14      orobel  0.8907754
## 15 pietrafitta -0.9633891
## 16  portobello  0.9210135
## 17   portofino -0.9900764
## 18   portorico  0.1394211
## 19       preco  0.9007067
## 21    sancarlo -0.6460670
## 22      simeto -0.4051779
## 23       solex -0.6066363
## 24 terrabianca -0.4076416
## 25       verdi  0.5801404
## 26     vesuvio -0.7797493
## 27    vitromax -0.8056514```

```wbCors <- ddply(dataset, ~Block, wCor)
wbCors```

```##   Block        V1
## 1     1 0.5998137
## 2     2 0.5399990
## 3     3 0.5370398```

As for the genotypes, we have 27 correlation coefficients, ranging from -0.999 to 0.985. We only have three couples of measurements per genotype and it is clear that this is not much, to reliably estimate a correlation coefficient. However, it is enough to be suspicious about the extent of correlation between yield and TKW, as it may depend on the genotype.

On the other hand, the correlation within blocks is more constant, independent on the block and similar to the correlation between plots.

It may be interesting to get an estimate of the average within-group correlation. To this aim, we can perform two separate ANOVAs (one per trait), including all relevant effects (blocks and genotypes) and calculate the correlation between the residuals:

```mod1 <- lm(Yield ~ factor(Block) + Genotype, data = dataset)
mod2 <- lm(TKW ~ factor(Block) + Genotype, data = dataset)
wCor <- rcorr(residuals(mod1), residuals(mod2))
wCor\$r```

```##            x          y
## x 1.00000000 0.03133109
## y 0.03133109 1.00000000```

`wCor\$P`

```##           x         y
## x        NA 0.7812693
## y 0.7812693        NA```

The average within-group correlation is very small and unsignificant. Let’s think about this correlation: residuals represent yield and TKW values for all plots, once the effects of blocks and genotypes have been removed. A high correlation of residuals would mean that, letting aside the effects of the block and genotype to which it belongs, a plot with a high value on yield also shows a high value on TKW. The existence of such correlation is clearly unsopported by our dataset.

As the next step, we could consider the means for genotypes/blocks and see whether they are correlated. Blocks and genotypes are independent and, in principle, significance testing is permitted. However, this is not recommended with block means, as three data are too few to make tests.

```means <- ddply(dataset, ~Genotype, summarise, Mu1=mean(Yield), Mu2 = mean(TKW))
rgPear <- rcorr( as.matrix(means[,2:3]) )
rgPear\$r```

```##           Mu1       Mu2
## Mu1 1.0000000 0.5855966
## Mu2 0.5855966 1.0000000```

`rgPear\$P`

```##            Mu1        Mu2
## Mu1         NA 0.00133149
## Mu2 0.00133149         NA```

```means <- ddply(dataset, ~Block, summarise, Mu1=mean(Yield), Mu2 = mean(TKW))
rbPear <- cor( as.matrix(means[,2:3]) )
rbPear```

```##             Mu1         Mu2
## Mu1  1.00000000 -0.08812544
## Mu2 -0.08812544  1.00000000```

We note that the correlation between genotype means is high and significant. On the contrary, the correlation between block means is near to 0.

# And so what?

At this stage, you may be confused… Let’s try to clear the fog.

Obtaining a reliable measure of correlation from designed experiments is not obvious. Indeed, in every designed field experiment we have groups of subjects and there are several possible types of correlation:

1. correlation between plot measurements
2. correlation between the residuals
3. correlation between treatment/block means

All these correlations should be investigated and used for interpretation.

1. The ‘naive’ correlation between the plot measurements is very easily calculated, but it is grossy misleading. Indeed, it disregards the treatment/block effects and it does not permit hypotheses testing, as the subjects are not independent. In this example, looking at the ‘naive’ correlation coefficient, we would wrongly conclude that plots with high yield also have high TKW, while further analyses show that this is not true, in general. We would reasonably suggest that the ‘naive’ correlation coefficient is never used for interpretation.
2. The correlation between the residuals is a reliable measure of joint variation, because the experimental design is adequately referenced, by removing the effects of tratments/blocks. In this example, residuals are not correlated. Further analyses show that the correlation between yield and TKW, if any, may depend on the genotype, while it does not depend on the block.
3. The correlation between treatment/block means permits to assess whether the treatment/block effects on the two traits are correlated. In this case, while we are not allowed to conclude that yield and TKW are, in general, correlated, we can conclude that the genotypes with a high level of yield also show a high level of TKW.

# Take-home message

Whenever we have data from designed field experiments, our correlation analyses should never be limited to the calculation of the ‘naive’ correlation coefficient between the observed values. This may not be meaningful! On the contrary, our interpretation should be mainly focused on the correlation between residuals and on the correlation between the effects of treatments/blocks.

An elegant and advanced method to perform sound correlation analyses on data from designed field experiments has been put forward by Piepho (2018), within the frame of mixed models. Such an approach will be described in another post.