# A simple ANOVA

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I was browsing Davies **Wiekvoet**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

*Design and Analysis of Industrial Experiments*(second edition, 1967). Published by for ICI in times when industry did that kind of thing. It is quite an applied book. On page 107 there is an example where the variance of a process is estimated.

### Data

Data is from nine batches from which three samples were selected (A, B and C) and each a duplicate measurement. I am not sure about copyright of these data, so I will not reprint the data here. The problem is to determine the measurement ans sampling error in a chemical process.ggplot(r4,aes(x=Sample,y=x))+

geom_point()+

facet_wrap(~ batch )

### Analysis

At the time of writing the book, the only approach was to do a classical ANOVA and calculate the estimates from there.aov(x~ batch + batch:Sample,data=r4) %>%

anova

Analysis of Variance Table

Response: x

Df Sum Sq Mean Sq F value Pr(>F)

batch 8 792.88 99.110 132.6710 < 2e-16 ***

batch:Sample 18 25.30 1.406 1.8818 0.06675 .

Residuals 27 20.17 0.747

—

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this case the residual variation is 0.75. The batch:Sample variation estimates is, due to the design, twice the sapling variation plus residual variation. Hence it is estimated as 0.33. How lucky we are to have tools (lme4) which can do this estimate directly. In this case, as it was a well designed experiment, these estimates are the same as from the ANOVA.

l1 <- lmer(x ~1+ (1 | batch) + (1|batch:Sample) ,data=r4 )

summary(l1)

Linear mixed model fit by REML [‘lmerMod’]

Formula: x ~ 1 + (1 | batch) + (1 | batch:Sample)

Data: r4

REML criterion at convergence: 189.4

Scaled residuals:

Min 1Q Median 3Q Max

-1.64833 -0.50283 -0.06649 0.55039 1.57801

Random effects:

Groups Name Variance Std.Dev.

batch:Sample (Intercept) 0.3294 0.5739

batch (Intercept) 16.2841 4.0354

Residual 0.7470 0.8643

Number of obs: 54, groups: batch:Sample, 27; batch, 9

Fixed effects:

Estimate Std. Error t value

(Intercept) 47.148 1.355 34.8

A next step is confidence intervals around the estimates. Davies uses limits from a Chi-squared distribution for the residual variation, leading to a 90% interval 0.505 to 1.25. In contrast lme4 has two estimators, profile (

*computing a likelihood profile and finding the appropriate cutoffs based on the likelihood ratio test;*) and bootstrap (

*perform parametric bootstrapping with confidence intervals computed from the bootstrap distribution according to boot.type*). Each of these takes one or few second on my laptop, not feasible for the pre computer age. The estimates are different, to my surprise more narrow:

Computing profile confidence intervals …

5 % 95 %

.sig01 0.0000000 0.9623748

.sig02 2.6742109 5.9597328

.sigma 0.7017849 1.1007261

(Intercept) 44.8789739 49.4173227

Computing bootstrap confidence intervals …

5 % 95 %

sd_(Intercept)|batch:Sample 0.000000 0.8880414

sd_(Intercept)|batch 2.203608 5.7998348

sigma 0.664149 1.0430984

(Intercept) 45.140652 49.4931109

Davies continues to estimate the ratio to residual for sampling variation, which was the best available for that time. This I won’t repeat.

To

**leave a comment**for the author, please follow the link and comment on their blog:**Wiekvoet**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.