Experimental Design: Problem Set
[This article was first published on ALSTAT R Blog, and kindly contributed to R-bloggers].
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
QUESTIONS
 The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:
| Mixing Technique | Tensile Strength (lb/in$^{2}$) |
|:---:|:---|
| 1 | 3129, 3000, 2865, 2890 |
| 2 | 3200, 3300, 2975, 3150 |
| 3 | 2800, 2900, 2985, 3050 |
| 4 | 2600, 2700, 2600, 2765 |

 Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.
 Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
 Use the Fisher LSD method with $\alpha=0.05$ to make comparisons between pairs of means.
 Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
 Plot the residuals versus the predicted tensile strength. Comment on the plot.
 Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.
2.
 Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?
 Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?
COMPUTATIONAL AND GRAPHICAL SECTION
 The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:
| Mixing Technique | Tensile Strength (lb/in$^{2}$) | Totals $(y_{i.})$ | Averages $(\bar{y}_{i.})$ |
|:---:|:---|---:|---:|
| 1 | 3129, 3000, 2865, 2890 | 11884 | 2971.00 |
| 2 | 3200, 3300, 2975, 3150 | 12625 | 3156.25 |
| 3 | 2800, 2900, 2985, 3050 | 11735 | 2933.75 |
| 4 | 2600, 2700, 2600, 2765 | 10665 | 2666.25 |
| | | $y_{..}=46909$ | $\bar{y}_{..}=2931.81$ |
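As a quick check on the table, the treatment totals and averages can be recomputed in a few lines. The following is a minimal Python sketch that uses only the standard library. The dictionary `data` is an illustrative name that transcribes the table above.

```python
# Tensile strength (lb/in^2) by mixing technique, transcribed from the table
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Treatment totals y_i. and treatment averages ybar_i.
totals = {i: sum(obs) for i, obs in data.items()}
averages = {i: totals[i] / len(obs) for i, obs in data.items()}

# Grand total y.. and grand average ybar.. over all N observations
grand_total = sum(totals.values())
N = sum(len(obs) for obs in data.values())
grand_mean = grand_total / N

for i in data:
    print(f"Technique {i}: total = {totals[i]}, average = {averages[i]:.2f}")
print(f"y.. = {grand_total}, ybar.. = {grand_mean:.2f}")
```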

 Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.
I. Hypotheses:
$H_{0}$: $\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}$
$H_{1}$: at least one mean is different.
II. Level of significance: $\alpha = 0.05$
III. Test Statistics: $$F_{0}=\frac{\frac{SS_{Treatments}}{a-1}}{\frac{SS_{E}}{N-a}}=\frac{MS_{Treatments}}{MS_{E}}$$
IV. Rejection Region:
$$F_{0}>F_{\alpha,a-1,N-a}\\F_{0}>F_{0.05,3,12}\\F_{0}>3.49$$
V. Computation:
$$SS_{T}=\sum_{i=1}^{4}\sum_{j=1}^{4}y_{ij}^2-\frac{y_{..}^2}{N}\\=(3129)^2+(3000)^2+\dots+(2600)^2+(2765)^2-\frac{(46909)^2}{16}\\=138172041-\frac{(46909)^2}{16}=643648.4375\\SS_{Treatments}=\frac{1}{n}\sum_{i=1}^{4}y_{i.}^2-\frac{y_{..}^2}{N}\\=\frac{1}{4}[(11884)^2+\dots+(10665)^2]-\frac{(46909)^2}{16}=489740.1875\\SS_{E}=SS_{T}-SS_{Treatments}\\=643648.4375-489740.1875=153908.25$$
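The computational formulas above translate directly into code. The following minimal Python sketch uses the correction term $y_{..}^2/N$ exactly as in the hand calculation. It assumes a balanced design with the same $n$ in every treatment, and the variable names are illustrative.

```python
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

a = len(data)                 # number of treatments (mixing techniques)
n = len(data[1])              # replicates per treatment; assumes a balanced design
N = a * n                     # total number of observations
y_dd = sum(sum(obs) for obs in data.values())   # grand total y..
correction = y_dd ** 2 / N                      # y..^2 / N

# SS_T: sum of squared observations minus the correction term
ss_total = sum(y ** 2 for obs in data.values() for y in obs) - correction
# SS_Treatments: (1/n) * sum of squared treatment totals minus the correction term
ss_treatments = sum(sum(obs) ** 2 for obs in data.values()) / n - correction
# SS_E by subtraction
ss_error = ss_total - ss_treatments

print(f"SS_T = {ss_total:.4f}, SS_Treatments = {ss_treatments:.4f}, SS_E = {ss_error:.4f}")
```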
ANOVA Table

| Source | Sum of Squares | Degrees of Freedom | Mean Square | $F_{0}$ | P-Value |
|:---|---:|---:|---:|---:|---:|
| Model | 489740.19 | 3 | 163246.73 | 12.73 | 0.0005 |
| Error | 153908.25 | 12 | 12825.69 | | |
| Total | 643648.44 | 15 | | | |

The F-value of 12.73 is greater than the tabulated value of 3.49, and its p-value (0.0005) is less than the level of significance, so the model is significant. We therefore reject the null hypothesis and conclude that the mixing techniques significantly affect the strength of the cement.
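The mean squares, $F_0$ and the decision rule can be checked with the following minimal Python sketch. It takes the sums of squares from the ANOVA table and the critical value $F_{0.05,3,12}=3.49$ quoted above. The sketch does not compute that critical value itself.

```python
# Entries from the ANOVA table above
ss_treatments = 489740.1875
ss_error = 153908.25
a, N = 4, 16

ms_treatments = ss_treatments / (a - 1)   # MS_Treatments on a - 1 = 3 df
ms_error = ss_error / (N - a)             # MS_E on N - a = 12 df
f0 = ms_treatments / ms_error

# F_{0.05,3,12} = 3.49 is the tabulated critical value quoted in the text
f_crit = 3.49
reject_h0 = f0 > f_crit

print(f"MS_Treatments = {ms_treatments:.2f}, MS_E = {ms_error:.2f}")
print(f"F0 = {f0:.2f}; reject H0 at alpha = 0.05: {reject_h0}")
```

The sketch does not compute the p-value because the Python standard library has no F distribution. If SciPy is installed, `scipy.stats.f.sf(f0, 3, 12)` returns the upper-tail probability.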
 Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
Dashed lines in the plot, by color: Red – $\bar{y}_{4}$ Mean of Treatment 4 (2666.25)
Pink – $\bar{y}_{..}$ Grand Mean (2931.81)
Brown – $\bar{y}_{3}$ Mean of Treatment 3 (2933.75)
Green – $\bar{y}_{1}$ Mean of Treatment 1 (2971.00)
Blue – $\bar{y}_{2}$ Mean of Treatment 2 (3156.25)
Based on the plot and the data, we would conclude that $\bar{y}_{1}$ and $\bar{y}_{3}$ do not differ (see also the scatter plot in the sixth part of Question 1). Moreover, $\bar{y}_{4}$ differs from $\bar{y}_{1}$ and $\bar{y}_{3}$, $\bar{y}_{2}$ differs from $\bar{y}_{1}$ and $\bar{y}_{3}$, and $\bar{y}_{2}$ and $\bar{y}_{4}$ are different.
How did I do it?
First, we plot a Student’s t distribution with $N-1=15$ degrees of freedom. Next, we place the four treatment means on the x-axis. The means are far too large to appear on the scale of that plot, so we first convert each of them to a t-value using the formula$$t=\frac{\bar{y}_{i.}-\bar{y}_{..}}{\frac{\sigma}{\sqrt{n}}}$$
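The conversion to t-values can be sketched in Python as follows. The formula above does not say how $\sigma$ is obtained. This sketch *assumes* that $\sigma$ is estimated by $\sqrt{MS_E}$ from the ANOVA table, which makes the scale factor $\sqrt{MS_E/n}$. Because of that assumption, the values may not match the ones behind the original plot exactly.

```python
import math

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
grand_mean = 2931.8125   # ybar.. = 46909 / 16
n = 4                    # observations per technique

# ASSUMPTION: sigma is estimated by sqrt(MS_E), with MS_E = 153908.25 / 12
ms_error = 12825.6875
scale = math.sqrt(ms_error / n)          # estimated sigma / sqrt(n)

# Standardize each treatment average against the grand mean
t_values = {i: (ybar - grand_mean) / scale for i, ybar in averages.items()}
for i, t in sorted(t_values.items(), key=lambda kv: kv[1]):
    print(f"Technique {i}: t = {t:.3f}")
```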
 Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.$$LSD=t_{\frac{\alpha}{2},N-a}\sqrt{\frac{2MS_{E}}{n}}=t_{0.025,16-4}\sqrt{\frac{2(12825.7)}{4}}=2.179\sqrt{6412.85}=174.495$$
Thus, any pair of treatment averages that differ in absolute value by more than 174.495 would imply that the corresponding pair of population means are significantly different.
The differences in averages are$$|\bar{y}_{1.}-\bar{y}_{2.}|=|2971.00-3156.25|=185.25>174.495*\\|\bar{y}_{1.}-\bar{y}_{3.}|=|2971.00-2933.75|=37.25<174.495\\|\bar{y}_{1.}-\bar{y}_{4.}|=|2971.00-2666.25|=304.75>174.495*\\|\bar{y}_{2.}-\bar{y}_{3.}|=|3156.25-2933.75|=222.50>174.495*\\|\bar{y}_{2.}-\bar{y}_{4.}|=|3156.25-2666.25|=490.00>174.495*\\|\bar{y}_{3.}-\bar{y}_{4.}|=|2933.75-2666.25|=267.50>174.495*$$
The starred values indicate pairs of means that are significantly different.
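The LSD comparisons follow a simple rule: flag a pair when the absolute difference of its averages exceeds the LSD. The following minimal Python sketch applies that rule. It uses $t_{0.025,12}=2.179$ from a t table, as in the hand calculation.

```python
import math
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
ms_error = 12825.69
n = 4
t_crit = 2.179   # t_{0.025,12}, read from a t table as in the text

# Least significant difference for a balanced design
lsd = t_crit * math.sqrt(2 * ms_error / n)

significant = set()
for i, j in combinations(sorted(averages), 2):
    diff = abs(averages[i] - averages[j])
    star = "*" if diff > lsd else ""
    if star:
        significant.add((i, j))
    print(f"|ybar{i}. - ybar{j}.| = {diff:7.2f}{star}")
print(f"LSD = {lsd:.3f}")
```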
 Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
Nothing unusual appears in the plot. The points stay within the 95 percent confidence interval, so the normality assumption for the residuals appears to be satisfied.
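The coordinates behind a normal probability plot of the residuals can be produced with the Python standard library alone. The residuals are $e_{ij}=y_{ij}-\bar{y}_{i.}$. The plotting position $(k-0.5)/N$ is one common convention, but statistical packages differ in the exact rule they use, so treat it as an assumption here. If normality holds, plotting `z` against the sorted residuals should give a roughly straight line.

```python
from statistics import NormalDist

data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Residuals e_ij = y_ij - ybar_i. (ybar_i. is the fitted value in the one-way model)
residuals = []
for obs in data.values():
    ybar = sum(obs) / len(obs)
    residuals.extend(y - ybar for y in obs)
residuals.sort()

N = len(residuals)
std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1
# Pair the k-th smallest residual with the normal quantile at (k - 0.5) / N
points = [(std_normal.inv_cdf((k - 0.5) / N), e) for k, e in enumerate(residuals, start=1)]

for z, e in points:
    print(f"z = {z:6.3f}   residual = {e:8.2f}")
```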
 Plot the residuals versus the predicted tensile strength. Comment on the plot.
The plot shows a slight outward-opening funnel (megaphone) shape. The pattern is not pronounced, but it suggests that the error variance may not be constant.
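A rough numerical counterpart to the funnel impression is to see how the spread of the residuals changes with the fitted value $\bar{y}_{i.}$. The following Python sketch prints the range of the residuals for each technique, ordered by fitted value. With only four observations per technique this is suggestive at best. It is not a formal test of equal variances.

```python
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# In the one-way model every observation in technique i has fitted value ybar_i.
summary = []
for i, obs in data.items():
    fitted = sum(obs) / len(obs)
    res = [y - fitted for y in obs]
    summary.append((fitted, i, max(res) - min(res)))

summary.sort()   # order by fitted (predicted) value
for fitted, i, spread in summary:
    print(f"Technique {i}: fitted = {fitted:.2f}, residual range = {spread:.2f}")
```

For these data the residual range grows from 165 (technique 4) to 325 (technique 2) as the fitted value increases. This is consistent with the slight funnel shape described above.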
 Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.
2. Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?
Ranking the treatment averages in ascending order, we have$$\bar{y}_{4.}=2666.25\\\bar{y}_{3.}=2933.75\\\bar{y}_{1.}=2971.00\\\bar{y}_{2.}=3156.25$$
The standard error of each average is $S_{\bar{y}_{i.}}=\sqrt{\frac{12825.69}{4}}=56.625$. From the table of significant ranges for 12 degrees of freedom and $\alpha=0.05$, we obtain $r_{0.05}(2,12)=3.081$, $r_{0.05}(3,12)=3.225$, and $r_{0.05}(4,12)=3.312$. Thus, the least significant ranges are$$R_{2}=r_{0.05}(2,12)S_{\bar{y}_{i.}}=(3.081)(56.625)=174.46\\R_{3}=r_{0.05}(3,12)S_{\bar{y}_{i.}}=(3.225)(56.625)=182.62\\R_{4}=r_{0.05}(4,12)S_{\bar{y}_{i.}}=(3.312)(56.625)=187.54$$
The comparisons yield$$\text{2 vs. 4: } 3156.25-2666.25=490>187.54\ (R_{4})\\\text{2 vs. 3: } 3156.25-2933.75=222.5>182.62\ (R_{3})\\\text{2 vs. 1: } 3156.25-2971.00=185.25>174.46\ (R_{2})\\\text{1 vs. 4: } 2971.00-2666.25=304.75>182.62\ (R_{3})\\\text{1 vs. 3: } 2971.00-2933.75=37.25<174.46\ (R_{2})\\\text{3 vs. 4: } 2933.75-2666.25=267.5>174.46\ (R_{2})$$
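The Duncan comparisons can be reproduced with a short Python sketch. The significant ranges $r_{0.05}(p,12)$ are the table values quoted above and are not computed here. For simplicity, the sketch tests every pair directly against $R_p$, where $p$ is the number of ranked means the pair spans. It omits Duncan’s rule that means lying inside a range already judged non-significant are not tested further. That rule changes nothing for these data, because the only non-significant pair (1 vs. 3) consists of adjacent means.

```python
import math
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
ms_error = 12825.69
n = 4
# r_0.05(p, 12) for p = 2, 3, 4, as read from a table of significant ranges
r = {2: 3.081, 3: 3.225, 4: 3.312}

s_ybar = math.sqrt(ms_error / n)                  # standard error of a treatment average
R = {p: rp * s_ybar for p, rp in r.items()}       # least significant ranges R_p

ranked = sorted(averages, key=averages.get)       # techniques in ascending order of mean
significant = set()
for lo, hi in combinations(range(len(ranked)), 2):
    p = hi - lo + 1                               # number of ranked means spanned by the pair
    big, small = ranked[hi], ranked[lo]
    diff = averages[big] - averages[small]
    if diff > R[p]:
        significant.add(frozenset((big, small)))
    verdict = ">" if diff > R[p] else "<"
    print(f"{big} vs. {small}: {diff:.2f} {verdict} {R[p]:.2f} (R_{p})")
```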
From the analysis we observe that all pairs of means differ significantly except Treatments 1 and 3. This agrees with the earlier conclusion from the LSD method, so for these data Duncan’s multiple range test and the LSD method lead to identical conclusions.
 Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?$$T_{0.05}=q_{0.05}(4,12)\sqrt{\frac{MS_{E}}{n}}=4.20\sqrt{\frac{12825.69}{4}}=4.20(56.625)=237.825$$
Thus, any pair of treatment averages that differ in absolute value by more than 237.825 would imply that the corresponding pair of population means are significantly different. The four treatment averages are$$\bar{y}_{1.}=2971.00~~~~~\bar{y}_{2.}=3156.25~~~~~\bar{y}_{3.}=2933.75~~~~~\bar{y}_{4.}=2666.25$$ and the differences in averages are$$\bar{y}_{1.}-\bar{y}_{2.}=2971.00-3156.25=-185.25\\\bar{y}_{1.}-\bar{y}_{3.}=2971.00-2933.75=37.25\\\bar{y}_{1.}-\bar{y}_{4.}=2971.00-2666.25=304.75*\\\bar{y}_{2.}-\bar{y}_{3.}=3156.25-2933.75=222.50\\\bar{y}_{2.}-\bar{y}_{4.}=3156.25-2666.25=490.00*\\\bar{y}_{3.}-\bar{y}_{4.}=2933.75-2666.25=267.50*$$ The starred values indicate pairs of means that are significantly different.
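Tukey’s procedure uses the same pairwise loop as the LSD method, but with a single critical difference based on the studentized range. The following minimal Python sketch takes $q_{0.05}(4,12)=4.20$ from a table, as in the text.

```python
import math
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
ms_error = 12825.69
n = 4
q_crit = 4.20   # q_0.05(4, 12) from a studentized range table, as in the text

# Tukey's critical difference for a balanced design
t_alpha = q_crit * math.sqrt(ms_error / n)

significant = set()
for i, j in combinations(sorted(averages), 2):
    diff = averages[i] - averages[j]
    star = "*" if abs(diff) > t_alpha else ""
    if star:
        significant.add((i, j))
    print(f"ybar{i}. - ybar{j}. = {diff:8.2f}{star}")
print(f"T_0.05 = {t_alpha:.3f}")
```

Only the pairs involving technique 4 exceed $T_{0.05}$.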
The conclusions are not the same. As with Duncan’s test, Tukey’s test finds that the mean of Treatment 4 differs from the means of Treatments 1, 2, and 3. However, under Tukey’s test the mean of Treatment 2 is not significantly different from the means of Treatments 1 and 3. Those pairs were found to be different using the graphical method, the Fisher LSD method, and Duncan’s multiple range test.
Reference:
Design and Analysis of Experiments by Douglas C. Montgomery
R CODES SECTION
To leave a comment for the author, please follow the link and comment on their blog: ALSTAT R Blog.