(This article was first published on **ALSTAT R Blog**, and kindly contributed to R-bloggers)

#### QUESTIONS

- The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:

Tensile strength (lb/in$^{2}$):

| Mixing Technique | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 |
|:---:|---:|---:|---:|---:|
| 1 | 3129 | 3000 | 2865 | 2890 |
| 2 | 3200 | 3300 | 2975 | 3150 |
| 3 | 2800 | 2900 | 2985 | 3050 |
| 4 | 2600 | 2700 | 2600 | 2765 |

- (a) Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.
- (b) Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
- (c) Use the Fisher LSD method with $\alpha=0.05$ to make comparisons between pairs of means.
- (d) Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
- (e) Plot the residuals versus the predicted tensile strength. Comment on the plot.
- (f) Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

- Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?
- Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?

#### **COMPUTATIONAL AND GRAPHICAL SECTION**

- The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:

Tensile strength (lb/in$^{2}$), with treatment totals and averages:

| Mixing Technique | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Total $y_{i.}$ | Average $\bar{y}_{i.}$ |
|:---:|---:|---:|---:|---:|---:|---:|
| 1 | 3129 | 3000 | 2865 | 2890 | 11884 | 2971.00 |
| 2 | 3200 | 3300 | 2975 | 3150 | 12625 | 3156.25 |
| 3 | 2800 | 2900 | 2985 | 3050 | 11735 | 2933.75 |
| 4 | 2600 | 2700 | 2600 | 2765 | 10665 | 2666.25 |
| | | | | | $y_{..}=46909$ | $\bar{y}_{..}=2931.81$ |
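The totals and averages in the table can be checked with a minimal Python sketch (a stand-in for illustration; the variable names `data`, `totals` and `averages` are our own):

```python
# Tensile strength (lb/in^2), one list of four observations per mixing technique
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Treatment totals y_i. and treatment averages ybar_i.
totals = {i: sum(ys) for i, ys in data.items()}
averages = {i: totals[i] / len(ys) for i, ys in data.items()}

# Grand total y.. and grand mean ybar..
N = sum(len(ys) for ys in data.values())
grand_total = sum(totals.values())
grand_mean = grand_total / N

for i in data:
    print(f"Technique {i}: total = {totals[i]}, average = {averages[i]:.2f}")
print(f"y.. = {grand_total}, grand mean = {grand_mean:.2f}")
```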

- Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.

I. Hypotheses:

$H_{0}: \mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}$

$H_{1}$: some means are different.

II. Level of significance: $\alpha = 0.05$

III. Test Statistics: $$F_{0}=\frac{\frac{SS_{Treatments}}{a-1}}{\frac{SS_{E}}{N-a}}=\frac{MS_{Treatments}}{MS_{E}}$$

IV. Rejection Region:

$$F_{0}>F_{\alpha,a-1,N-a}\\F_{0}>F_{0.05,3,12}\\F_{0}>3.49$$

V. Computation:

$$SS_{T}=\sum_{i=1}^{4}\sum_{j=1}^{4}y_{ij}^2-\frac{y_{..}^2}{N}\\=(3129)^2+(3000)^2+\dots+(2600)^2+(2765)^2-\frac{(46909)^2}{16}\\=138172041-\frac{(46909)^2}{16}=643648.4375\\SS_{Treatments}=\frac{1}{n}\sum_{i=1}^{4}y_{i.}^2-\frac{y_{..}^2}{N}\\=\frac{1}{4}[(11884)^2+\dots+(10665)^2]-\frac{(46909)^2}{16}=489740.1875\\SS_{E}=SS_{T}-SS_{Treatments}\\=643648.4375-489740.1875=153908.25$$

ANOVA Table

| Source | Sum of Squares | Degrees of Freedom | Mean Square | $F_{0}$ | P-Value |
|---|---:|---:|---:|---:|---:|
| Model | 489740.19 | 3 | 163246.73 | 12.73 | 0.0005 |
| Error | 153908.25 | 12 | 12825.69 | | |
| Total | 643648.44 | 15 | | | |

The F-value of 12.73 is greater than the tabulated value of 3.49, and its p-value (0.0005) is less than the level of significance, so the model is significant. We therefore reject the null hypothesis and conclude that the mixing techniques significantly affect the strength of the cement.
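The sums of squares, mean squares and $F_{0}$ above can be reproduced with a short Python sketch. The critical value 3.49 is hardcoded from the F table rather than computed, and the p-value is not reproduced here (that would need an F-distribution routine such as SciPy's `scipy.stats.f.sf`):

```python
# Tensile strength data (lb/in^2) for the four mixing techniques
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

a = len(data)             # number of treatments
n = len(data[1])          # replicates per treatment (balanced design)
N = a * n                 # total number of observations
observations = [y for ys in data.values() for y in ys]
correction = sum(observations) ** 2 / N   # y..^2 / N

# Sums of squares, as in the computation step
ss_total = sum(y ** 2 for y in observations) - correction
ss_treatments = sum(sum(ys) ** 2 for ys in data.values()) / n - correction
ss_error = ss_total - ss_treatments

# Mean squares and the test statistic F0
ms_treatments = ss_treatments / (a - 1)
ms_error = ss_error / (N - a)
f0 = ms_treatments / ms_error

F_CRIT = 3.49  # F_{0.05,3,12} read from an F table (hardcoded, not computed)
print(f"SS_T = {ss_total:.4f}, SS_Treatments = {ss_treatments:.4f}, SS_E = {ss_error:.2f}")
print(f"MS_Treatments = {ms_treatments:.2f}, MS_E = {ms_error:.2f}, F0 = {f0:.2f}")
print("Reject H0" if f0 > F_CRIT else "Fail to reject H0")
```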

- Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?

Dashed lines in the plot, by color:

- Red: $\bar{y}_{4}$, mean of Treatment 4 (2666.25)
- Pink: $\bar{y}_{..}$, grand mean (2931.81)
- Brown: $\bar{y}_{3}$, mean of Treatment 3 (2933.75)
- Green: $\bar{y}_{1}$, mean of Treatment 1 (2971.00)
- Blue: $\bar{y}_{2}$, mean of Treatment 2 (3156.25)

Based on the plot and on the data, we would conclude that $\bar{y}_{1}$ and $\bar{y}_{3}$ do not differ (see also the scatter plot in part (f) of Problem 3-1). Moreover, $\bar{y}_{4}$ differs from $\bar{y}_{1}$ and $\bar{y}_{3}$, $\bar{y}_{2}$ differs from $\bar{y}_{1}$ and $\bar{y}_{3}$, and $\bar{y}_{2}$ and $\bar{y}_{4}$ are different.

How did I do it?

First, we draw a Student's t distribution with $N-a=12$ degrees of freedom, the error degrees of freedom. We then locate the four treatment means on the x-axis. Because the means are far too large to appear on the t scale, we first convert each of them to a t-value, estimating $\sigma$ by $\sqrt{MS_{E}}$:$$t=\frac{\bar{y}_{i.}-\bar{y}_{..}}{\sqrt{MS_{E}/n}}$$
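The conversion to the t scale can be sketched in Python, assuming $\sigma$ is estimated by $\sqrt{MS_{E}}$ so that the scale factor is $\sqrt{MS_{E}/n}\approx 56.625$. The sketch only computes the positions; it does not draw the plot:

```python
import math

# Treatment averages, grand mean and MS_E from the tables above
averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
grand_mean = 2931.8125
ms_error = 12825.6875
n = 4  # observations per treatment

# Scale factor sqrt(MS_E / n), with sigma estimated by sqrt(MS_E)
scale = math.sqrt(ms_error / n)

# Position of each treatment average on the scaled t axis
t_values = {i: (ybar - grand_mean) / scale for i, ybar in averages.items()}
for i, t in sorted(t_values.items(), key=lambda item: item[1]):
    print(f"Technique {i}: t = {t:+.3f}")
```

Techniques 1 and 3 land close together near zero, while Technique 4 sits far to the left and Technique 2 far to the right, matching the reading of the plot.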

- Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.

$$LSD=t_{\frac{\alpha}{2},N-a}\sqrt{\frac{2MS_{E}}{n}}=t_{0.025,16-4}\sqrt{\frac{2(12825.7)}{4}}=2.179\sqrt{6412.85}=174.495$$

Thus, any pair of treatment averages that differ in absolute value by more than 174.495 would imply that the corresponding pair of population means are significantly different.

The differences in averages are$$|\bar{y}_{1.}-\bar{y}_{2.}|=|2971.00-3156.25|=185.25>174.495*\\\bar{y}_{1.}-\bar{y}_{3.}=2971.00-2933.75=37.25<174.495\\\bar{y}_{1.}-\bar{y}_{4.}=2971.00-2666.25=304.75>174.495*\\\bar{y}_{2.}-\bar{y}_{3.}=3156.25-2933.75=222.50>174.495*\\\bar{y}_{2.}-\bar{y}_{4.}=3156.25-2666.25=490.00>174.495*\\\bar{y}_{3.}-\bar{y}_{4.}=2933.75-2666.25=267.50>174.495*$$

The starred values indicate pairs of means that are significantly different.
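The LSD comparisons can be sketched in Python as follows. The critical value $t_{0.025,12}=2.179$ is hardcoded from a t table, as in the text:

```python
import math
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
ms_error = 12825.6875
n = 4
T_CRIT = 2.179  # t_{0.025,12} from a t table (hardcoded, as in the text)

# Fisher's least significant difference
lsd = T_CRIT * math.sqrt(2 * ms_error / n)

significant = []
for i, j in combinations(sorted(averages), 2):
    diff = averages[i] - averages[j]
    is_sig = abs(diff) > lsd  # compare the absolute difference with the LSD
    if is_sig:
        significant.append((i, j))
    print(f"{i} vs. {j}: {diff:+8.2f}{' *' if is_sig else ''}")
print(f"LSD = {lsd:.3f}")
```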

- Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?

Nothing unusual appears in the plot: the points fall within the 95 percent confidence bands, so there is no reason to doubt the normality assumption.
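The coordinates of a normal probability plot can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8+). The plotting positions $(j-0.5)/N$ are one common convention and an assumption here, and the correlation printed at the end is only an informal numeric summary of how straight the points are, not the confidence-band check used in the plot:

```python
import math
from statistics import NormalDist

data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Residuals e_ij = y_ij - ybar_i. (the fitted value of each cell is its treatment average)
residuals = []
for ys in data.values():
    ybar = sum(ys) / len(ys)
    residuals.extend(y - ybar for y in ys)
residuals.sort()
N = len(residuals)

# Standard normal quantiles at the plotting positions (j - 0.5) / N, j = 1..N
quantiles = [NormalDist().inv_cdf((j - 0.5) / N) for j in range(1, N + 1)]
for e, z in zip(residuals, quantiles):
    print(f"{e:9.2f}  {z:+.3f}")

# Correlation of the plotted points: values near 1 mean they lie close to a straight line
mean_e = sum(residuals) / N
mean_z = sum(quantiles) / N
cov = sum((e - mean_e) * (z - mean_z) for e, z in zip(residuals, quantiles))
corr = cov / math.sqrt(sum((e - mean_e) ** 2 for e in residuals)
                       * sum((z - mean_z) ** 2 for z in quantiles))
print(f"correlation = {corr:.3f}")
```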

- Plot the residuals versus the predicted tensile strength. Comment on the plot.

The plot shows a slight outward-opening funnel (megaphone) shape. The pattern is not pronounced, but it suggests that the error variance may not be constant.
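One crude way to put numbers on the funnel is to list the range of the residuals at each fitted value, as in this Python sketch. With only four observations per technique this is an informal indicator; a formal test of equal variances, such as Bartlett's test, is not shown here:

```python
data = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# For each technique: fitted value (treatment average) and range of its residuals
rows = []
for ys in data.values():
    fitted = sum(ys) / len(ys)
    residuals = [y - fitted for y in ys]
    rows.append((fitted, max(residuals) - min(residuals)))

rows.sort()  # order by fitted value
for fitted, spread in rows:
    print(f"fitted {fitted:8.2f}: residual range {spread:7.2f}")

spreads = [spread for _, spread in rows]
```

The residual range grows steadily with the fitted value, which is the outward-opening pattern described above.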

- Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

- Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?

Ranking the treatment averages in ascending order, we have$$\bar{y}_{4.}=2666.25\\\bar{y}_{3.}=2933.75\\\bar{y}_{1.}=2971.00\\\bar{y}_{2.}=3156.25$$

The standard error of each average is $S_{\bar{y}_{i.}}=\sqrt{\frac{12825.69}{4}}=56.625$. From the table of significant ranges for 12 degrees of freedom and $\alpha=0.05$, we obtain $r_{0.05}(2,12)=3.081$, $r_{0.05}(3,12)=3.225$, and $r_{0.05}(4,12)=3.312$. Thus, the least significant ranges are$$R_{2}=r_{0.05}(2,12)S_{\bar{y}_{i.}}=(3.081)(56.625)=174.46\\R_{3}=r_{0.05}(3,12)S_{\bar{y}_{i.}}=(3.225)(56.625)=182.62\\R_{4}=r_{0.05}(4,12)S_{\bar{y}_{i.}}=(3.312)(56.625)=187.54$$

The comparison would yield$$\text{2 vs. 4: }3156.25-2666.25=490.00>187.54~(R_{4})\\\text{2 vs. 3: }3156.25-2933.75=222.50>182.62~(R_{3})\\\text{2 vs. 1: }3156.25-2971.00=185.25>174.46~(R_{2})\\\text{1 vs. 4: }2971.00-2666.25=304.75>182.62~(R_{3})\\\text{1 vs. 3: }2971.00-2933.75=37.25<174.46~(R_{2})\\\text{3 vs. 4: }2933.75-2666.25=267.50>174.46~(R_{2})$$

The analysis shows significant differences between all pairs of means except Treatments 1 and 3. This is the same conclusion reached with the LSD method: for these data, Duncan’s multiple range test and the LSD method lead to identical conclusions.
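Duncan's procedure can be sketched in Python as below. The significant ranges $r_{0.05}(p,12)$ are hardcoded from the values quoted above rather than computed. The sketch also applies the usual rule that a pair lying inside a range already found not significant is not declared significant; for these data that rule never changes a result, because the only non-significant range spans just two means:

```python
averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
S_YBAR = 56.625  # standard error sqrt(MS_E / n), from the text
R_TABLE = {2: 3.081, 3: 3.225, 4: 3.312}  # r_{0.05}(p, 12), hardcoded from the text

ranked = sorted(averages, key=averages.get)  # ascending: [4, 3, 1, 2]
least_sig_range = {p: r * S_YBAR for p, r in R_TABLE.items()}

results = {}
nonsig_spans = []  # (low, high) index ranges in `ranked` found not significant
for p in range(len(ranked), 1, -1):  # widest span first
    for low in range(len(ranked) - p + 1):
        high = low + p - 1
        larger, smaller = ranked[high], ranked[low]
        diff = averages[larger] - averages[smaller]
        # A pair lying inside a non-significant span is not declared significant
        covered = any(a <= low and high <= b for a, b in nonsig_spans)
        significant = not covered and diff > least_sig_range[p]
        if not significant:
            nonsig_spans.append((low, high))
        results[(larger, smaller)] = significant
        print(f"{larger} vs. {smaller}: {diff:7.2f} vs R_{p} = {least_sig_range[p]:.2f}"
              f"{'  significant' if significant else ''}")
```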

- Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?

$$T_{0.05}=q_{0.05}(4,12)\sqrt{\frac{MS_{E}}{n}}=4.20\sqrt{\frac{12825.69}{4}}=4.20(56.625)=237.825$$

Thus, any pair of treatment averages that differ in absolute value by more than 237.825 would imply that the corresponding pair of population means are significantly different. The four treatment averages are$$\bar{y}_{1.}=2971.00~~~~~\bar{y}_{2.}=3156.25~~~~~\bar{y}_{3.}=2933.75~~~~~\bar{y}_{4.}=2666.25$$

And the differences in averages are$$\bar{y}_{1.}-\bar{y}_{2.}=2971.00-3156.25=-185.25\\\bar{y}_{1.}-\bar{y}_{3.}=2971.00-2933.75=37.25\\\bar{y}_{1.}-\bar{y}_{4.}=2971.00-2666.25=304.75*\\\bar{y}_{2.}-\bar{y}_{3.}=3156.25-2933.75=222.5\\\bar{y}_{2.}-\bar{y}_{4.}=3156.25-2666.25=490*\\\bar{y}_{3.}-\bar{y}_{4.}=2933.75-2666.25=267.5*$$

The starred values indicate pairs of means that are significantly different.
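Tukey's comparisons follow the same pattern as the LSD sketch, with a single threshold $T_{0.05}$. Here $q_{0.05}(4,12)=4.20$ is hardcoded from a studentized range table, as in the text:

```python
import math
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
ms_error = 12825.6875
n = 4
Q_CRIT = 4.20  # q_{0.05}(4, 12) from a studentized range table (hardcoded, as in the text)

# Tukey's threshold T_0.05 = q * sqrt(MS_E / n)
T = Q_CRIT * math.sqrt(ms_error / n)
significant = [
    (i, j)
    for i, j in combinations(sorted(averages), 2)
    if abs(averages[i] - averages[j]) > T
]
print(f"T_0.05 = {T:.3f}")
print("Significantly different pairs:", significant)
```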

The conclusions are not the same. As with Duncan’s test and the LSD method, Tukey’s test finds that the mean of Treatment 4 differs from the means of Treatments 1, 2, and 3, and that the means of Treatments 1 and 3 do not differ. However, Tukey’s test finds no significant difference between Treatments 1 and 2 or between Treatments 2 and 3, whereas both of these pairs were found different by the graphical method, Duncan’s test, and the Fisher LSD method. This is expected, since Tukey’s test controls the experimentwise error rate and is therefore more conservative.
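Since Duncan's test agreed with the LSD method here, comparing the LSD and Tukey thresholds is enough to locate the disagreement. A small Python sketch, with both thresholds hardcoded from the text:

```python
from itertools import combinations

averages = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
LSD = 174.495      # Fisher LSD threshold from the text
TUKEY_T = 237.825  # Tukey threshold T_0.05 from the text

def significant_pairs(threshold):
    """Pairs (i, j), i < j, whose averages differ by more than `threshold`."""
    return {
        (i, j)
        for i, j in combinations(sorted(averages), 2)
        if abs(averages[i] - averages[j]) > threshold
    }

lsd_pairs = significant_pairs(LSD)
tukey_pairs = significant_pairs(TUKEY_T)
print("Different under both:", sorted(lsd_pairs & tukey_pairs))
print("Different under LSD only:", sorted(lsd_pairs - tukey_pairs))
```

Because Tukey's threshold is larger, every pair it flags is also flagged by the LSD, but not the other way around.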

Reference:

*Design and Analysis of Experiments* by Douglas C. Montgomery

#### R CODES SECTION
