Experimental Design: Problem Set

[This article was first published on ALSTAT R Blog, and kindly contributed to R-bloggers].


QUESTIONS

  1. The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:
     Mixing Technique    Tensile Strength (lb/in²)
     1                   3129   3000   2865   2890
     2                   3200   3300   2975   3150
     3                   2800   2900   2985   3050
     4                   2600   2700   2600   2765
  • Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.
  • Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
  • Use the Fisher LSD method with $\alpha=0.05$ to make comparisons between pairs of means.
  • Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
  • Plot the residuals versus the predicted tensile strength. Comment on the plot.
  • Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.
  2.
    • Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?
    • Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?

      COMPUTATIONAL AND GRAPHICAL SECTION

      1. The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected: 
      Mixing Technique    Tensile Strength (lb/in²)      Totals $(y_{i.})$    Averages $(\bar{y}_{i.})$
      1                   3129   3000   2865   2890      11884                2971.00
      2                   3200   3300   2975   3150      12625                3156.25
      3                   2800   2900   2985   3050      11735                2933.75
      4                   2600   2700   2600   2765      10665                2666.25

      $y_{..}=46909$
      $\bar{y}_{..}=2931.81$
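
      The totals and averages in the table above can be checked with a few lines of code. The following is a minimal Python sketch, not the author's original code; the variable names (`strength`, `totals`, `averages`) are chosen here for illustration only.

```python
# Tensile strength (lb/in^2) by mixing technique, copied from the table above.
strength = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Treatment totals y_i. and treatment averages ybar_i.
totals = {k: sum(ys) for k, ys in strength.items()}
averages = {k: totals[k] / len(ys) for k, ys in strength.items()}

# Grand total y.. and grand mean ybar..
grand_total = sum(totals.values())
n_obs = sum(len(ys) for ys in strength.values())
grand_mean = grand_total / n_obs

for k in strength:
    print(f"Technique {k}: total = {totals[k]}, average = {averages[k]:.2f}")
print(f"Grand total = {grand_total}, grand mean = {grand_mean:.2f}")
```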

      • Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha=0.05$.

      I.    Hypotheses:
      $H_{0}$: $\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}$
      $H_{1}$: some means are different.
      II.   Level of significance: $\alpha = 0.05$
      III.  Test statistic: $$F_{0}=\frac{\frac{SS_{Treatments}}{a-1}}{\frac{SS_{E}}{N-a}}=\frac{MS_{Treatments}}{MS_{E}}$$
      IV.   Rejection region:
      $$F_{0}>F_{\alpha,a-1,N-a}\\F_{0}>F_{0.05,3,12}\\F_{0}>3.49$$
      V.    Computation, with $a=4$ treatments, $n=4$ observations per treatment, and $N=16$:
      $$SS_{T}=\sum_{i=1}^4\sum_{j=1}^4y_{ij}^2-\frac{y_{..}^2}{N}\\=(3129)^2+(3000)^2+\dots+(2600)^2+(2765)^2-\frac{(46909)^2}{16}\\=138172041-\frac{(46909)^2}{16}=643648.4375\\SS_{Treatments}=\frac{1}{n}\sum_{i=1}^4y_{i.}^2-\frac{y_{..}^2}{N}\\=\frac{1}{4}[(11884)^2+\dots+(10665)^2]-\frac{(46909)^2}{16}=489740.1875\\SS_{E}=SS_{T}-SS_{Treatments}\\=643648.4375-489740.1875=153908.25$$
      ANOVA Table
      Source    Sum of Squares    Degrees of Freedom    Mean Square    F0       P-Value
      Model     489740.19         3                     163246.73      12.73    0.0005
      Error     153908.25         12                    12825.69
      Total     643648.44         15

              The F-value of 12.73 is greater than the tabulated value of 3.49, so the model is significant. The p-value of 0.0005 is also smaller than the 0.05 level of significance. We therefore reject the null hypothesis and conclude that the mixing technique significantly affects the strength of the cement.
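
              The sums of squares and the F statistic can be reproduced directly from the data. This is a minimal Python sketch under the balanced-design formulas used in this problem; the names are illustrative, and the critical value 3.49 is the tabulated figure quoted in the text rather than something computed here.

```python
# Data as listed in the table for this problem (lb/in^2).
strength = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

a = len(strength)       # number of treatments
n = len(strength[1])    # observations per treatment (balanced design)
N = a * n               # total number of observations

all_obs = [y for ys in strength.values() for y in ys]
correction = sum(all_obs) ** 2 / N   # the term y..^2 / N

ss_total = sum(y * y for y in all_obs) - correction
ss_treat = sum(sum(ys) ** 2 for ys in strength.values()) / n - correction
ss_error = ss_total - ss_treat

ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (N - a)
f0 = ms_treat / ms_error

F_CRIT = 3.49  # F_{0.05,3,12}, tabulated value quoted in the text

print(f"SS_T = {ss_total:.4f}, SS_Treatments = {ss_treat:.4f}, SS_E = {ss_error:.2f}")
print(f"MS_Treatments = {ms_treat:.2f}, MS_E = {ms_error:.2f}, F0 = {f0:.2f}")
print("Reject H0" if f0 > F_CRIT else "Fail to reject H0")
```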
      • Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions? 
              Dashed lines in the plot, by color:
              Red: $\bar{y}_{4.}$, mean of Treatment 4 (2666.25)
              Pink: $\bar{y}_{..}$, grand mean (2931.81)
              Brown: $\bar{y}_{3.}$, mean of Treatment 3 (2933.75)
              Green: $\bar{y}_{1.}$, mean of Treatment 1 (2971.00)
              Blue: $\bar{y}_{2.}$, mean of Treatment 2 (3156.25)

              Based on the plot and the data, we would conclude that $\bar{y}_{1.}$ and $\bar{y}_{3.}$ are the same (see also the scatter plot for the sixth part of Question 1). Moreover, $\bar{y}_{4.}$ differs from $\bar{y}_{1.}$ and $\bar{y}_{3.}$, $\bar{y}_{2.}$ differs from $\bar{y}_{1.}$ and $\bar{y}_{3.}$, and $\bar{y}_{2.}$ and $\bar{y}_{4.}$ are different.

              How did I do it?
              First, draw a Student's t distribution with $N-1=15$ degrees of freedom. Then place the four treatment means on the x-axis of that plot. Because the raw means are far too large to appear on the t scale, convert each one to a t-value first, using the formula $$t=\frac{\bar{y}_{i.}-\bar{y}_{..}}{\frac{\sigma}{\sqrt{n}}}$$
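
              The conversion to t-values can be sketched as follows. This is an illustrative Python example, not the author's code. It assumes that $\sigma$ in the formula is estimated by $\sqrt{MS_{E}}$ from the ANOVA table, which the text does not state explicitly.

```python
from math import sqrt

# Treatment averages and grand mean from the data table.
means = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
GRAND_MEAN = 46909 / 16

MS_E = 12825.69   # error mean square from the ANOVA table
n = 4             # observations per treatment

# Assumption: sigma is estimated by sqrt(MS_E), so sigma / sqrt(n) = sqrt(MS_E / n).
scale = sqrt(MS_E / n)

t_values = {k: (m - GRAND_MEAN) / scale for k, m in means.items()}

# List the treatments from the smallest to the largest t-value.
for k in sorted(t_values, key=t_values.get):
    print(f"Treatment {k}: mean = {means[k]:.2f}, t = {t_values[k]:.3f}")
```

              On this scale Treatments 1 and 3 sit close together near zero, while Treatments 2 and 4 fall far out on opposite sides.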
      • Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.$$LSD=t_{\frac{\alpha}{2},N-a}\sqrt{\frac{2MS_{E}}{n}}=t_{0.025,16-4}\sqrt{\frac{2(12825.7)}{4}}=2.179\sqrt{6412.85}=174.495$$
              Thus, any pair of treatment averages that differ in absolute value by more than 174.495 would imply that the corresponding pair of population means are significantly different.

              The absolute differences in averages are$$|\bar{y}_{1.}-\bar{y}_{2.}|=|2971.00-3156.25|=185.25>174.495*\\|\bar{y}_{1.}-\bar{y}_{3.}|=|2971.00-2933.75|=37.25<174.495\\|\bar{y}_{1.}-\bar{y}_{4.}|=|2971.00-2666.25|=304.75>174.495*\\|\bar{y}_{2.}-\bar{y}_{3.}|=|3156.25-2933.75|=222.50>174.495*\\|\bar{y}_{2.}-\bar{y}_{4.}|=|3156.25-2666.25|=490.00>174.495*\\|\bar{y}_{3.}-\bar{y}_{4.}|=|2933.75-2666.25|=267.50>174.495*$$
              The starred values indicate pairs of means that are significantly different.
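
              The LSD comparisons can be sketched in Python as below. This is an illustration only; the critical value $t_{0.025,12}=2.179$ is the tabulated figure quoted in the text, and the variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Treatment averages, error mean square, and replicates from this problem.
means = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
MS_E = 12825.69
n = 4
T_CRIT = 2.179  # t_{0.025,12}, tabulated value quoted in the text

lsd = T_CRIT * sqrt(2 * MS_E / n)
print(f"LSD = {lsd:.3f}")

# Compare every pair of treatment averages against the LSD.
significant = {}
for i, j in combinations(sorted(means), 2):
    diff = abs(means[i] - means[j])
    significant[(i, j)] = diff > lsd
    flag = "*" if significant[(i, j)] else ""
    print(f"{i} vs. {j}: |difference| = {diff:.2f} {flag}")
```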
      • Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
              Nothing unusual appears in the plot. The points stay within the 95 percent confidence band, so the residuals give no reason to doubt the normality assumption.
      • Plot the residuals versus the predicted tensile strength. Comment on the plot.
              The plot shows a slight outward-opening funnel (megaphone) shape. The pattern is not pronounced, but it suggests that the error variance may not be constant.
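
              The quantities behind this plot are easy to compute: the predicted value for each observation is its treatment average, and the residual is the observation minus that average. The Python sketch below is an illustration, not the author's code. It also prints the standard deviation of the residuals in each group as a rough numerical view of the spread; with only four observations per treatment, these figures are suggestive at best.

```python
from statistics import mean, stdev

# Data as listed in the table for this problem (lb/in^2).
strength = {
    1: [3129, 3000, 2865, 2890],
    2: [3200, 3300, 2975, 3150],
    3: [2800, 2900, 2985, 3050],
    4: [2600, 2700, 2600, 2765],
}

# Predicted value = treatment average; residual = observation - predicted value.
fitted = {k: mean(ys) for k, ys in strength.items()}
residuals = {k: [y - fitted[k] for y in ys] for k, ys in strength.items()}

# Spread of the residuals within each treatment.
spread = {k: stdev(r) for k, r in residuals.items()}

# List the treatments in order of increasing predicted value.
for k in sorted(strength, key=fitted.get):
    print(f"predicted = {fitted[k]:8.2f}  residuals = {residuals[k]}  sd = {spread[k]:.1f}")
```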
      • Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

      2. Rework part (b) of Problem 3-1 using Duncan’s multiple range test with $\alpha=0.05$. Does this make any difference in your conclusions?
           
              Ranking the treatment averages in ascending order, we have$$\bar{y}_{4.}=2666.25\\\bar{y}_{3.}=2933.75\\\bar{y}_{1.}=2971.00\\\bar{y}_{2.}=3156.25$$
              The standard error of each average is $S_{\bar{y}_{i.}}=\sqrt{\frac{MS_{E}}{n}}=\sqrt{\frac{12825.69}{4}}=56.625$. From the table of significant ranges for 12 degrees of freedom and $\alpha=0.05$, we obtain $r_{0.05}(2,12)=3.081$, $r_{0.05}(3,12)=3.225$, and $r_{0.05}(4,12)=3.312$. Thus, the least significant ranges are$$R_{2}=r_{0.05}(2,12)S_{\bar{y}_{i.}}=(3.081)(56.625)=174.46\\R_{3}=r_{0.05}(3,12)S_{\bar{y}_{i.}}=(3.225)(56.625)=182.62\\R_{4}=r_{0.05}(4,12)S_{\bar{y}_{i.}}=(3.312)(56.625)=187.54$$
              The comparisons yield$$\text{2 vs. 4: }3156.25-2666.25=490.00>187.54\ (R_{4})\\\text{2 vs. 3: }3156.25-2933.75=222.50>182.62\ (R_{3})\\\text{2 vs. 1: }3156.25-2971.00=185.25>174.46\ (R_{2})\\\text{1 vs. 4: }2971.00-2666.25=304.75>182.62\ (R_{3})\\\text{1 vs. 3: }2971.00-2933.75=37.25<174.46\ (R_{2})\\\text{3 vs. 4: }2933.75-2666.25=267.50>174.46\ (R_{2})$$
              The analysis shows significant differences between all pairs of means except Treatments 1 and 3. This agrees with the conclusions from the Fisher LSD method: for these data, Duncan’s multiple range test and the LSD method lead to identical conclusions.
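
              The Duncan comparisons can be sketched in Python as follows. This is an illustration, not the author's code. The significant ranges are the tabulated values quoted in the text. For simplicity the sketch compares every pair against the range for the number of means it spans and does not implement the rule that skips pairs lying inside a non-significant range, which does not come into play for these data.

```python
from math import sqrt

# Treatment averages, error mean square, and replicates from this problem.
means = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
MS_E = 12825.69
n = 4

# Significant ranges r_{0.05}(p, 12) as quoted in the text.
R_TABLE = {2: 3.081, 3: 3.225, 4: 3.312}

se = sqrt(MS_E / n)                                    # standard error of each average
least_sig_range = {p: r * se for p, r in R_TABLE.items()}

ordered = sorted(means, key=means.get)                 # treatments in ascending order of mean

# Compare the largest mean with each smaller one, then the next largest, and so on.
results = {}
for hi in range(len(ordered) - 1, 0, -1):
    for lo in range(hi):
        p = hi - lo + 1                                # number of ordered means spanned
        diff = means[ordered[hi]] - means[ordered[lo]]
        results[(ordered[hi], ordered[lo])] = diff > least_sig_range[p]
        sign = ">" if diff > least_sig_range[p] else "<"
        print(f"{ordered[hi]} vs. {ordered[lo]}: {diff:.2f} {sign} {least_sig_range[p]:.2f} (R{p})")
```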
      • Rework part (b) of Problem 3-1 using Tukey’s test with $\alpha=0.05$. Do you get the same conclusions from Tukey’s test that you did from the graphical procedure and/or Duncan’s multiple range test?$$T_{0.05}=q_{0.05}(4,12)\sqrt{\frac{MS_{E}}{n}}=4.20\sqrt{\frac{12825.69}{4}}=4.20(56.625)=237.825$$
              Thus, any pair of treatment averages that differ in absolute value by more than 237.825 would imply that the corresponding pair of population means are significantly different. The four treatment averages are$$\bar{y}_{1.}=2971.00~~~~~\bar{y}_{2.}=3156.25~~~~~\bar{y}_{3.}=2933.75~~~~~\bar{y}_{4.}=2666.25$$        and the differences in averages are$$\bar{y}_{1.}-\bar{y}_{2.}=2971.00-3156.25=-185.25\\\bar{y}_{1.}-\bar{y}_{3.}=2971.00-2933.75=37.25\\\bar{y}_{1.}-\bar{y}_{4.}=2971.00-2666.25=304.75*\\\bar{y}_{2.}-\bar{y}_{3.}=3156.25-2933.75=222.5\\\bar{y}_{2.}-\bar{y}_{4.}=3156.25-2666.25=490*\\\bar{y}_{3.}-\bar{y}_{4.}=2933.75-2666.25=267.5*$$        The starred values indicate pairs of means that are significantly different.

                The conclusions are not the same. Every method finds that the mean of Treatment 4 differs from the means of Treatments 1, 2, and 3, and none of them separates Treatments 1 and 3. However, Tukey’s test does not find the mean of Treatment 2 to be different from the means of Treatments 1 and 3, whereas those two pairs were found to be different using the graphical method, the Fisher LSD method, and Duncan’s multiple range test.
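
                The Tukey comparisons can be sketched in Python as below. This is an illustration only; the studentized range value $q_{0.05}(4,12)=4.20$ is the tabulated figure quoted in the text, and the variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Treatment averages, error mean square, and replicates from this problem.
means = {1: 2971.00, 2: 3156.25, 3: 2933.75, 4: 2666.25}
MS_E = 12825.69
n = 4
Q_CRIT = 4.20  # q_{0.05}(4,12), tabulated value quoted in the text

tukey_cutoff = Q_CRIT * sqrt(MS_E / n)
print(f"T_0.05 = {tukey_cutoff:.3f}")

# A pair is declared different when its absolute difference exceeds the cutoff.
different = set()
for i, j in combinations(sorted(means), 2):
    diff = abs(means[i] - means[j])
    if diff > tukey_cutoff:
        different.add((i, j))
    flag = "*" if (i, j) in different else ""
    print(f"{i} vs. {j}: |difference| = {diff:.2f} {flag}")
```

                With the single, larger cutoff, only the three comparisons involving Treatment 4 are flagged.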



               Reference:
                          Design and Analysis of Experiments by Douglas C. Montgomery


              R CODES SECTION

