Two sample Student’s t-test #2

[This article was first published on Statistic on aiR, and kindly contributed to R-bloggers].

Comparison of the means of two independent groups, drawn from two populations with unknown variances; the sample variances are not homogeneous.

We want to compare the heights in inches of two groups of individuals. Here are the measurements:

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188

As we have seen in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) using Fisher's F-test:

a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
var.test(b, a)

    F test to compare two variances

data: b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
   3.637133 58.952936
sample estimates:
 ratio of variances
           14.64308

The p-value is less than 0.05, so we reject the hypothesis that the two variances are homogeneous. We can reach the same conclusion by comparing the computed value of F with the tabulated upper-tail value of F for alpha = 0.05, with 9 degrees of freedom in the numerator and 9 in the denominator, using the function qf(p, df.num, df.den):

qf(0.95, 9, 9)
[1] 3.178893

F-computed is greater than F-tabulated, so we can reject the null hypothesis H0 of homogeneity of variances.
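
As a cross-check, the F statistic and its p-value can be reproduced by hand. The following is a minimal sketch, not part of the original exercise; the names f_stat, df_num, df_den, p_upper, p_two_sided and f_crit_two_sided are introduced here purely for illustration. Since var.test is two-sided by default, the sketch doubles the smaller tail probability and also computes the two-sided critical value with qf(0.975, ...).

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic: ratio of the sample variances, b in the numerator as in var.test(b, a)
f_stat <- var(b) / var(a)

# Degrees of freedom of numerator and denominator
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_upper <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_two_sided <- 2 * min(p_upper, 1 - p_upper)

# Two-sided critical value at alpha = 0.05 (illustrative addition)
f_crit_two_sided <- qf(0.975, df_num, df_den)

f_stat                      # should match F in the var.test output
p_two_sided                 # should match the p-value in the var.test output
f_stat > f_crit_two_sided   # TRUE means homogeneity is rejected
```

The computed F exceeds both the one-sided value from qf(0.95, 9, 9) and the two-sided value, so the conclusion does not depend on which of the two is used.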

To compare the two groups, we use the function t.test with non-homogeneous variances (var.equal = FALSE) and independent samples (paired = FALSE). Both arguments can be omitted, because these are the function's default settings:

t.test(a, b, var.equal = FALSE, paired = FALSE)

    Welch Two Sample t-test

data: a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
    -3.95955 47.95955
sample estimates:
mean of x mean of y
    174.8     152.8
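
The values printed above can also be read from the object that t.test returns, which is convenient when they are needed in later calculations. This is a small sketch under the same data; the names res and res_default are chosen here for illustration. It also checks that omitting var.equal and paired gives the same result as setting them to FALSE.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Explicit arguments, as in the call above
res <- t.test(a, b, var.equal = FALSE, paired = FALSE)

# Same test relying on the defaults
res_default <- t.test(a, b)

res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval for the difference in means
res$estimate    # the two sample means

# TRUE if the explicit and the default call agree
isTRUE(all.equal(res$p.value, res_default$p.value))
```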

As the title of the output shows, R performed a t-test on two samples, computing the degrees of freedom with the Welch-Satterthwaite formula (the result of the formula is df = 10.224), which is used when the variances are not homogeneous. The Welch-Satterthwaite equation is also called the Dixon-Massey formula when the comparison is between two groups, as in this case.
The p-value is greater than 0.05, so we cannot reject the null hypothesis: the data do not show a significant difference between the means of the two groups (although the p-value is fairly close to the 0.05 threshold). Equivalently, the computed value of t is less than the tabulated t-value for 10.224 degrees of freedom, which in R we can calculate with:

qt(0.975, 10.224)
[1] 2.221539

We therefore do not reject the hypothesis H0 of equality of means.
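
The decision can be retraced step by step. The sketch below is an illustration only; the names se, t_stat, df_welch, t_crit, p_value and reject_h0 are introduced here. It computes the t statistic from the sample means and variances, takes the degrees of freedom from t.test, and compares the statistic with the critical value.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Standard error of the difference in means, without pooling the variances
se <- sqrt(var(a) / length(a) + var(b) / length(b))

# Welch t statistic
t_stat <- (mean(a) - mean(b)) / se

# Degrees of freedom as reported by t.test
df_welch <- unname(t.test(a, b)$parameter)

# Two-sided critical value at alpha = 0.05 and two-sided p-value
t_crit <- qt(0.975, df_welch)
p_value <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)

# H0 is rejected only if |t| exceeds the critical value
reject_h0 <- abs(t_stat) > t_crit

t_stat
p_value
reject_h0
```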

Welch-Satterthwaite formula (for k groups with sample variances S_i^2 and sample sizes n_i):

$$df=\frac{\left(\sum_{i=1}^{k}\frac{S_i^2}{n_i}\right)^2}{\sum_{i=1}^{k}\frac{\left(\frac{S_i^2}{n_i}\right)^2}{n_i-1}}$$

Dixon-Massey formula:

$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$
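
The two-group formula can be checked against the output of t.test. The function welch_df below is a hypothetical helper written for this illustration, not a function from R or from the original post; it is a direct transcription of the formula, with S_1^2 and S_2^2 computed by var.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Hypothetical helper: degrees of freedom from the two-group formula
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # S_1^2 / n_1
  vy <- var(y) / length(y)   # S_2^2 / n_2
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

df_manual <- welch_df(a, b)
df_manual

# TRUE if the result agrees with the df reported by t.test
isTRUE(all.equal(df_manual, unname(t.test(a, b)$parameter)))
```

The result should agree with the df = 10.224 shown in the Welch test output.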
