Two sample Student’s t-test #2

July 25, 2009

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

Comparison of the means of two independent groups drawn from two populations with unknown variances, when the sample variances are not homogeneous.

We want to compare the heights (in centimeters) of two groups of individuals. Here are the measurements:

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188

As we saw in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) with Fisher's F-test:

a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
var.test(b, a)


F test to compare two variances

data: b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
3.637133 58.952936
sample estimates:
ratio of variances
14.64308
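
The F statistic reported by var.test is the ratio of the two sample variances, with the first argument (here b) in the numerator, and the two-sided p-value is twice the smaller tail probability of the F distribution. A minimal sketch reproducing both numbers with base R's var() and pf(); the variable names are only illustrative:

```r
# Heights of the two groups (data from the example above)
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic: ratio of sample variances, b in the numerator as in var.test(b, a)
F_obs  <- var(b) / var(a)
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided p-value: twice the smaller tail probability of F(9, 9)
p_lower <- pf(F_obs, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

F_obs    # about 14.643
p_value  # about 0.00046
```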

Since the p-value is less than 0.05, the two variances are not homogeneous. We can confirm this by comparing the computed F with the tabulated F for alpha = 0.05, with 9 degrees of freedom in the numerator and 9 in the denominator, using the function qf(p, df1, df2):

qf(0.95, 9, 9)
[1] 3.178893

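The computed F (14.6431) is greater than the tabulated F (3.178893), so we reject the null hypothesis H0 of homogeneity of variances. Note that qf(0.95, 9, 9) is the one-sided critical value; for the two-sided test performed by var.test, the upper critical value is qf(0.975, 9, 9), about 4.03, and the computed F exceeds that as well.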
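
Since var.test performs a two-sided test by default, at alpha = 0.05 the hypothesis of equal variances is rejected when F falls below the 2.5% quantile or above the 97.5% quantile of the F distribution with (9, 9) degrees of freedom. A short sketch of this decision rule, assuming alpha = 0.05 (alpha, lower, upper and reject_H0 are illustrative names):

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

alpha  <- 0.05
F_obs  <- var(b) / var(a)
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided critical values of F(9, 9)
lower <- qf(alpha / 2, df_num, df_den)
upper <- qf(1 - alpha / 2, df_num, df_den)  # about 4.03

# Reject H0 (equal variances) if F lies outside [lower, upper]
reject_H0 <- F_obs < lower || F_obs > upper
reject_H0
```

With equal numerator and denominator degrees of freedom, the lower critical value is simply the reciprocal of the upper one.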

To compare the two groups, we use the function t.test for non-homogeneous variances (var.equal = FALSE, which can be omitted because it is the default) and independent samples (paired = FALSE, which can also be omitted because it is the default):

t.test(a,b, var.equal=FALSE, paired=FALSE)

Welch Two Sample t-test

data: a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.95955 47.95955
sample estimates:
mean of x mean of y
174.8 152.8
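
The Welch t statistic divides the difference between the sample means by a standard error that does not pool the two variances. A minimal sketch that recomputes it by hand and checks it against t.test called with its default arguments (se_diff, t_obs and res are illustrative names):

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
n_a <- length(a)
n_b <- length(b)

# Unpooled (Welch) standard error of the difference in means
se_diff <- sqrt(var(a) / n_a + var(b) / n_b)
t_obs <- (mean(a) - mean(b)) / se_diff
t_obs  # about 1.8827

# var.equal = FALSE and paired = FALSE are the defaults, so this call
# is equivalent to t.test(a, b, var.equal = FALSE, paired = FALSE)
res <- t.test(a, b)
```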

As the header shows, R performed a Welch two-sample t-test, computing the degrees of freedom with the Welch-Satterthwaite formula (here df = 10.224), which is used when the variances are not homogeneous. When only two groups are compared, as in this case, the Welch-Satterthwaite equation is also called the Dixon-Massey formula.

Since the p-value is greater than 0.05, the difference between the two means is not statistically significant at the 5% level (although the p-value, 0.088, is not far from the threshold). This does not prove that the means are equal; it only means the data do not provide enough evidence that they differ. Consistently, the computed t is smaller than the critical t-value for 10.224 degrees of freedom, which we can calculate in R:

qt(0.975, 10.224)
[1] 2.221539

We therefore fail to reject the null hypothesis H0 of equality of means.
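
The p-value, the critical value and the confidence interval in the output all come from the t distribution with the non-integer Welch degrees of freedom, and at alpha = 0.05 they lead to the same decision. A sketch that recomputes them from the t.test result, assuming a two-sided test at the 95% level (df_welch, t_crit and ci are illustrative names):

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

res <- t.test(a, b)                    # Welch test with default settings
t_obs <- unname(res$statistic)
df_welch <- unname(res$parameter)      # Welch-Satterthwaite df, about 10.224
se <- sqrt(var(a) / length(a) + var(b) / length(b))
diff_means <- mean(a) - mean(b)

# Two-sided p-value from the t distribution with non-integer df
p_value <- 2 * pt(-abs(t_obs), df_welch)

# 95% confidence interval for the difference in means
t_crit <- qt(0.975, df_welch)          # about 2.22
ci <- diff_means + c(-1, 1) * t_crit * se

# Three equivalent decision rules at alpha = 0.05
abs(t_obs) > t_crit      # FALSE: fail to reject H0
p_value < 0.05           # FALSE
ci[1] < 0 && ci[2] > 0   # TRUE: the interval contains 0
```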

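Welch-Satterthwaite formula, for k groups where group i has n_i observations and sample variance S_i^2 (its deviance divided by its degrees of freedom n_i - 1):

$$S_i^2=\frac{\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2}{n_i-1},\qquad df=\frac{\left(\sum_{i=1}^{k}\frac{S_i^2}{n_i}\right)^2}{\sum_{i=1}^{k}\frac{\left(S_i^2/n_i\right)^2}{n_i-1}}$$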

Dixon-Massey formula:

$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$
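
The Dixon-Massey formula above translates directly into R. The hypothetical helper welch_df() below generalizes it to a list of k groups by summing S_i^2/n_i over all groups; for two groups it should reproduce the df reported by t.test:

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom for a list of groups
welch_df <- function(groups) {
  n <- sapply(groups, length)
  s2_over_n <- sapply(groups, var) / n   # S_i^2 / n_i for each group
  sum(s2_over_n)^2 / sum(s2_over_n^2 / (n - 1))
}

a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

df_ab <- welch_df(list(a, b))
df_ab  # about 10.224, the df reported by t.test(a, b)
```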
