Two sample Student’s t-test #2

July 25, 2009
By

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

Comparison of the averages of two independent groups, extracted from two populations at variance unknown; sample variances are not homogeneous.

We want to compare the heights in inches of two groups of individuals. Here the measurements:

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188


As we have seen in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) with a F-test of Fisher:



a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

var.test(b,a)

F test to compare two variances

data: b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
3.637133 58.952936
sample estimates:
ratio of variances
14.64308


We obtained p-value less than 0.05, then the two variances are not homogeneous. Indeed we can compare the value of F computed with the tabulated value of F for alpha = 0.05, degrees of freedom at numerator = 9, and degrees of freedom of denominator = 9, using the function qf(p, df.num, df.den):


qf(0.95, 9, 9)
[1] 3.178893


F-computed is greater than F-tabulated, so we can reject the null hypothesis H0 of homogeneity of variances.

To make the comparison between the two groups, we use the function t.test with not homogeneous variances (var.equal = FALSE, which can also be omitted, because the function works on non-homogeneous variance by default) and independent samples (paired = FALSE, which can also be omitted, because by default the function works on independent samples) in this way:


t.test(a,b, var.equal=FALSE, paired=FALSE)

Welch Two Sample t-test

data: a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.95955 47.95955
sample estimates:
mean of x mean of y
174.8 152.8


As we see in the headline, you made a t-test on two samples with the calculation of degrees of freedom using the formula of Welch-Satterthwaite (the result of the formula is df = 10,224), which is used in cases where the variances are not homogeneous. Welch-Satterthwaite equation is also called Dixon-Massey formula when you make the comparison between two groups, as in this case.
We obtained p-value greater than 0.05, then we can conclude that the means of the two groups are significantly similar (albeit p-value is very close to the threshold 0.05). Indeed the value of t is less than the tabulated t-value for 10,224 degrees of freedom, which in R we can calculate:


qt(0.975, 10.224)
[1] 2.221539


We can accept the hypothesis H0 of equality of means.


Welch-Satterthwaite formula:

$$df=\frac{\sum deviance(X)}{\sum df(X)}=\frac{\sum_{i=1}^{k} (\sum_{j=1}^{n} (X_i_j - \bar{X_i})^2}{\sum_{i=1}^{k}(n_i-1)}$$


Dixon-Massey formula:

$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$

To leave a comment for the author, please follow the link and comment on his blog: Statistic on aiR.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , ,

Comments are closed.