# Two sample Student’s t-test #2

July 25, 2009

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

Comparison of the means of two independent groups drawn from two populations with unknown variances, when the sample variances are not homogeneous.

We want to compare the heights, in centimeters, of two groups of individuals. Here are the measurements:

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188

As we saw in a previous exercise, we must first check whether the variances are homogeneous (homoscedasticity) with Fisher's F-test:

```r
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
var.test(b, a)

        F test to compare two variances

data:  b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
  3.637133 58.952936
sample estimates:
ratio of variances
          14.64308
```
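As a sketch of what var.test computes, the F statistic is simply the ratio of the two sample variances (the first argument goes in the numerator), and the two-sided p-value doubles the smaller tail probability of the F distribution with 9 and 9 degrees of freedom. The names F_stat, p_value and vt are illustrative.

```r
# Heights of the two groups (same data as above)
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic: ratio of sample variances, b in the numerator as in var.test(b, a)
F_stat <- var(b) / var(a)
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_value <- 2 * min(pf(F_stat, df_num, df_den),
                   pf(F_stat, df_num, df_den, lower.tail = FALSE))

# Compare the hand computation with the values returned by var.test()
vt <- var.test(b, a)
print(c(F_manual = F_stat, F_var.test = unname(vt$statistic)))
print(c(p_manual = p_value, p_var.test = vt$p.value))
```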

The p-value is less than 0.05, so the two variances are not homogeneous. Equivalently, we can compare the computed F with the tabulated (one-sided) critical value of F for alpha = 0.05, with 9 degrees of freedom in the numerator and 9 in the denominator, using the function qf(p, df1, df2):

```r
qf(0.95, 9, 9)
[1] 3.178893
```

The computed F (14.64) is greater than the tabulated F (3.18), so we reject the null hypothesis H0 of homogeneous variances.
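Note that qf(0.95, 9, 9) puts all of alpha in one tail, while var.test performs a two-sided test. A minimal sketch comparing the two critical values, assuming alpha = 0.05: the two-sided upper critical value, qf(0.975, 9, 9), is about 4.03, and the computed F still exceeds it, so the conclusion does not change.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)
F_stat <- var(b) / var(a)   # larger variance in the numerator, so F > 1

alpha <- 0.05
df_num <- length(b) - 1
df_den <- length(a) - 1

# One-sided critical value: all of alpha in the upper tail
crit_one_sided <- qf(1 - alpha, df_num, df_den)
# Two-sided critical value: alpha/2 per tail; since F > 1 only the upper tail matters
crit_two_sided <- qf(1 - alpha / 2, df_num, df_den)

print(c(F = F_stat, one_sided = crit_one_sided, two_sided = crit_two_sided))

# H0 of equal variances is rejected if F exceeds the two-sided critical value
F_stat > crit_two_sided
```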

To compare the two groups we use the function t.test with non-homogeneous variances (var.equal = FALSE, which can be omitted because it is the default) and independent samples (paired = FALSE, which can also be omitted because it is the default):

```r
t.test(a, b, var.equal = FALSE, paired = FALSE)

        Welch Two Sample t-test

data:  a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.95955 47.95955
sample estimates:
mean of x mean of y
    174.8     152.8
```
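As a sketch of the Welch statistic, the difference in means is divided by a standard error built from each group's own variance, with no pooling; the 95% confidence interval uses the same standard error and the Welch degrees of freedom, taken here from the t.test result via $parameter. Variable names are illustrative.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Unpooled standard error of the difference in means
se_diff <- sqrt(var(a) / length(a) + var(b) / length(b))
t_stat <- (mean(a) - mean(b)) / se_diff

# Welch-Satterthwaite degrees of freedom as reported by t.test()
tt <- t.test(a, b)
df_welch <- unname(tt$parameter)

# 95% confidence interval for mean(a) - mean(b)
ci <- (mean(a) - mean(b)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t_manual = t_stat, t_t.test = unname(tt$statistic)))
print(rbind(manual = ci, t.test = as.numeric(tt$conf.int)))
```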

As the header of the output shows, R ran a Welch two-sample t-test, computing the degrees of freedom with the Welch-Satterthwaite formula (here df = 10.224), which is used when the variances are not homogeneous. When two groups are compared, as in this case, the Welch-Satterthwaite equation is also called the Dixon-Massey formula.
The p-value is greater than 0.05, so we cannot reject the null hypothesis: the difference between the two means is not statistically significant (although the p-value is not far from the 0.05 threshold). Equivalently, the computed t is smaller than the critical t-value for 10.224 degrees of freedom, which in R we can calculate as follows:

```r
qt(0.975, 10.224)
[1] 2.221539
```

Since t = 1.8827 is less than 2.221539, we do not reject the hypothesis H0 of equality of means.
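The two equivalent decision rules used above (p-value against 0.05, and |t| against the critical t) can be checked in a few lines. A minimal sketch with the same vectors a and b; pt() and qt() accept non-integer degrees of freedom, which the Welch test requires.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

tt <- t.test(a, b)
t_stat <- unname(tt$statistic)
df_welch <- unname(tt$parameter)

# Two-sided p-value from the t distribution with the Welch df
p_manual <- 2 * pt(-abs(t_stat), df_welch)
# Critical value for alpha = 0.05, two-sided
t_crit <- qt(0.975, df_welch)

print(c(p_manual = p_manual, p_t.test = tt$p.value))

# Reject H0 only if |t| exceeds the critical value
abs(t_stat) > t_crit
```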

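Welch-Satterthwaite formula, for $k$ groups, where $S_i^2$ is the sample variance of group $i$ (its deviance divided by its degrees of freedom):

$$S_i^2=\frac{\sum_{j=1}^{n_i}\left(X_{ij}-\bar{X}_i\right)^2}{n_i-1},\qquad df=\frac{\left(\sum_{i=1}^{k}\frac{S_i^2}{n_i}\right)^2}{\sum_{i=1}^{k}\frac{\left(S_i^2/n_i\right)^2}{n_i-1}}$$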

Dixon-Massey formula:

$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$
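A minimal sketch of the Dixon-Massey formula above in R; welch_df is a hypothetical helper name, not a base R function. Applied to the two groups it should reproduce the df = 10.224 reported by t.test.

```r
# Welch-Satterthwaite (Dixon-Massey) degrees of freedom for two independent samples
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # S_1^2 / n_1
  vy <- var(y) / length(y)   # S_2^2 / n_2
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

df_ab <- welch_df(a, b)
print(df_ab)
```

In practice the same value is available directly as t.test(a, b)$parameter; the helper only makes the formula explicit.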

