Two sample Student’s t-test #2
Comparison of the means of two independent groups, drawn from two populations with unknown variances; the sample variances are not homogeneous.
We want to compare the heights of two groups of individuals. Here are the measurements:
A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188
As we saw in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) with Fisher's F-test:
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

var.test(b, a)

        F test to compare two variances

data:  b and a
F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
  3.637133 58.952936
sample estimates:
ratio of variances
          14.64308
We obtained a p-value less than 0.05, so the two variances are not homogeneous. Indeed, we can compare the computed F value with the tabulated F value for alpha = 0.05, with 9 degrees of freedom in the numerator and 9 degrees of freedom in the denominator, using the function qf(p, df.num, df.den):

qf(0.95, 9, 9)
[1] 3.178893
The computed F is greater than the tabulated F, so we can reject the null hypothesis H0 of homogeneity of variances.
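As a quick sketch of what var.test is doing (not shown in the original post), the F statistic is simply the ratio of the two sample variances, which we can compare directly with the critical value from qf():

a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic = ratio of the sample variances (variance of b over variance of a)
F.computed = var(b) / var(a)
F.critical = qf(0.95, df1 = 9, df2 = 9)

F.computed                 # about 14.64
F.critical                 # about 3.18
F.computed > F.critical    # TRUE: reject H0 of equal variances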
To compare the two groups, we use the function t.test with non-homogeneous variances (var.equal=FALSE, which can also be omitted because the function assumes unequal variances by default) and independent samples (paired=FALSE, which can also be omitted because the function works on independent samples by default), in this way:

t.test(a, b, var.equal=FALSE, paired=FALSE)

        Welch Two Sample t-test

data:  a and b
t = 1.8827, df = 10.224, p-value = 0.08848
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.95955 47.95955
sample estimates:
mean of x mean of y
    174.8     152.8
As we see in the heading of the output, R performed a two-sample t-test in which the degrees of freedom are computed with the Welch-Satterthwaite formula (here df = 10.224), which is used when the variances are not homogeneous. The Welch-Satterthwaite equation is also called the Dixon-Massey formula when the comparison is between two groups, as in this case.
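As a side check (a small sketch, not part of the original output), the reported p-value can be recovered from the t statistic and these degrees of freedom with the t distribution function pt():

# Recover the two-sided p-value from the t statistic and the Welch df
t.value = 1.8827
df.welch = 10.224
2 * pt(-abs(t.value), df = df.welch)   # about 0.0885, matching t.test()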
We obtained a p-value greater than 0.05, so we conclude that the means of the two groups are not significantly different (although the p-value is fairly close to the 0.05 threshold). Indeed, the computed value of t is less than the tabulated t-value for 10.224 degrees of freedom, which in R we can calculate:

qt(0.975, 10.224)
[1] 2.221539
So we cannot reject the null hypothesis H0 of equality of the means.
Welch-Satterthwaite formula:
$$df=\frac{\sum \mathrm{deviance}(X)}{\sum df(X)}=\frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^{k}(n_i-1)}$$
Dixon-Massey formula:
$$df=\frac{\left(\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{S_1^2}{n_1}\right)^2}{n_1-1}+\dfrac{\left(\dfrac{S_2^2}{n_2}\right)^2}{n_2-1}}$$
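As an illustration (a small sketch, not part of the original post), we can apply this formula by hand to the two samples and check that it reproduces the df = 10.224 reported by t.test:

a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

s1 = var(a); n1 = length(a)
s2 = var(b); n2 = length(b)

# Welch-Satterthwaite (Dixon-Massey) degrees of freedom for two groups
df.welch = (s1/n1 + s2/n2)^2 / ((s1/n1)^2/(n1 - 1) + (s2/n2)^2/(n2 - 1))
df.welch   # about 10.224, as reported by t.test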