(This article was first published on **Statistic on aiR**, and kindly contributed to R-bloggers)

Comparison of the averages of two independent groups, extracted from two populations with unknown variance; the sample variances are not homogeneous.

We want to compare the heights in inches of two groups of individuals. Here are the measurements:

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179

B: 120, 180, 125, 188, 130, 190, 110, 185, 112, 188

As we have seen in a previous exercise, we must first check whether the variances are homogeneous (homoskedasticity) with **Fisher's F-test**:

a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)

b = c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

var.test(b,a)

F test to compare two variances

data: b and a

F = 14.6431, num df = 9, denom df = 9, p-value = 0.0004636

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

3.637133 58.952936

sample estimates:

ratio of variances

14.64308

The p-value is less than 0.05, so the two variances are not homogeneous. We can confirm this by comparing the computed value of F with the tabulated value of F for *alpha = 0.05*, *numerator degrees of freedom = 9*, and *denominator degrees of freedom = 9*, using the function `qf(p, df.num, df.den)`:

qf(0.95, 9, 9)

[1] 3.178893

The computed F is greater than the tabulated F, so we can reject the **null hypothesis H0 of homogeneity of variances**.
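The F statistic and its p-value can also be reproduced by hand, which may make the comparison with the tabulated value easier to follow. The sketch below assumes the same vectors `a` and `b`; the variable names (`f_stat`, `p_two_sided`, and so on) are illustrative only. Because `var.test` is two-sided by default, the sketch also computes the two-sided critical value `qf(0.975, 9, 9)` alongside the one-sided `qf(0.95, 9, 9)` used in the text.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# F statistic: ratio of the two sample variances (b over a, as in var.test(b, a))
f_stat <- var(b) / var(a)
df_num <- length(b) - 1
df_den <- length(a) - 1

# Two-sided p-value: twice the smaller tail probability
p_lower <- pf(f_stat, df_num, df_den)
p_upper <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_two_sided <- 2 * min(p_lower, p_upper)

# Critical values: one-sided (as in the text) and two-sided at alpha = 0.05
f_crit_one_sided <- qf(0.95, df_num, df_den)
f_crit_two_sided <- qf(0.975, df_num, df_den)

print(c(F = f_stat, p = p_two_sided))
print(c(one_sided = f_crit_one_sided, two_sided = f_crit_two_sided))
```

With these data the computed F exceeds both critical values, so the conclusion is the same under either convention.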

To compare the two groups, we use the function `t.test` with non-homogeneous variances (`var.equal = FALSE`, which can be omitted because it is the default) and independent samples (`paired = FALSE`, which can also be omitted because it is the default), in this way:

t.test(a,b, var.equal=FALSE, paired=FALSE)

Welch Two Sample t-test

data: a and b

t = 1.8827, df = 10.224, p-value = 0.08848

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-3.95955 47.95955

sample estimates:

mean of x mean of y

174.8 152.8
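As a quick check that the two arguments really are defaults, the following sketch compares the explicit call with the bare call `t.test(a, b)` and pulls a few components out of the returned object. The names `res_explicit`, `res_default` and `same_result` are illustrative.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Explicit call from the text versus the call relying on defaults
res_explicit <- t.test(a, b, var.equal = FALSE, paired = FALSE)
res_default  <- t.test(a, b)

# Both calls should give the same statistic and p-value
same_result <- isTRUE(all.equal(res_explicit$statistic, res_default$statistic)) &&
  isTRUE(all.equal(res_explicit$p.value, res_default$p.value))
print(same_result)

# Components of the result can be accessed by name
print(res_default$method)     # name of the test that was run
print(res_default$estimate)   # the two sample means
print(res_default$conf.int)   # 95% confidence interval for the difference
```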

As the headline of the output shows, R performed a Welch t-test on two samples, computing the degrees of freedom with the **Welch-Satterthwaite formula** (the result of the formula is *df = 10.224*), which is used when the variances are not homogeneous. When the comparison is between two groups, as in this case, the Welch-Satterthwaite equation is also called the **Dixon-Massey formula**.

The p-value is greater than 0.05, so we cannot reject the null hypothesis: the data do not show a significant difference between the means of the two groups (although the p-value is fairly close to the 0.05 threshold). Indeed, the computed value of t is less than the tabulated t-value for 10.224 degrees of freedom, which we can calculate in R:

qt(0.975, 10.224)

[1] 2.221539

We therefore do not reject the hypothesis H0 of equality of means.
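The decision can be retraced step by step. The sketch below, with illustrative variable names, rebuilds the t statistic from the sample means and variances, takes the degrees of freedom reported by `t.test`, and derives both the critical value and the two-sided p-value.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

res <- t.test(a, b)

# Standard error of the difference in means, without pooling the variances
se_diff <- sqrt(var(a) / length(a) + var(b) / length(b))
t_stat  <- (mean(a) - mean(b)) / se_diff

# Degrees of freedom as reported by t.test (Welch approximation)
df_welch <- unname(res$parameter)

# Two-sided critical value at alpha = 0.05 and the matching p-value
t_crit  <- qt(0.975, df_welch)
p_value <- 2 * pt(-abs(t_stat), df_welch)

reject_h0 <- abs(t_stat) > t_crit
print(c(t = t_stat, t_crit = t_crit, p = p_value))
print(reject_h0)
```

Here `reject_h0` is `FALSE`, which matches the conclusion in the text: the test does not provide enough evidence against equal means, which is not the same as proving that the means are equal.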

**Welch-Satterthwaite formula**:

$$df=\frac{\left(\sum_{i=1}^{k}\frac{S_i^2}{n_i}\right)^2}{\sum_{i=1}^{k}\frac{\left(\frac{S_i^2}{n_i}\right)^2}{n_i-1}}$$

**Dixon-Massey formula**:

$$df=\frac{\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}+\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\frac{\displaystyle\left(\frac{\displaystyle S_1^2}{\displaystyle n_1}\right)^2}{\displaystyle n_1-1}+\frac{\displaystyle\left(\frac{\displaystyle S_2^2}{\displaystyle n_2}\right)^2}{\displaystyle n_2-1}}$$
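The two-group formula is short enough to compute directly. The following is a minimal sketch; `welch_df` is a hypothetical helper written for this illustration, not a function provided by R.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(120, 180, 125, 188, 130, 190, 110, 185, 112, 188)

# Hypothetical helper: degrees of freedom for two independent samples
# with unequal variances, following the two-group formula above
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # S_1^2 / n_1
  vy <- var(y) / length(y)   # S_2^2 / n_2
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

df_manual <- welch_df(a, b)
df_ttest  <- unname(t.test(a, b)$parameter)

print(df_manual)
print(isTRUE(all.equal(df_manual, df_ttest)))
```

The value computed by hand agrees with the `df = 10.224` reported in the `t.test` output.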
