The F-test is used to assess whether the variances of two populations (A and B) are equal.

## When to use the F-test?

Comparing two variances is useful in several cases, including:

• When you want to perform a two-sample t-test and first need to check whether the variances of the two samples are equal

• When you want to compare the variability of a new measurement method to an old one. Does the new method reduce the variability of the measure?

## Research questions and statistical hypotheses

Typical research questions are:

1. whether the variance of group A ($$\sigma^2_A$$) is equal to the variance of group B ($$\sigma^2_B$$)?
2. whether the variance of group A ($$\sigma^2_A$$) is greater than the variance of group B ($$\sigma^2_B$$)?
3. whether the variance of group A ($$\sigma^2_A$$) is less than the variance of group B ($$\sigma^2_B$$)?

In statistics, we can define the corresponding null hypotheses ($$H_0$$) as follows:

1. $$H_0: \sigma^2_A = \sigma^2_B$$
2. $$H_0: \sigma^2_A \leq \sigma^2_B$$
3. $$H_0: \sigma^2_A \geq \sigma^2_B$$

The corresponding alternative hypotheses ($$H_a$$) are as follows:

1. $$H_a: \sigma^2_A \ne \sigma^2_B$$ (different)
2. $$H_a: \sigma^2_A > \sigma^2_B$$ (greater)
3. $$H_a: \sigma^2_A < \sigma^2_B$$ (less)

Note that:

• Hypothesis 1) is a two-tailed test
• Hypotheses 2) and 3) are one-tailed tests

## Formula of F-test

The test statistic can be obtained by computing the ratio of the two variances $$S_A^2$$ and $$S_B^2$$.

$F = \frac{S_A^2}{S_B^2}$

The degrees of freedom are $$n_A - 1$$ (for the numerator) and $$n_B - 1$$ (for the denominator).

Note that the more this ratio deviates from 1, the stronger the evidence for unequal population variances.

Note also that the F-test assumes that both samples come from normally distributed populations.
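As a minimal sketch of the formula above, the statistic and a two-sided p-value can be computed by hand with var() and pf(). The vectors a and b are simulated purely for illustration (they are not part of this article's data), and the result is compared against var.test().

```r
# Simulated samples (hypothetical data, for illustration only)
set.seed(123)
a <- rnorm(30, mean = 10, sd = 2)
b <- rnorm(25, mean = 10, sd = 3)

# Test statistic: ratio of the two sample variances
f_stat <- var(a) / var(b)

# Degrees of freedom for the numerator and the denominator
df_num <- length(a) - 1
df_den <- length(b) - 1

# Two-sided p-value from the F distribution
p_lower <- pf(f_stat, df_num, df_den)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# The built-in function should give the same values
check <- var.test(a, b)
c(manual = f_stat, var.test = unname(check$statistic))
c(manual = p_two_sided, var.test = check$p.value)
```

Doubling the smaller tail probability is how the two-sided p-value is obtained here; a one-sided test would use only one of the two tails.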

## Compute F-test in R

### R function

The R function var.test() can be used to compare two variances as follows:

# Method 1
var.test(values ~ groups, data,
alternative = "two.sided")
# or Method 2
var.test(x, y, alternative = "two.sided")

• x,y: numeric vectors
• alternative: the alternative hypothesis. Allowed value is one of “two.sided” (default), “greater” or “less”.
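The alternative argument maps directly onto the hypotheses listed earlier. The following sketch runs the three variants on simulated vectors x and y (hypothetical data, chosen only for illustration); with "less", the alternative hypothesis is that the variance of x is smaller than that of y.

```r
# Simulated samples (hypothetical data, for illustration only)
set.seed(42)
x <- rnorm(40, sd = 1)
y <- rnorm(40, sd = 1.5)

# H_a: the two variances differ (two-tailed)
p_two <- var.test(x, y, alternative = "two.sided")$p.value

# H_a: the variance of x is less than the variance of y (one-tailed)
p_less <- var.test(x, y, alternative = "less")$p.value

# H_a: the variance of x is greater than the variance of y (one-tailed)
p_greater <- var.test(x, y, alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_greater)
```

The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them, so the choice of alternative should be made before looking at the data.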

### Import your data into R and check it

To import your data, use the following R code:

# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())

Here, we’ll use the built-in R data set named ToothGrowth:

# Store the data in the variable my_data
my_data <- ToothGrowth

To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function sample_n() [in the dplyr package]:

library("dplyr")
sample_n(my_data, 10)
    len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5

We want to test the equality of variances between the two groups OJ and VC in the column “supp”.
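Before running the test, it can help to look at the group sizes and sample variances directly. The sketch below uses base R's tapply(); the object names n_by_group, var_by_group and var_ratio are illustrative choices, not part of the original analysis.

```r
# Built-in data set used in this article
my_data <- ToothGrowth

# Sample size and variance of len within each supp group
n_by_group <- tapply(my_data$len, my_data$supp, length)
var_by_group <- tapply(my_data$len, my_data$supp, var)

n_by_group
var_by_group

# Ratio of the two sample variances (OJ over VC)
var_ratio <- unname(var_by_group["OJ"] / var_by_group["VC"])
var_ratio
```

This ratio is the quantity that var.test() reports as the F statistic for the formula len ~ supp, because OJ is the first level of the factor supp.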

### Preliminary test to check F-test assumptions

The F-test is very sensitive to departures from normality. You need to check whether the data in each group are normally distributed before using it.

The Shapiro-Wilk test can be used to test whether the normality assumption holds. It's also possible to use a Q-Q plot (quantile-quantile plot) to evaluate the normality of a variable graphically. A Q-Q plot draws the quantiles of a given sample against the theoretical quantiles of the normal distribution; points lying close to a straight line suggest normality.

If there is doubt about normality, the better choice is to use Levene's test or the Fligner-Killeen test, which are less sensitive to departures from normality.
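These checks can be sketched in base R as follows. This is one possible way to run them on the ToothGrowth data, assuming the normality check is done separately for each supp group; the object names shapiro_by_group and fligner_res are illustrative.

```r
# Built-in data set used in this article
my_data <- ToothGrowth

# Shapiro-Wilk normality test, run separately for each supp group
len_by_group <- split(my_data$len, my_data$supp)
shapiro_by_group <- lapply(len_by_group, shapiro.test)
shapiro_by_group$OJ
shapiro_by_group$VC

# Fligner-Killeen test of equal variances,
# less sensitive to departures from normality than the F-test
fligner_res <- fligner.test(len ~ supp, data = my_data)
fligner_res
```

A Q-Q plot for one group can be drawn with qqnorm() followed by qqline(), for example on len_by_group$OJ. Levene's test is not part of base R; one implementation is leveneTest() in the car package.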

### Compute F-test

# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest

F test to compare two variances
data:  len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3039488 1.3416857
sample estimates:
ratio of variances
0.6385951 

### Interpretation of the result

The p-value of the F-test is p = 0.2331433, which is greater than the significance level of 0.05. We therefore cannot reject the null hypothesis: there is no significant difference between the two variances.

The function var.test() returns a list containing the following components:

• statistic: the value of the F test statistic.
• parameter: the degrees of freedom of the F distribution of the test statistic.
• p.value: the p-value of the test.
• conf.int: a confidence interval for the ratio of the population variances.
• estimate: the ratio of the sample variances

The format of the R code to use for getting these values is as follows:

# ratio of variances
res.ftest$estimate

ratio of variances
0.6385951

# p-value of the test
res.ftest$p.value

[1] 0.2331433
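The remaining components can be read the same way. The sketch below also turns the p-value into a decision; the 0.05 threshold stored in alpha and the wording of the decision strings are illustrative choices.

```r
# Re-run the test so that this block is self-contained
my_data <- ToothGrowth
res.ftest <- var.test(len ~ supp, data = my_data)

res.ftest$statistic   # F statistic
res.ftest$parameter   # numerator and denominator degrees of freedom
res.ftest$conf.int    # 95% confidence interval for the ratio of variances

# Decision at the 5% significance level (alpha is an illustrative choice)
alpha <- 0.05
if (res.ftest$p.value < alpha) {
  decision <- "reject H0: the variances differ"
} else {
  decision <- "do not reject H0: no evidence that the variances differ"
}
decision
```

Because the confidence interval for the ratio contains 1, it leads to the same conclusion as the p-value.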

## Infos

This analysis has been performed using R software (ver. 3.3.2).