# F-Test: Compare Two Variances in R

[This article was first published on **Easy Guides**, and kindly contributed to R-bloggers].


The **F-test** is used to assess whether the **variances** of two populations (A and B) are equal.


## When to use the F-test?

Comparing two variances is useful in several cases, including:

- When you want to perform a two-sample t-test and first need to check whether the variances of the two samples are equal.
- When you want to compare the variability of a new measurement method to that of an old one. Does the new method reduce the variability of the measure?

## Research questions and statistical hypotheses

Typical research questions are:

- whether the variance of group A (\(\sigma^2_A\)) *is equal* to the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) *is less than* the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) *is greater than* the variance of group B (\(\sigma^2_B\))?

In statistics, we can define the corresponding *null hypotheses* (\(H_0\)) as follows:

- \(H_0: \sigma^2_A = \sigma^2_B\)
- \(H_0: \sigma^2_A \leq \sigma^2_B\)
- \(H_0: \sigma^2_A \geq \sigma^2_B\)

The corresponding *alternative hypotheses* (\(H_a\)) are as follows:

- \(H_a: \sigma^2_A \ne \sigma^2_B\) (different)
- \(H_a: \sigma^2_A > \sigma^2_B\) (greater)
- \(H_a: \sigma^2_A < \sigma^2_B\) (less)

Note that:

- The first pair of hypotheses (\(\sigma^2_A = \sigma^2_B\) versus \(\sigma^2_A \ne \sigma^2_B\)) is a **two-tailed test**.
- The second and third pairs of hypotheses are **one-tailed tests**.

## Formula of F-test

The test statistic can be obtained by computing the ratio of the two variances \(S_A^2\) and \(S_B^2\).

\[F = \frac{S_A^2}{S_B^2}\]

The degrees of freedom are \(n_A - 1\) (for the numerator) and \(n_B - 1\) (for the denominator).

Note that the more this ratio deviates from 1, the stronger the evidence for unequal population variances.

Note also that the F-test assumes that both samples come from normally distributed populations.
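The formula above can be checked by hand. The following is a minimal sketch, assuming the two groups are taken from the built-in ToothGrowth data set purely for illustration; the variable names (`x`, `y`, `f_stat`, and so on) are hypothetical. It computes the variance ratio, the degrees of freedom and a two-tailed p-value from the F distribution with `pf()`.

```r
# Two groups taken from a built-in data set (illustration only)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]  # group A
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]  # group B

# Test statistic: ratio of the two sample variances
f_stat <- var(x) / var(y)

# Degrees of freedom for the numerator and the denominator
df_num <- length(x) - 1
df_den <- length(y) - 1

# Two-tailed p-value: twice the smaller tail probability of the F distribution
p_lower <- pf(f_stat, df_num, df_den)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

f_stat
p_two_sided
```

These hand-computed values should agree with what `var.test(x, y)` reports for the same two vectors.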

## Compute F-test in R

### R function

The R function **var.test**() can be used to compare two variances as follows:

```
# Method 1: formula interface
var.test(values ~ groups, data,
         alternative = "two.sided")
# or Method 2: two numeric vectors
var.test(x, y, alternative = "two.sided")
```

- **x, y**: numeric vectors
- **alternative**: the alternative hypothesis. The allowed value is one of `"two.sided"` (default), `"greater"` or `"less"`.
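As a quick illustration of the two calling styles and of the `alternative` argument, here is a small sketch. It assumes the built-in ToothGrowth data set as example data; the object names (`df`, `res_formula`, `res_vectors`, `res_less`) are hypothetical.

```r
# Built-in data set, used here only as example data
df <- ToothGrowth

# Method 1: formula interface (values ~ groups)
res_formula <- var.test(len ~ supp, data = df)

# Method 2: two numeric vectors
x <- df$len[df$supp == "OJ"]
y <- df$len[df$supp == "VC"]
res_vectors <- var.test(x, y)

# Both calls compare the same two groups, so the F statistics match
all.equal(unname(res_formula$statistic), unname(res_vectors$statistic))

# One-tailed test: is the variance of x less than the variance of y?
res_less <- var.test(x, y, alternative = "less")
res_less$p.value
```

With the formula interface, the first level of the grouping factor goes in the numerator of the ratio, which matters when you choose `"greater"` or `"less"`.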

### Import your data into R and check it

To import your data, use the following R code:

```
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
```

Here, we’ll use the built-in R data set named ToothGrowth:

```
# Store the data in the variable my_data
my_data <- ToothGrowth
```

To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function **sample_n**() [in the **dplyr** package]:

```
library("dplyr")
sample_n(my_data, 10)
```

```
    len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5
```

We want to test the equality of variances between the two groups OJ and VC in the column “supp”.

### Preliminary test to check F-test assumptions

The F-test is very sensitive to departures from the normality assumption. You need to check whether the data are normally distributed before using the F-test.

The Shapiro-Wilk test can be used to test whether the normality assumption holds. It is also possible to use a **Q-Q plot** (quantile-quantile plot) to evaluate the normality of a variable graphically. A Q-Q plot plots the quantiles of a given sample against the theoretical quantiles of the normal distribution; points that fall close to a straight line suggest approximate normality.

If there is doubt about normality, the better choice is to use **Levene’s test** or the **Fligner-Killeen test**, which are less sensitive to departures from the normality assumption.
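The checks described above can be sketched with base R functions only. This is one possible way to run them, assuming the ToothGrowth data and the hypothetical object names `df`, `shapiro_by_group` and `res_fligner`; it is not necessarily how the checks were run for this article.

```r
# Built-in data set, used here only as example data
df <- ToothGrowth

# Shapiro-Wilk test within each group; keep only the p-values
shapiro_by_group <- tapply(df$len, df$supp,
                           function(v) shapiro.test(v)$p.value)
shapiro_by_group

# One Q-Q plot per group, with a reference line
op <- par(mfrow = c(1, 2))
for (g in levels(df$supp)) {
  v <- df$len[df$supp == g]
  qqnorm(v, main = paste("Q-Q plot:", g))
  qqline(v)
}
par(op)

# Fligner-Killeen test of equal variances, less sensitive to non-normality
res_fligner <- fligner.test(len ~ supp, data = df)
res_fligner
```

Levene's test is not part of base R. One commonly used implementation is `leveneTest()` in the **car** package, which has to be installed separately.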

### Compute F-test

```
# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest
```

```

	F test to compare two variances

data:  len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3039488 1.3416857
sample estimates:
ratio of variances 
         0.6385951 
```

### Interpretation of the result

The p-value of the **F-test** is p = 0.2331433, which is greater than the significance level of 0.05. In conclusion, there is no significant difference between the two variances.

### Access to the values returned by var.test() function

The function **var.test**() returns a list containing the following components:

- **statistic**: the value of the F test statistic.
- **parameter**: the degrees of freedom of the F distribution of the test statistic.
- **p.value**: the p-value of the test.
- **conf.int**: a confidence interval for the ratio of the population variances.
- **estimate**: the ratio of the sample variances.

The format of the **R** code to use for getting these values is as follows:

```
# ratio of variances
res.ftest$estimate
```

```
ratio of variances
0.6385951
```

```
# p-value of the test
res.ftest$p.value
```

`[1] 0.2331433`
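The remaining components can be read in the same way. The sketch below recomputes `res.ftest` so that it runs on its own. The variable `ci` and the final check are illustrative additions: a 95 percent confidence interval for the variance ratio that contains 1 is consistent with a two-tailed p-value above 0.05.

```r
# Recompute the test so that this snippet is self-contained
res.ftest <- var.test(len ~ supp, data = ToothGrowth)

# F statistic
res.ftest$statistic

# Numerator and denominator degrees of freedom
res.ftest$parameter

# Confidence interval for the ratio of the population variances
res.ftest$conf.int

# Does the interval contain 1 (equal variances)?
ci <- res.ftest$conf.int
ci[1] < 1 && 1 < ci[2]
```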

## Infos

This analysis has been performed using **R software** (ver. 3.3.2).
