F-Test: Compare Two Variances in R
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
When to use the F-test?
Comparing two variances is useful in several cases, including:

Before performing a two-sample t-test, to check whether the two samples have equal variances (this decides between the classical pooled-variance t-test and Welch's t-test, which does not assume equal variances).

When comparing the variability of a new measurement method with that of an old one: does the new method reduce the variability of the measurement?
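A minimal sketch of the first use case, assuming the built-in ToothGrowth data set (used later in this post) and a 0.05 cut-off chosen for illustration: the F-test result is passed to the var.equal argument of t.test(), which selects the pooled-variance t-test when TRUE and Welch's t-test when FALSE.

```r
# Sketch: let the F-test decide which two-sample t-test to run.
# The 0.05 threshold is an assumption made for this example.
my_data <- ToothGrowth

f_res <- var.test(len ~ supp, data = my_data)
equal_var <- f_res$p.value > 0.05   # TRUE when equal variances are not rejected

# var.equal = TRUE: pooled-variance t-test; FALSE: Welch's t-test
t_res <- t.test(len ~ supp, data = my_data, var.equal = equal_var)
t_res$method
```

Here the F-test p-value is about 0.233, so equal_var is TRUE and the pooled-variance test is used.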
Research questions and statistical hypotheses
Typical research questions are:
1. Is the variance of group A (\(\sigma^2_A\)) equal to the variance of group B (\(\sigma^2_B\))?
2. Is the variance of group A (\(\sigma^2_A\)) greater than the variance of group B (\(\sigma^2_B\))?
3. Is the variance of group A (\(\sigma^2_A\)) less than the variance of group B (\(\sigma^2_B\))?
In statistics, we can define the corresponding null hypotheses (\(H_0\)) as follows:
1. \(H_0: \sigma^2_A = \sigma^2_B\)
2. \(H_0: \sigma^2_A \leq \sigma^2_B\)
3. \(H_0: \sigma^2_A \geq \sigma^2_B\)
The corresponding alternative hypotheses (\(H_a\)) are as follows:
1. \(H_a: \sigma^2_A \ne \sigma^2_B\) (different)
2. \(H_a: \sigma^2_A > \sigma^2_B\) (greater)
3. \(H_a: \sigma^2_A < \sigma^2_B\) (less)
Note that:
Hypotheses 1 define a two-tailed test.
Hypotheses 2 and 3 define one-tailed tests.
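As a sketch of how the three hypothesis pairs map onto var.test()'s alternative argument, the example below runs all three on the built-in ToothGrowth data, assuming group A = OJ and group B = VC (the factor-level order the formula interface uses). It also checks a property of these p-values: the two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them.

```r
# Sketch: hypotheses 1, 2 and 3 correspond to alternative = "two.sided",
# "greater" and "less". Group A = OJ, group B = VC (assumed labelling).
my_data <- ToothGrowth

p_two  <- var.test(len ~ supp, data = my_data, alternative = "two.sided")$p.value
p_grt  <- var.test(len ~ supp, data = my_data, alternative = "greater")$p.value
p_less <- var.test(len ~ supp, data = my_data, alternative = "less")$p.value

c(two.sided = p_two, greater = p_grt, less = p_less)

# One-tailed p-values are complementary; two-tailed = 2 * smaller tail
p_grt + p_less
2 * min(p_grt, p_less)
```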
Formula of the F-test
The test statistic is the ratio of the two sample variances \(S_A^2\) and \(S_B^2\):
\[F = \frac{S_A^2}{S_B^2}\]
The degrees of freedom are \(n_A - 1\) (for the numerator) and \(n_B - 1\) (for the denominator), where \(n_A\) and \(n_B\) are the sizes of the two samples.
The further this ratio deviates from 1 (in either direction), the stronger the evidence for unequal population variances.
The F-test requires the data in both groups to be normally distributed.
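To make the formula concrete, here is a sketch that computes \(F\), its degrees of freedom and the two-sided p-value by hand on the ToothGrowth data (group A = OJ, group B = VC, an assumed labelling), then compares the result with var.test(). The two-sided p-value is taken as twice the smaller tail area of the F distribution.

```r
# Sketch: F statistic, degrees of freedom and two-sided p-value by hand.
my_data <- ToothGrowth
a <- my_data$len[my_data$supp == "OJ"]   # group A
b <- my_data$len[my_data$supp == "VC"]   # group B

F_stat <- var(a) / var(b)                # S_A^2 / S_B^2
df_num <- length(a) - 1                  # n_A - 1
df_den <- length(b) - 1                  # n_B - 1

# Two-sided p-value: twice the smaller tail of F(df_num, df_den)
p_lower <- pf(F_stat, df_num, df_den)
p_two   <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(a, b)
c(F = F_stat, p = p_two)
c(F = unname(res$statistic), p = res$p.value)   # should match the line above
```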
Compute F-test in R
R function
The R function var.test() can be used to compare two variances as follows:
# Method 1: formula interface
var.test(values ~ groups, data, alternative = "two.sided")
# or Method 2: two numeric vectors
var.test(x, y, alternative = "two.sided")
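values ~ groups: a formula where values is a numeric variable and groups is a factor with exactly two levels; data is the data frame that contains them
x, y: numeric vectors
alternative: the alternative hypothesis. Allowed values are "two.sided" (default), "greater" or "less".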
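The two calling styles should give the same test when x and y are the two groups taken in factor-level order. A sketch on ToothGrowth, where levels(supp) is "OJ", "VC", so OJ plays the role of x in the formula interface:

```r
# Sketch: Method 1 (formula) and Method 2 (two vectors) side by side.
my_data <- ToothGrowth
levels(my_data$supp)   # "OJ" "VC": the first level is treated as x

res1 <- var.test(len ~ supp, data = my_data, alternative = "two.sided")
res2 <- var.test(x = my_data$len[my_data$supp == "OJ"],
                 y = my_data$len[my_data$supp == "VC"],
                 alternative = "two.sided")

# Same statistic, p-value and confidence interval; only the printed
# "data:" line differs between the two results.
c(res1$p.value, res2$p.value)
```

Swapping x and y inverts the ratio, so with a one-sided alternative the order of the groups matters.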
Import your data into R and check it
To import your data, use the following R code:
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
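file.choose() opens an interactive file dialog, so in a non-interactive script a path is usually passed instead. The sketch below writes ToothGrowth to a temporary CSV file (a stand-in for your own data file) and reads it back. stringsAsFactors = TRUE keeps supp as a factor, as in the built-in data; since R 4.0.0 the default is FALSE.

```r
# Sketch: read a CSV from a path rather than file.choose().
# The temporary file stands in for your own data file (assumption).
csv_path <- tempfile(fileext = ".csv")
write.csv(ToothGrowth, csv_path, row.names = FALSE)

my_data <- read.csv(csv_path, stringsAsFactors = TRUE)
str(my_data)   # check column names and types
```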
Here, we’ll use the built-in R data set named ToothGrowth:
# Store the data in the variable my_data
my_data <- ToothGrowth
To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function sample_n() [in the dplyr package]:
library("dplyr")
sample_n(my_data, 10)
   len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5
We want to test whether the variance of len is the same in the two groups, OJ and VC, defined by the column “supp”.
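Before running the test, it can help to look at the group sizes and variances directly. A base-R sketch (no dplyr needed); the ratio of the two group variances is exactly the F statistic that var.test() reports for len ~ supp:

```r
# Sketch: sample size and variance of len in each supp group.
my_data <- ToothGrowth

n_by_group   <- table(my_data$supp)
var_by_group <- tapply(my_data$len, my_data$supp, var)
n_by_group
var_by_group

# Ratio of group variances (OJ over VC, the factor-level order)
f_ratio <- unname(var_by_group["OJ"] / var_by_group["VC"])
f_ratio
```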
Preliminary test to check F-test assumptions
The F-test is very sensitive to departures from the normality assumption, so check whether the data in each group are normally distributed before using it.
The Shapiro-Wilk test can be used to check whether the normality assumption holds. A Q-Q plot (quantile-quantile plot) can also be used to assess normality graphically: it plots the quantiles of the sample against the corresponding quantiles of a normal distribution, and points lying close to a straight line are consistent with normality.
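A sketch of both checks applied to each supp group, using base R's shapiro.test(), qqnorm() and qqline(). Reading a p-value below 0.05 as a sign of non-normality is a common convention, assumed here rather than a fixed rule.

```r
# Sketch: normality checks per group.
my_data <- ToothGrowth

# Shapiro-Wilk p-value for each supp group
sw_p <- tapply(my_data$len, my_data$supp, function(v) shapiro.test(v)$p.value)
sw_p

# One normal Q-Q plot per group, with a reference line
op <- par(mfrow = c(1, 2))
for (g in levels(my_data$supp)) {
  v <- my_data$len[my_data$supp == g]
  qqnorm(v, main = paste("Normal Q-Q plot:", g))
  qqline(v)
}
par(op)   # restore the previous plotting layout
```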
If normality is in doubt, a better choice is Levene’s test or the Fligner-Killeen test, which are less sensitive to departures from normality.
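Both alternatives accept the same formula as var.test(). The Fligner-Killeen test is in base R (fligner.test() in the stats package); Levene’s test is not in base R, and the sketch below assumes the add-on car package, calling its leveneTest() only when that package is installed.

```r
# Sketch: variance tests that are more robust to non-normality.
my_data <- ToothGrowth

# Fligner-Killeen test (base R)
fk <- fligner.test(len ~ supp, data = my_data)
fk$p.value

# Levene's test from the 'car' package, run only if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(len ~ supp, data = my_data))
}
```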
Compute F-test
# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest
	F test to compare two variances

data:  len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3039488 1.3416857
sample estimates:
ratio of variances 
         0.6385951 
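Interpretation of the result
The p-value of the F-test is p = 0.2331, which is greater than the conventional significance level of 0.05. We therefore cannot reject the null hypothesis of equal variances: there is no significant difference between the variances of len in the OJ and VC groups. Consistently, the 95% confidence interval for the ratio of variances (0.304 to 1.342) contains 1.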
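One way to turn the var.test() output into a decision in code is sketched below, with the significance level alpha = 0.05 as an assumption. For the two-sided test, checking whether 1 lies inside the 95% confidence interval gives the same conclusion as comparing the p-value with 0.05, because var.test() builds that interval from the same F distribution.

```r
# Sketch: decision from the p-value and, equivalently, from the CI.
my_data <- ToothGrowth
res.ftest <- var.test(len ~ supp, data = my_data)

alpha <- 0.05                                  # assumed significance level
reject_equal <- res.ftest$p.value < alpha      # FALSE here (p is about 0.233)

ci <- res.ftest$conf.int                       # 95% CI for the variance ratio
ci_contains_1 <- ci[1] <= 1 && 1 <= ci[2]

c(reject_equal = reject_equal, ci_contains_1 = ci_contains_1)
```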
Access the values returned by the var.test() function
The function var.test() returns a list (an object of class "htest") containing the following components:
statistic: the value of the F test statistic.
parameter: the degrees of freedom of the F distribution of the test statistic.
p.value: the p-value of the test.
conf.int: a confidence interval for the ratio of the population variances.
estimate: the ratio of the sample variances.
These values can be extracted as follows:
# ratio of variances
res.ftest$estimate
ratio of variances 0.6385951
# p-value of the test
res.ftest$p.value
[1] 0.2331433
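The remaining components are accessed the same way. The sketch below also gathers them into a one-row data frame, for example for a report; the column names are illustrative choices, while "num df" and "denom df" are the names var.test() gives to the degrees of freedom (as in the printed output).

```r
# Sketch: extract all components and collect them in one row.
my_data <- ToothGrowth
res.ftest <- var.test(len ~ supp, data = my_data)

res.ftest$statistic   # F statistic
res.ftest$parameter   # num df and denom df
res.ftest$conf.int    # 95% CI for the ratio of variances

summary_row <- data.frame(
  F        = unname(res.ftest$statistic),
  num_df   = unname(res.ftest$parameter["num df"]),
  denom_df = unname(res.ftest$parameter["denom df"]),
  p_value  = res.ftest$p.value,
  ci_low   = res.ftest$conf.int[1],
  ci_high  = res.ftest$conf.int[2]
)
summary_row
```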
Info
This analysis was performed using R software (version 3.3.2).