Fisher’s exact test in R: independence test for a small sample
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
Introduction
After presenting the Chi-square test of independence by hand and in R, this article focuses on Fisher's exact test.
Independence tests are used to determine whether there is a significant relationship between two categorical variables. There are two common types of independence tests:
- the Chi-square test (the most common)
- Fisher's exact test
On the one hand, the Chi-square test is used when the sample is large enough. In this case the \(p\)-value is an approximation that becomes exact as the sample size tends to infinity, as is the case for many statistical tests. On the other hand, Fisher's exact test is used when the sample is small. In this case the \(p\)-value is exact, not an approximation.
The literature indicates that the usual rule for deciding whether the \(\chi^2\) approximation is good enough is that the Chi-square test is not appropriate when the expected frequency in at least one cell of the contingency table is less than 5; in that case Fisher's exact test is preferred (McCrum-Gardner 2008; Bower 2003).
Hypotheses
The hypotheses of Fisher's exact test are the same as for the Chi-square test, that is:
- \(H_0\): the variables are independent, there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other variable.
- \(H_1\): the variables are dependent, there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other variable.
Example
Data
For our example, we want to determine whether there is a statistically significant association between smoking and being a professional athlete. Smoking can only be “yes” or “no” and being a professional athlete can only be “yes” or “no”. The two variables of interest are qualitative variables and we collected data on 14 persons.^{1}
Observed frequencies
Our data are summarized in the contingency table below reporting the number of people in each subgroup:
|             | Non-smoker | Smoker |
|-------------|-----------:|-------:|
| Athlete     |          7 |      2 |
| Non-athlete |          0 |      5 |
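The article does not show how `dat` was created. A minimal sketch, assuming you enter the table above by hand as a matrix with named rows and columns (any 2×2 matrix or `table` object works with `chisq.test()` and `fisher.test()`):

```r
# Contingency table of the example: rows = athlete status, columns = smoking status.
# matrix() fills column by column, so c(7, 0, 2, 5) gives
# Non-smoker = (7, 0) and Smoker = (2, 5).
dat <- matrix(
  c(7, 0, 2, 5),
  nrow = 2,
  dimnames = list(
    c("Athlete", "Non-athlete"),
    c("Non-smoker", "Smoker")
  )
)
dat
```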
Expected frequencies
Remember that Fisher's exact test is used when at least one cell of the contingency table of expected frequencies is below 5. To retrieve the expected frequencies, use the `chisq.test()` function together with `$expected`:
```r
chisq.test(dat)$expected
## Warning in chisq.test(dat): Chi-squared approximation may be incorrect
##             Non-smoker Smoker
## Athlete            4.5    4.5
## Non-athlete        2.5    2.5
```
The table of expected frequencies above confirms that we should use Fisher's exact test instead of the Chi-square test, because at least one cell (here, all four) is below 5.
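Under independence, the expected frequency of each cell is its row total times its column total divided by the sample size, \(E_{ij} = \frac{n_{i\cdot} \, n_{\cdot j}}{n}\). A minimal sketch computing this by hand (the matrix `dat` is rebuilt from the table above so the snippet runs on its own); the result should match `chisq.test(dat)$expected`:

```r
dat <- matrix(c(7, 0, 2, 5), nrow = 2,
              dimnames = list(c("Athlete", "Non-athlete"),
                              c("Non-smoker", "Smoker")))

n <- sum(dat)                                      # overall sample size
expected <- outer(rowSums(dat), colSums(dat)) / n  # E_ij = row total * column total / n
expected

# Rule of thumb from the introduction: prefer Fisher's exact test
# if any expected frequency is below 5
any(expected < 5)
```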
Tip: although it is good practice to check the expected frequencies before deciding between the Chi-square test and Fisher's test, it is not a big issue if you forget. As you can see above, when doing the Chi-square test in R (with `chisq.test()`), a warning such as “Chi-squared approximation may be incorrect” will appear. This warning means that the smallest expected frequency is lower than 5. Therefore, do not worry if you forgot to check the expected frequencies before applying the appropriate test to your data: R will warn you when the Chi-square approximation may be unreliable, in which case Fisher's exact test should be used instead.
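If you prefer to automate this choice, you could wrap both tests in a small function. The name `independence_test()` is hypothetical (it is not part of base R), and the function only applies the "any expected frequency below 5" rule of thumb discussed in the introduction:

```r
# Hypothetical helper (not part of base R): run Fisher's exact test when any
# expected frequency is below `threshold`, otherwise run the Chi-square test.
independence_test <- function(tab, threshold = 5) {
  # Only the expected frequencies are needed here, so the
  # "approximation may be incorrect" warning is suppressed.
  expected <- suppressWarnings(chisq.test(tab))$expected
  if (any(expected < threshold)) {
    fisher.test(tab)
  } else {
    chisq.test(tab)
  }
}

dat <- matrix(c(7, 0, 2, 5), nrow = 2,
              dimnames = list(c("Athlete", "Non-athlete"),
                              c("Non-smoker", "Smoker")))
res <- independence_test(dat)
res$method   # name of the test that was selected
res$p.value
```

For the example data, every expected frequency is below 5, so the helper selects Fisher's exact test.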
Fisher’s exact test in R
To perform Fisher's exact test in R, use the `fisher.test()` function as you would for the Chi-square test:^{2}
```r
test <- fisher.test(dat)
test
## 
##  Fisher's Exact Test for Count Data
## 
## data:  dat
## p-value = 0.02098
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.449481      Inf
## sample estimates:
## odds ratio 
##        Inf
```
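The output also reports an odds ratio and its 95% confidence interval. For a 2×2 table, `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio; here it is infinite because one cell (non-athletes who do not smoke) contains 0. A short sketch to retrieve these values and, for comparison, compute the plain cross-product odds ratio, which is also infinite for the same reason:

```r
dat <- matrix(c(7, 0, 2, 5), nrow = 2,
              dimnames = list(c("Athlete", "Non-athlete"),
                              c("Non-smoker", "Smoker")))
test <- fisher.test(dat)

test$estimate   # conditional MLE of the odds ratio
test$conf.int   # 95% confidence interval for the odds ratio

# Sample (cross-product) odds ratio: (7 * 5) / (2 * 0), infinite because of the zero cell
(dat[1, 1] * dat[2, 2]) / (dat[1, 2] * dat[2, 1])
```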
The most important element of the output is the \(p\)-value. You can also retrieve the \(p\)-value with:
```r
test$p.value
## [1] 0.02097902
```
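Why is this \(p\)-value exact? With the row and column totals fixed, the count in the top-left cell follows a hypergeometric distribution, and the two-sided \(p\)-value is the sum of the probabilities of all tables that are no more likely than the observed one. The sketch below reproduces `test$p.value` with `dhyper()`; it handles 2×2 tables only, and the small relative tolerance is an assumption added to guard against floating-point ties:

```r
dat <- matrix(c(7, 0, 2, 5), nrow = 2,
              dimnames = list(c("Athlete", "Non-athlete"),
                              c("Non-smoker", "Smoker")))

a  <- dat[1, 1]        # observed athletes who do not smoke
r1 <- sum(dat[1, ])    # athletes (row 1 total)
r2 <- sum(dat[2, ])    # non-athletes (row 2 total)
c1 <- sum(dat[, 1])    # non-smokers (column 1 total)

# All values the top-left cell can take given the fixed margins
support <- max(0, c1 - r2):min(r1, c1)
probs   <- dhyper(support, m = r1, n = r2, k = c1)
p_obs   <- dhyper(a, m = r1, n = r2, k = c1)

# Two-sided p-value: sum the probabilities of tables as likely as,
# or less likely than, the observed table
p_exact <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_exact
fisher.test(dat)$p.value
```

Here the top-left cell can range from 2 to 7; only the observed table (7) and the mirror table (2) are this unlikely, each with probability 36/3432, which gives 72/3432.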
Conclusion and interpretation
From the output and from `test$p.value` we see that the \(p\)-value is less than the significance level of 5%. As with any other statistical test, if the \(p\)-value is less than the significance level, we can reject the null hypothesis.
\(\Rightarrow\) In our context, rejecting the null hypothesis for Fisher's exact test of independence means that there is a significant relationship between the two categorical variables (smoking habits and being an athlete or not). Therefore, knowing the value of one variable helps to predict the value of the other variable.
Thanks for reading. I hope the article helped you to perform Fisher's exact test of independence in R and interpret its results. Learn more about the Chi-square test of independence by hand or in R. As always, if you find a mistake/bug or if you have any questions, do not hesitate to let me know in the comment section below, raise an issue on GitHub or contact me. Get updates every time a new article is published by subscribing to this blog.
References
Bower, Keith M. 2003. “When to Use Fisher’s Exact Test.” Six Sigma Forum Magazine 2 (4): 35–37. American Society for Quality.
McCrum-Gardner, Evie. 2008. “Which Is the Correct Statistical Test to Use?” British Journal of Oral and Maxillofacial Surgery 46 (1): 38–41. Elsevier.

The data are the same as in the article covering the Chi-square test by hand, except that some observations have been removed to decrease the sample size.↩

Use `fisher.test(table(dat$variable1, dat$variable2))` if `dat` represents the raw data and is not already presented as a contingency table.↩
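For instance, with raw data stored as one row per person, the approach of this footnote could look like the sketch below. The data frame `raw` and its columns `athlete` and `smoker` are hypothetical; they are simply built to reproduce the counts of the contingency table of the example:

```r
# Hypothetical raw data: one row per person (14 people), matching the example table
raw <- data.frame(
  athlete = rep(c("Athlete", "Non-athlete"), times = c(9, 5)),
  smoker  = c(rep("Non-smoker", 7), rep("Smoker", 2),  # the 9 athletes
              rep("Smoker", 5))                        # the 5 non-athletes
)

# Build the contingency table first, then run the test on it
tab <- table(raw$athlete, raw$smoker)
tab
fisher.test(tab)$p.value
```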