It’s not a shame to put a note on something (probably) everyone knows and you thought you know but actually you are not 100% sure. Multiple testing is such a piece in my knowledge map.
Some terms first:
– Type I error (false positive) and Type II error (false negative):
When we do a hypothesis test, we can categorize the result into the following 2×2 table:
|Table of error types||Null hypothesis (H0) is|
|Judgement of Null Hypothesis (H0)||Reject||Type I error
|Fail to reject||Correct inference
|Type II error
Type I error is “you reject a true thing”. If the true thing is a null hypothesis (H0), which is what people usually assume (e.g. no difference, no effect), then you reject it (or yes, there is difference), it’s like a false positive. The similar logics for Type II error, or false negative.
Also note that people use Greek letter α for type I error rate and β for type II error rate. α is also the significant level for a test, e.g. 5%. So when a single test reaches p-value 0.05, we can intuitively understand that with 5% of chance we make a mistake or 5% of cases we thought significant are actually not. β is related with the power of a test. Power of a test = the ability to detect True Positive among all real positive cases.
– Sensitivity and Specificity
|Total test (m)||Null hypothesis (H0) is|
|Judgement of Null Hypothesis (H0)||Reject (R)||V||S|
|Fail to reject||U||T|
Sensitivity = S / (S+T) = power = 1-β
Specificity = U / (U+V) = 1-α
– Why multiple testing matters?
It matters because we usually perform the same hypothesis tests not just once, but many many times. If your chance of making an error in single test is α, then your chance to make one or more errors in m tests will be
– How to control type I error rate in multiple test?
Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses, PCER = E(V)/m
Per-family error rate (PFER): the expected number of Type I errors, PFE = E(V).
Family-wise error rate (FWER): the probability of at least one type I error, FWER = P(V ≥ 1)
False discovery rate (FDR) is the expected proportion of Type I errors among the rejected hypotheses, FDR = E(V/R | R>0)P(R>0)
Positive false discovery rate (pFDR): the rate that discoveries are false, pFDR = E(V/R | R > 0)
– Controlling Family-Wise Error Rate
Many procedures have been developed to control the family-wise error rate P(V≥ 1), including the Bonferroni, Holm (1979), Hochberg (1988), and Sidak. It consists of two types: single-step (e.g. Bonferroni) and sequential adjustment (e.g. Holm or Hochberg). Bonferroni correction is to control the overall type I errors when all tests are independent. It rejects any hypothesis with p-value ≤ α/m. So, when doing corrections, simply multiply the nominal p-value by m to get the adjusted p-values. In R, it’s the following function
p.adjust(p, method = "bonferroni")
The sequential corrections is slightly more powerful than Bonferroni test. The Holm step-down procedure is the easiest to understand. First, sort your thousand p-values from low to high. Multiply the smallest p-value by one thousand. If that adjusted p-value is less than 0.05, then that gene shows evidence of differential expression. There is no difference as Bonferroni test for the gene. Then for the 2nd one, multiply its p-value by 999 (not one thousand) and see if it is less than 0.05. Multiply the third smallest p-value by 998, the fourth smallest by 997, etc. Compare each of these adjusted p-values to 0.05. We then insure that any adjusted p-value is at least as large as any preceding adjusted p-value. If it is not make sure it is equal to the largest of the preceding p-values. This is the algorithm of Holm step-down procedure. In R, it’s
p.adjust(p, method = "holm")