# Example 9.32: Multiple testing simulation

May 21, 2012

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

In examples 9.30 and 9.31 we explored corrections for multiple testing and extracted p-values adjusted by the Benjamini and Hochberg (or FDR) procedure. In this post we'll develop a simulation to explore the impact of the "strong" and "weak" control of the family-wise error rate offered by different multiple comparison corrections. Loosely put, weak control procedures may fail when some of the null hypotheses are actually false: the remaining (true) nulls may then be rejected more often than the nominal proportion of the time.

For our simulation, we'll develop flexible code to generate some p-values from false nulls and others from true nulls. We'll assume that the true nulls have p-values distributed uniform(0,1), while the false nulls have p-values distributed uniformly between 0 and a user-determined maximum. We'll also allow the total number of tests and the number of false nulls to be set.
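As a warm-up before the full simulations below, the p-value generation scheme for a single family of tests might be sketched in R as follows (`gen_pvals` and its defaults are illustrative names for this sketch, not part of the code later in the post):

```r
# Sketch: generate one family of ntests p-values. The first nfalse come
# from false nulls (uniform on (0, howfalse)); the rest come from true
# nulls (uniform on (0, 1)).
gen_pvals = function(ntests = 20, nfalse = 10, howfalse = 0.001) {
  c(runif(nfalse) * howfalse,     # false nulls: very small p-values
    runif(ntests - nfalse))       # true nulls: uniform(0, 1)
}

p = gen_pvals()
length(p)    # 20
```

With `howfalse` small (say 0.001), the false nulls are essentially guaranteed to be rejected at any conventional alpha level, so the interesting behavior is what happens to the true nulls after adjustment.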

## SAS
In SAS, a macro does the job. It accepts the user parameters described above, then generates false and true nulls for each desired simulation. With the data created, we can use proc multtest to apply the FDR procedure, with the ODS system saving the results. Note how the by statement allows us to replicate the analysis for each simulated set of p-values without creating a separate data set for each one. (Also note that we do not use proc sort before that by statement; this can be risky, but works fine here because the data are generated with sim in ascending order.)
```sas
%macro fdr(nsims=1, ntests=20, nfalse=10, howfalse=.01);
ods select none;
data test;
do sim = 1 to &nsims;
  do i = 1 to &ntests;
    raw_p = uniform(0) *
      ( ((i le &nfalse) * &howfalse) + ((i gt &nfalse) * 1) );
    output;
  end;
end;
run;

ods output pvalues = __pv;
proc multtest inpvalues=test fdr;
by sim;
run;
```

With the results in hand (still within the macro), we need to do some massaging to make the results usable. First we'll recode the rejections (assuming a 0.05 alpha level) so that non-rejections are 0 and rejections are 1/(number of tests); that way we can simply sum across the results to get the proportion of rejections. Next, we transpose the data to get each simulation in a row (section 1.5.4). (The data set output from proc multtest has nsims*ntests rows; after transposing, there are nsims rows.) Finally, we sum across the rows to get the proportion of tests rejected in each simulated family of tests. The results are shown in a table made with proc freq.
```sas
data __pv1;
set __pv;
if falsediscoveryrate lt 0.05 then fdrprop = 1/&ntests;
else fdrprop = 0;
run;

proc transpose data = __pv1 (keep = sim fdrprop) out = pvals_a;
by sim;
run;

data pvals;
set pvals_a;
prop = sum(of col1 - col&ntests);
run;

ods select all;
proc freq data = pvals;
tables prop;
run;
%mend fdr;

%fdr(nsims=1000, ntests=20, nfalse=10, howfalse=.001);
```

```
                                    Cumulative    Cumulative
prop    Frequency     Percent        Frequency       Percent
------------------------------------------------------------
 0.5         758        75.80              758         75.80
0.55         210        21.00              968         96.80
 0.6          27         2.70              995         99.50
0.65           5         0.50             1000        100.00
```

So at least one true null was rejected in about 24% of the simulated families, which seems like a lot. Multiple comparison procedures with "strong" control of the familywise error rate will reject one or more true nulls only 5% of the time. Building this simulation as a macro facilitates exploring the effects of the multiple comparison procedures in a variety of settings.

## R
As in example 9.31, the R code is rather simpler, though perhaps a bit opaque. To make the p-values, we generate them first for all of the tests with false nulls, then for all of the tests with true nulls. The matrix() function reads these in by column by default, meaning that the first nfalse columns get the nfalse*nsims observations from the false nulls. The apply() function generates the FDR-adjusted p-values for each row of the matrix, and the t() function transposes the result so that we get back a row for each simulation. As in the SAS version, we count each rejection as 1/ntests and each non-rejection as 0; we do this with ifelse(). Then we sum across each simulation with another call to apply() and show the results with a simple table.
```r
checkfdr = function(nsims=1, ntests=100, nfalse=0, howfalse=0.001) {
  raw_p = matrix(c(runif(nfalse * nsims) * howfalse,
                   runif((ntests - nfalse) * nsims)), nrow=nsims)
  fdr = t(apply(raw_p, 1, p.adjust, "fdr"))
  reject = ifelse(fdr < .05, 1/ntests, 0)
  prop = apply(reject, 1, sum)
  prop.table(table(prop))
}

> checkfdr(nsims=1000, ntests=20, nfalse=10, howfalse=.001)
prop
  0.5  0.55   0.6  0.65
0.755 0.210 0.032 0.003
```

The results are reassuringly similar to those from SAS. In this R code, it's particularly simple to try a different procedure: just replace "fdr" in the p.adjust() call. Here's the result with the Hochberg test, which has strong control.
```r
checkhoch = function(nsims=1, ntests=100, nfalse=0, howfalse=0.001) {
  pvals = matrix(c(runif(nfalse * nsims) * howfalse,
                   runif((ntests - nfalse) * nsims)), nrow=nsims)
  hochberg = t(apply(pvals, 1, p.adjust, "hochberg"))
  reject = ifelse(hochberg < .05, 1/ntests, 0)
  prop = apply(reject, 1, sum)
  prop.table(table(prop))
}

> checkhoch(nsims=1000, ntests=20, nfalse=10, howfalse=.001)
prop
  0.5  0.55   0.6
0.951 0.046 0.003

With this procedure, one or more of the true nulls is rejected in an appropriate 4.9% of the simulated families. For the most part, we feel more comfortable using multiple testing procedures with "strong control".
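Since p.adjust() takes the method name as an argument, the two functions above can be folded into a single one that accepts any adjustment method (this refactoring, and the name `checkmethod`, are our own sketch rather than code from the original examples):

```r
# Generalization of checkfdr()/checkhoch(): pass any p.adjust method,
# e.g. "fdr", "hochberg", "holm", or "bonferroni".
checkmethod = function(method, nsims=1, ntests=100, nfalse=0, howfalse=0.001) {
  raw_p = matrix(c(runif(nfalse * nsims) * howfalse,
                   runif((ntests - nfalse) * nsims)), nrow=nsims)
  adj = t(apply(raw_p, 1, p.adjust, method))       # adjust within each family
  reject = ifelse(adj < .05, 1/ntests, 0)          # rejection = 1/ntests
  prop.table(table(apply(reject, 1, sum)))         # distribution of proportions
}

# e.g., checkmethod("bonferroni", nsims=1000, ntests=20, nfalse=10, howfalse=.001)
```

This makes it easy to compare the weak- and strong-control procedures side by side without duplicating the simulation code.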