# Fast Romano-Wolf Multiple Testing Corrections for fixest đș

**s3alfisc.github.io/blog/**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For the final chapter of my dissertation, I had examined the effects of ordinal class rank on the academic achievement of Danish primary school students (following a popular identification strategy introduced in a paper by Murphy and Weinhard). Because of the richness of the Danish register data, I had a large number of potential outcome variables at my disposal, and as a result, I was able to examine literally all the outcomes that the previous literature had covered in individual studies – the effect of rank on achievement, personality, risky behaviour, mental health, parental investment, etc. – in one paper.

But with (too) many outcome variables comes a risk: inflated type 1 error rates, or an increased risk of âfalse positivesâ. So I was wondering: were all the significant effects I estimated – shown in the figure above – ârealâ, or was I simply being fooled by randomness?

A common way to control the risk of false positive when testing multiple hypothesis is to use methods that control the âfamily-wiseâ error rate, i.e.Â the risk of at least one false positive in a family of S hypotheses.

Among such methods, Romano and Wolfâs correction is particularly popular, because it is âuniformly the most powerfulâ. Without going into too much detail, Iâll just say that if you have a choice between a number of methods that control the family-wise error rate at a desired level \(\alpha\), you might want to choose the âmost powerfulâ one, i.e.Â the one that has the highest success rate of finding a true effect.

Now, there is actually an amazing Stata package for the Romano-Wolf method called `rwolf`

.

But this is an R blog, and Iâm an R guy âŠ In addition, my regression involved several million rows and some high-dimensional fixed effects, and `rwolf`

quickly showed its limitations. It just wasnât fast enough!

While playing around with the `rwolf`

package, I finally did my due diligence on the method it implements, and after a little background reading, I realized that both the Romano and Wolf method – as well as its main rival, the method proposed by Westfall and Young – are based on the bootstrap!

But wait! Had I not just spent a lot of time porting a super-fast bootstrap algorithm from R to Stata? Could I not use Roodman et alâs âfast and wildâ cluster bootstrap algorithm for bootstrap-based multiple hypothesis correction?

Of course it was inevitable: I ended up writing an R package. Today I am happy to present the first MVP version of the `wildwrwolf`

package!

## The wildrwolf package

You can simply install the package from github or r-universe by typing

# install.packages("devtools") devtools::install_github("s3alfisc/wildrwolf") # from r-universe (windows & mac, compiled R > 4.0 required) install.packages('wildrwolf', repos ='https://s3alfisc.r-universe.dev')

The Romano Wolf correction in `wildrwolf`

is implemented as a post-estimation commands for multiple estimation objects from the fabulous `fixest`

package.

To demonstrate how `wildrwolf's`

main function, `rwolf`

, works, letâs first simulate some data:

library(wildrwolf) library(fixest) set.seed(1412) library(wildrwolf) library(fixest) set.seed(1412) N <- 5000 X1 <- rnorm(N) X2 <- rnorm(N) rho <- 0.5 sigma <- matrix(rho, 4, 4); diag(sigma) <- 1 u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma) Y1 <- 1 + 1 * X1 + X2 Y2 <- 1 + 0.01 * X1 + X2 Y3 <- 1 + 0.4 * X1 + X2 Y4 <- 1 + -0.02 * X1 + X2 for(x in 1:4){ var_char <- paste0("Y", x) assign(var_char, get(var_char) + u[,x]) } group_id <- sample(1:100, N, TRUE) data <- data.frame(Y1 = Y1, Y2 = Y2, Y3 = Y3, Y4 = Y4, X1 = X1, X2 = X2, group_id = group_id, splitvar = sample(1:2, N, TRUE))

We now estimate a regression via the `fixest`

package:

fit <- feols(c(Y1, Y2, Y3, Y4) ~ csw(X1,X2), data = data, cluster = ~group_id, ssc = ssc(cluster.adj = TRUE)) # clean workspace except for res & data rm(list= ls()[!(ls() %in% c('fit','data'))])

For all eight estimated regressions, we want to apply the Romano-Wolf correction to the parameter of interest, `X1`

. We simply need to provide the regression object of type `fixest_multi`

(it is also possible to simply provide a list of objects of type `fixest`

), the parameter of interest, the number of bootstrap draws, and possibly a seed (for replicability). Thatâs it! If the regressions use clustered standard errors, `rwolf`

will pick this up and run a wild cluster bootstrap, otherwise just a robust wild bootstrap.

pracma::tic() res_rwolf <- wildrwolf::rwolf( models = fit, param = "X1", B = 9999, seed = 23 ) ## | | | 0% | |========= | 12% | |================== | 25% | |========================== | 38% | |=================================== | 50% | |============================================ | 62% | |==================================================== | 75% | |============================================================= | 88% | |======================================================================| 100% pracma::toc() ## elapsed time is 3.980000 seconds

For \(N=5000\) observations with \(G=100\) clusters, \(S=8\) hypothesis and \(B=9999\) bootstrap draws, the calculation of Romano-Wolf corrected p-values takes less than 5 seconds. If you ask me, that is pretty fast! =) đ

We can now compare the results of `rwolf`

with the uncorrected p-values and another popular multiple testing adjustment, Holmâs method.

pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist() df <- data.frame( "uncorrected" = pvals, "Holm" = p.adjust(pvals, method = "holm"), "rwolf" = res_rwolf$pval ) rownames(df) <- NULL round(df, 3) ## uncorrected Holm rwolf ## 1 0.000 0.000 0.000 ## 2 0.000 0.000 0.000 ## 3 0.140 0.420 0.367 ## 4 0.033 0.132 0.128 ## 5 0.000 0.000 0.000 ## 6 0.000 0.000 0.000 ## 7 0.398 0.420 0.394 ## 8 0.152 0.420 0.367

Both Holmâs method and `rwolf`

produce similar corrected p-values, which - in general - are larger than the uncorrected p-values.

## But does it actually work? Monte Carlo Experiments

We test \(S=6\) hypotheses and generate data as

\[Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} \] where \(D_i = 1(U_i > 0.5)\) and \(U_i\) is drawn from a uniform distribution, \(u_{i,g}\) is a cluster level shock with intra-cluster correlation \(0.5\), and the idiosyncratic error term is drawn from a multivariate random normal distribution with mean \(0_S\) and covariance matrix

S <- 6 rho <- 0.5 Sigma <- matrix(1, 6, 6) diag(Sigma) <- rho Sigma

This experiment imposes a data generating process as in equation (9) in Clarke, Romano and Wolf, with an additional error term \(u_g\) for \(G=20\) clusters and intra-cluster correlation 0.5 and \(N=1000\) observations.

You can run the simulations via the `run_fwer_sim()`

function attached
in the package.

# note that this will take some time res <- run_fwer_sim( seed = 232123, n_sims = 500, B = 499, N = 1000, s = 6, rho = 0.5 #correlation between hypotheses, not intra-cluster! ) # > res # reject_5 reject_10 rho # fit_pvalue 0.274 0.460 0.5 # fit_pvalue_holm 0.046 0.112 0.5 # fit_padjust_rw 0.052 0.110 0.5

Both Holmâs method and `wildrwolf`

control the family wise error rates, at both the 5 and 10% significance level. Very cool!

## The method by Westfall and Young

A package for Westfall and Youngâs (1993) method is currently in development. I am not yet fully convinced that it works as intented - in simulations, I regularly find that `wildwyoung`

overrejects.

## Literature

Clarke, Damian, Joseph P. Romano, and Michael Wolf. âThe RomanoâWolf multiple-hypothesis correction in Stata.â The Stata Journal 20.4 (2020): 812-843.

Murphy, Richard, and Felix Weinhardt. âTop of the class: The importance of ordinal rank.â The Review of Economic Studies 87.6 (2020): 2777-2826.

Romano, Joseph P., and Michael Wolf. âStepwise multiple testing as formalized data snooping.â Econometrica 73.4 (2005): 1237-1282.

Roodman, David, et al.Â âFast and wild: Bootstrap inference in Stata using boottest.â The Stata Journal 19.1 (2019): 4-60.

Westfall, Peter H., and S. Stanley Young. Resampling-based multiple testing: Examples and methods for p-value adjustment. Vol. 279. John Wiley & Sons, 1993.

**leave a comment**for the author, please follow the link and comment on their blog:

**s3alfisc.github.io/blog/**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.