- How to construct a randomization test of the hypothesis that the regression slope coefficient is zero.
- A demonstration that the permutation test is “exact”. That is, its significance level is exactly what we assign it to be.
- A comparison between a permutation test and the usual t-test for this problem.
- A demonstration that the permutation test remains “exact”, even when the regression model is mis-specified by fitting it through the origin.
- A comparison of the powers of the randomization test and the t-test under this model mis-specification.
The Monte Carlo experiment
A lot of this information will be revealed by means of a Monte Carlo experiment. The associated R code can be downloaded from the code page for this blog. (Even if you’re not an R user, you can read the code file with any text editor, and there are lots of comments in it to help you.)
So, all we have to do is to test the null hypothesis of “no correlation”, and this will serve our purpose precisely. The alternative hypothesis will be 2-sided, namely that there is a linear correlation between x and y. This is a problem that was dealt with in Example 2 of my last post on permutation tests, so we know already what’s involved.
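To make the mechanics concrete, here is a minimal sketch in Python (not the R code that accompanies this post) of a two-sided permutation test for “no correlation”. The function names, the use of |r| as the test statistic, and the (hits + 1)/(n_perm + 1) convention for the p-value are assumptions made for illustration. The data are simulated, not taken from the experiment.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=2000, rng=None):
    """Two-sided Monte Carlo permutation p-value for H0: no correlation."""
    rng = rng or random.Random()
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Assumed convention: count the observed ordering as one permutation,
    # so the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)

# Illustrative (simulated) data: one sample with H0 true, one with H0 false.
rng = random.Random(123)
x = [rng.gauss(0, 1) for _ in range(15)]
y_null = [rng.gauss(0, 1) for _ in range(15)]
y_alt = [2.0 * xi + rng.gauss(0, 0.1) for xi in x]
p_null = permutation_pvalue(x, y_null, n_perm=500, rng=rng)
p_alt = permutation_pvalue(x, y_alt, n_perm=500, rng=rng)
print(p_null, p_alt)
```

Only the y values are shuffled. Under the null they are exchangeable, so each re-ordering is as likely as the one we actually observed.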
Sample sizes of n = 15, 30, and 60 are considered separately. (This is hardly a situation where we can appeal to “large-sample asymptotics”.) The numbers of possible permutations are 1.308×10¹², 2.653×10³², and 8.321×10⁸¹ for n = 15, 30, and 60 respectively. Obviously, we’ll just use a random selection of these possible permutations! For n = 15, 30 we’ll use 2,000 selections; and we’ll use 5,000 selections for n = 60. A bit of experimentation shows that these numbers are sufficient for the accuracy that we want, as is the number of Monte Carlo replications that we’ll use, namely 2,000.
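The permutation counts quoted above are just n! for each sample size. A quick check in Python (the variable names are only for illustration):

```python
import math

# Number of distinct orderings of n observations is n!
counts = {n: math.factorial(n) for n in (15, 30, 60)}
formatted = {n: f"{c:.3e}" for n, c in counts.items()}
for n in counts:
    print(f"n = {n}: n! = {formatted[n]}")
```

Even for n = 15, enumerating every ordering is impractical, which is why a random selection of permutations is used.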
To get a “fair” comparison between the two types of tests at the outset, the DGP in (1) uses random disturbances that are i.i.d. Normal, with a zero mean and a constant variance. So, when the null hypothesis is true and the model that is estimated is correctly specified, the t-test should reject the null hypothesis in 5% of the replications of the experiment if we use the appropriate Student-t critical value(s).
So, with this in mind, let’s take a look at a partial set of our results, for the case where n = 30. In Table 1, the regression is fitted through the origin, so unless the value of β1 in the DGP is zero, the estimated model would be mis-specified:
Because the null hypothesis is true, the power of the test is just its significance level (5%). Notice that the reported empirical significance levels match the anticipated 5%!
This is good news. Given the particular errors that were used in the DGP in the simulations, this had to happen for the t-test. However, the result for the randomization test could be misleading. Why? Well, we could obtain the 5% rejection rate correctly, even if the distribution of the p-values from which this rate was calculated is “weird”.
In an old post on this blog I discussed the sampling distribution of a p-value. One point that I covered was that if the null hypothesis is true, then this sampling distribution has to be Uniform on [0 , 1], regardless of the testing problem! This result gives another way of checking if the code for our simulation experiment is performing accurately, and that we’ve used enough replications and random selections of the permutations.
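As a rough illustration of this kind of check, the Kolmogorov-Smirnov distance between a set of simulated p-values and the Uniform[0, 1] distribution can be computed by hand. This is a Python sketch with a hypothetical function name, not the R package routine used for the results reported here. The p-values in the demonstration are plain uniform draws standing in for simulated ones.

```python
import random

def ks_uniform_statistic(p_values):
    """Largest gap between the empirical CDF of p_values and the U[0, 1] CDF."""
    p = sorted(p_values)
    n = len(p)
    d_plus = max((i + 1) / n - v for i, v in enumerate(p))   # ECDF above the line
    d_minus = max(v - i / n for i, v in enumerate(p))        # ECDF below the line
    return max(d_plus, d_minus)

# Stand-in for 2,000 Monte Carlo p-values generated with the null true.
rng = random.Random(42)
fake_p_values = [rng.random() for _ in range(2000)]
d_stat = ks_uniform_statistic(fake_p_values)
print(d_stat)
```

Permutation p-values based on a finite number of re-orderings are discrete, so they can only be approximately uniform. A small distance is what we would hope to see.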
As we can see, this distribution is “reasonably uniform”, as required. More formally, using the uniftest package in R we find that the Kolmogorov-Smirnov test statistic for uniformity is D = 0.998 (p = 0.24); and the Kuiper test statistic is V = 1.961 (p = 0.23). So, we seem to be in good shape here.
So far, all that we seem to have shown is that the permutation test and the t-test exhibit no “size-distortion” when the model is correctly specified, and the errors satisfy the assumptions needed for the t-test to be valid. Well, whoopee!
Let’s take a look back at Table 1, and now focus on the second line of results (highlighted in orange). Recall that the estimated model omits the intercept – the model is fitted through the origin. However, now, in Table 1, β1 = 1, so the DGP includes an intercept and the fitted model is under-specified. We’ve omitted a (constant) regressor from the estimated model.
In this case the usual t-statistic follows a non-central Student-t distribution, with a non-centrality parameter that increases monotonically with β1², and which depends on the x data and the variance of the error term. Its value is unobservable! And we can’t use the critical value(s) from the non-central t distribution if we don’t know the value of the non-centrality parameter.
Obviously, the usual (central) Student-t critical values are no longer correct, and the observed significance level (the rejection rate of the null in the experiment when the null is true) will differ from 5%. Depending on the situation, it may be less than 5%, or greater than 5%. The extent of this difference is the “size-distortion” associated with the test when we mis-apply it in this way.
In the second line of Table 1 we see two important things. First, the t-test has a downwards size-distortion. We wanted to apply the test at the 5% significance level, but in fact it only rejected the null, when it was true, 1% of the time! Of course, if we just mis-applied this test once, in an application, we would have no idea if there was any substantial size-distortion or not.
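The size distortion of the t-test can be explored with a small simulation. The Python sketch below is not the experiment behind Table 1. The x values, the error variance, n = 15, and the seed are all assumptions, so it will not reproduce the 1% figure. The critical value 2.145 is the two-sided 5% Student-t value for 14 degrees of freedom, hard-coded to keep the example free of external libraries.

```python
import math
import random

T_CRIT_14 = 2.145  # two-sided 5% Student-t critical value, 14 d.f. (n - 1 with n = 15)

def t_stat_through_origin(x, y):
    """t-ratio for the slope when y is regressed on x with no intercept."""
    n = len(x)
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    rss = sum((c - b * a) ** 2 for a, c in zip(x, y))
    s2 = rss / (n - 1)  # only one coefficient is estimated
    return b / math.sqrt(s2 / sxx)

def rejection_rate(beta1, x, n_reps, rng):
    """Share of replications rejecting H0 when the true slope is zero."""
    rejections = 0
    for _ in range(n_reps):
        y = [beta1 + rng.gauss(0, 1) for _ in x]  # intercept beta1, slope 0
        if abs(t_stat_through_origin(x, y)) > T_CRIT_14:
            rejections += 1
    return rejections / n_reps

rng = random.Random(2024)
x = [rng.uniform(-1, 1) for _ in range(15)]  # assumed regressor values, held fixed
rate_ok = rejection_rate(0.0, x, 2000, rng)             # fitted model is correct
rate_misspecified = rejection_rate(1.0, x, 2000, rng)   # intercept wrongly omitted
print(rate_ok, rate_misspecified)
```

With β1 = 0 the rejection rate should sit close to 5%. With β1 = 1 it can land on either side of 5%, depending on the x data.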
The second (really neat) thing that we see in the second line of that table is that the permutation test still has a significance level of 5%! Even though the model is mis-specified, this doesn’t affect the test – at least in terms of it still being “exact” with respect to the significance level that we wanted to achieve.
And this is a totally general result!
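The reason is that, when the slope is zero and the errors are i.i.d., the y values are exchangeable whether or not the DGP has an intercept. A scaled-down Python sketch can illustrate this. The sample size, number of permutations, number of replications, x values, and seed are all assumptions chosen to keep the run short, and the no-intercept slope estimate is used as the test statistic.

```python
import random

def slope_through_origin(x, y):
    """Least-squares slope when the regression is forced through the origin."""
    return sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)

def perm_pvalue(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value based on |slope|."""
    observed = abs(slope_through_origin(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(slope_through_origin(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(7)
n, n_reps, n_perm = 15, 400, 199
x = [rng.uniform(-1, 1) for _ in range(n)]  # assumed regressor values
rejections = 0
for _ in range(n_reps):
    # DGP has an intercept of 1 and a zero slope; the fitted model omits the intercept.
    y = [1.0 + rng.gauss(0, 1) for _ in range(n)]
    if perm_pvalue(x, y, n_perm, rng) <= 0.05:
        rejections += 1
rate = rejections / n_reps
print(rate)
```

The rejection rate should be close to 5%, up to simulation noise from the small number of replications.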
1. We set β1 = 1 in the DGP and this resulted in a mis-specified “fitted” regression model. This chosen value for β1 affects the (numerical) results. The magnitudes of the size distortions and the powers are specific to this choice.
2. We need to be careful, here, when we talk about comparisons between the powers of the two tests (and this comment is universally applicable). Strictly, power comparisons are valid only when the tests have the same (actual, empirical) significance level. The only exception is when Test A has a lower actual significance level than Test B, but Test A has a higher rejection rate than Test B when the null is false (i.e., higher “raw”, or apparent, power) over the full parameter space. Then, Test A is more powerful than Test B.
In all other cases where there is size distortion, the “true” power will not be clear. One way to deal with this is to “size-adjust” any test that exhibits size distortion. This would be done, in our Monte Carlo experiment, by “jiggling” the critical values used for the t-test to ones that ensure that the test has a 5% rejection rate when the null is true. Then we could proceed to generate the corresponding power curves and make valid comparisons.
That’s fine, but of course in practice we wouldn’t know what modified critical values to use (unless we conducted a Monte Carlo experiment every time we undertook a real-life application). Perhaps this is worthy of another blog post at some stage, perhaps as an addition to my earlier ones on the basics of Monte Carlo simulation – here, here, and here.
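For what it’s worth, the size-adjustment step itself is simple once we have test statistics simulated with the null true: take their empirical 95th percentile (in absolute value) as the critical value. The Python sketch below uses a hypothetical function name, and standard Normal draws stand in for the simulated t-statistics.

```python
import math
import random

def size_adjusted_critical_value(null_abs_stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of |statistic| simulated under the null."""
    ordered = sorted(null_abs_stats)
    k = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[k]

# Stand-in values: |N(0, 1)| draws in place of simulated |t| statistics.
rng = random.Random(99)
null_stats = [abs(rng.gauss(0, 1)) for _ in range(10000)]
crit = size_adjusted_critical_value(null_stats)
share_rejected = sum(s > crit for s in null_stats) / len(null_stats)
print(crit, share_rejected)
```

Rejecting when the statistic exceeds this value gives a rejection rate of (at most) 5% on the simulated null draws, which is the property the adjusted critical value is meant to deliver.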
3. As I stressed from the outset, model (1) is only a simple regression model. Multiple regression models are much more interesting, but this is where things get a lot trickier when it comes to constructing an exact permutation test. Quite a lot has been written about this problem. Some of the issues are discussed quite nicely by Kennedy (1995), but the (statistics) literature has moved along a bit since then and some of his suggestions are no longer supported. Some key references include Anderson and ter Braak (2003), Huh and Jhun (2001), Kim et al. (2000), LePage and Podgórski (1996), Oja (1987), and Schmoyer (1994).
Anderson and Robinson (2001) provide simulation evidence that favours the particular permutation procedure suggested by Freedman and Lane (1983). More recently, Nyblom (2015) introduces a permutation procedure based on the regression residuals, and illustrates its application to problems involving tests for autocorrelation, heteroskedasticity, and structural breaks.
Here are the takeaways from this post:
- Permutation tests are nonparametric, or “distribution free”. Unlike the usual tests that we use in econometrics, we don’t need to satisfy a whole lot of (possibly questionable) parametric and distributional assumptions.
- Permutation tests are easy to apply, even though they can be (moderately) computationally intensive.
- Permutation tests are “exact”, in the sense that they achieve precisely the significance level that we want them to. There is no “size distortion”.
- Usually, we have lots of legitimate, and simple, choices for the test statistic in any given problem. In each case, the test will be exact, though its power may vary depending on our choice.
- A permutation test will still be exact, even if the model is mis-specified. This stands in stark contrast to the usual parametric tests that we use.
- A permutation test may be more, or less, powerful than its parametric counterparts, depending on the situation.
- If you plan to use permutation (randomization) tests in the context of the multiple regression model (or its extensions), then there are some pitfalls that you need to be aware of. Be sure to look at the references that I’ve supplied.
Other than that – Happy Testing!
Anderson, M. J., & C. J. F. ter Braak, 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation, 73, 85–113.
Edgington, E. S., 1987. Randomization Tests. Marcel Dekker, New York.
Freedman, D. & D. Lane, 1983. A nonstochastic interpretation of reported significance levels. Journal of Business and Economic Statistics, 1, 292–298.
Huh, M-H. & M. Jhun, 2001. Random permutation testing in multiple linear regression. Communications in Statistics – Theory and Methods, 30, 2023–2032.
Kim, H.-J., M. P. Fay, E. J. Feuer, & D. Midthune, 2000. Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine, 19, 335–351.
LePage, R. & K. Podgórski, 1996. Resampling permutations in regression without second moments. Journal of Multivariate Analysis, 57, 119–141.
Noreen, E. W., 1989. Computer Intensive Methods for Testing Hypotheses: An Introduction. Wiley, New York.
Nyblom, J., 2015. Permutation tests in linear regression. Chapter 5 in K. Nordhausen & S. Taskinen (eds.), Modern Nonparametric, Robust and Multivariate Methods. Springer International, Switzerland.
Oja, H., 1987. On permutation tests in multiple regression and analysis of covariance problems. Australian Journal of Statistics, 29, 91–100.
Schmoyer, R. L., 1994. Permutation tests for correlation in regression errors. Journal of the American Statistical Association, 89, 1507–1516.