# A Permutation Test Regression Example

**Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

*There may be a snap quiz at some point*!)

- How to construct a randomization test of the hypothesis that the regression slope coefficient is zero.
- A demonstration that the permutation test is “exact”. That it, its significance level is exactly what we assign it to be.
- A comparison between a permutation test and the usual t-test for this problem.
- A demonstration that the permutation test remains “exact”, even when the regression model is mi-specified by fitting it through the origin.
- A comparison of the powers of the randomization test and the t-test under this model mis-specification.

**The Monte Carlo experiment**

A lot of this information will be revealed by means of a Monte Carlo experiment. The associated R code can be downloaded from the code page for this blog. (Even if you’re not an R user, you can read the code file with any text editor, and there are lots of comments in it to help you.)

_{i}= β

_{1}+ β

_{2}x

_{i}+ ε

_{i}; i = 1, 2, 3, …., n. (1)

*we’re not really going to make any assumptions*beyond “exchangeability” of the sample observations. (See my previous post for an explanation of this.)

*conventional*tests that we conduct.

_{2}, where b

_{2}is the OLS estimator for β

_{2}. That would be just fine, even though it’s not a pivotal statistic. Then we could keep the ordering of the y values in the sample, and consider permutations of the x data (or

*vice versa*– it wouldn’t matter). See Edgington (1987) or Noreen (1989).

_{2}is proportional to the (Pearson) simple correlation coefficient between y and x. In fact the proportionality “constant” is the ratio of the sample standard deviations of y and x. This ratio stays the same as we permute the sample values. Moreover, for a set of x and y values, the correlation coefficient increases monotonically with b

_{2}.

So, all we have to do is to test the *null hypothesis of “no correlation”*, and this will serve our purpose precisely. *The alternative hypothesis will be 2-sided*, namely that there is a linear correlation between x and y. This is a problem that was dealt with in Example 2 of my last post on permutation tests, so we know already what’s involved.

**muOutlier**package for R to implement the permutation test.

Sample sizes of n = 15, 30, and 60 are are considered separately. (This is hardly a situation where we can appeal to “large-sample asymptotics”.) The number of permutations are 1.308×10^{12}, 2.653×10^{32}, and 8.321×10^{81} for n = 15, 30, and 60 respectively. Obviously, we’ll just use a random selection of these possible permutations! For n = 15, 30 we’ll use 2,000 selections; and we’ll use 5,000 selections for n = 60. A bit of experimentation shows that these numbers are sufficient for the accuracy that we want, as is the number of Monte Carlo replications that we’ll use, namely 2,000.

**Significance levels**

To get a “fair” comparison between the two type of tests at the outset, the DGP in (1) uses random disturbances that are i.i.d. Normal, with a zero mean and a constant variance. So, When the null hypothesis is true and the model that is estimation is correctly specified, the t-test should reject the null hypothesis in 5% of the replications of the experiment if we use the appropriate Student-t critical value(s).

So, with this in mind, let’s take a look at a partial set of our results, for the case where n = 30. In Table 1, * the regression is fitted through the origin, *so unless the value of β

_{1 }in the DGP is zero, the estimated model would be mis-specified:

**Table 1**

_{1}and β

_{2}are zero. That is, the DGP in (1) has no intercept, so the model is correctly specified, and the null hypothesis of a zero slope is true.

Because the null hypothesis is true, the power of the test is just its significance level (5%). Notice that the reported empirical significance levels match the anticipated 5%!

This is good news. Given the particular errors that were used in the DGP in the simulations, *this had to happen for the t-test*. However, the result for the randomization test could be misleading. Why? Well. we could obtain the 5% rejection rate correctly, even if the distribution of the p-values from which this rate was calculated is “weird”.

In an old post on this blog I discussed the sampling distribution of a p-value. One point that I covered was that if the null hypothesis is true, then this sampling distribution *has to be Uniform* on [0 , 1], regardless of the testing problem! This result gives another way of checking if the code for our simulation experiment is performing accurately, and that we’ve used enough replications and random selections of the permutations.

*for the situation in the first row of results in Table 1*:

**Figure 1**

As we can see, this distribution is “reasonably uniform”, as required. More formally, using the **uniftest** package in R we find that the Kolmogorov-Smirnov test statistic for uniformity is D = 0.998 (p = 0.24); and the Kuiper test statistic is V = 1.961 (p = 0.23). So, we seem to be in good shape here.

So far, all that we seem to have shown is that the permutation test and the t-test exhibit no “size-distortion” when the model is correctly specified, and the errors satisfy the assumptions needed for the t-test to be valid. Well, whoopee!

Let’s take a look back at Table 1, and now focus on the second line of results (highlighted in orange). Recall that the estimated model omits the intercept – the model is fitted through the origin. However, now, in Table 1, β_{1 }= 1, so the DGP includes an intercept and the fitted model is under-specified. *We’ve omitted a (constant) regressor from the estimated model*.

In this case the usual t-statistic follows a non-central Student-t distribution, with a non-centrality parameter that increases monotonically with β_{1}^{2}, and which depends on the x data and the variance of the error term. *It’s value is unobservable*! And we can’t use the critical value(s) from the non-central t distribution if we don’t know the value of the non-centrality parameter.

Obviously, the usual (central) Student-t critical values are no longer correct, and the *observed* significance level (the rejection rate of the null in the experiment when the null is true) will differ from 5%. Depending on the situation, it may be less than 5%, or greater than 5%. The extent of this difference is the “size-distortion” associated with the test when we mis-apply it in this way.

In the second line of Table 1 we see two important things. First, the t-test has a downwards size-distortion. We wanted to apply the test at the 5% significance level, but in fact it only rejected the null, when it was true, 1% of the time! Of course, if we just mis-applied this test once, in an application, we would have no idea if there was any substantial size-distortion or not.

The second (* really neat*) thing that we see in the second line of that table is that the permutation test still has a significance level of 5%! Even though the model is mis-specified, this doesn’t affect the test – at least in terms of it still being “exact” with respect to the significance level that we wanted to achieve.

*!*

**And this is a totally general result****Power considerations**

_{1 }≠ 0), but the value of the slope coefficient is increasing. That is, the null hypothesis is becoming

*more and more false*. We’re now looking at the power functions for the tests. Because the size of the t-test is distorted downwards, it’s not surprising that this test has lower power than the permutation test in this particular case. (More generally, that need not be the case – the power curves could intersect at some point.)

*upwards (*at 7.8%) and its power curve lies

*above*that of the permutation test. It’s not that the t-test is (necessarily) more powerful than the the permutation test. We can’t compare (true) power unless the two tests have the same significance level – and they don’t in this case.

**Figure 2**

**Figure 3**

**Be careful!**

**1.** We set β_{1 }= 1 in the DGP and this resulted in a mis-specified “fitted” regression model. This chosen value for β_{1 }affects the (numerical) results. The *magnitudes* of the size distortions and the powers are specific to this choice.

**2.** We need to be careful, here, when we talk about comparisons between the powers of the two tests (and this comment is universally applicable). Strictly, power comparisons are valid when the tests have the same (actual, empirical) significance level. The only exception is when Test A has *lower *actual significance level than Test B, but Test A has a *higher* rejection rate than Test B when the null is false (*i.e.*, higher “raw”, or apparent, power”) over the full parameter space. Then, Test A is more powerful than Test B.

In all other cases where there is size distortion, the “true” power will not be clear. One way to deal with this is to “size-adjust” any test that exhibits size distortion. This would be done, in our Monte Carlo experiment, by “jiggling” the critical values used for the t-test to ones that ensure that the test has a 5% rejection rate when the null is true. Then we could proceed to generate the corresponding power curves and make valid comparisons.

That’s fine, but of course *in practice* we wouldn’t know what modified critical values to use (unless we conducted a Monte Carlo experiment every time we undertook a real-life application). Perhaps this is worthy of another blog post at some stage, perhaps as an addition to my earlier ones on the basics of Monte Carlo simulation – here, here, and here.

**3.** As I stressed from the outset, model (1) is only a *simple* regression model. Multiple regression models are much more interesting, but this is where things get a lot trickier when it comes to constructing an *exact* permutation test. Quite a lot has been written about this problem. Some of the issues are discussed quite nicely by Kennedy (1995), but the (statistics) literature has moved along a bit since then and some of his suggestions are no longer supported. Some key references include Anderson and ter Braak (2003), Huh and Jhun (2001), Kim *et al*. (2000), LePage and Podgórski (1996), Oja (1987), and Schmoyer (1994).

Anderson and Robinson (2001) provide simulation evidence that favours the particular permutation procedure suggested by Freedman and Lane (1983). More recently, Nyblom (2015) introduces a permutation procedure based on the regression residuals, and illustrates its application to problems involving tests for autocorrelation, heteroskedasticity, and structural breaks.

**Summing up**

Here are the takeaways from this post:

- Permutation tests are nonparametric, or “distribution free”. Unlike the usual tests that we use in econometric, we don’t need to satisfy a whole lot of (possibly questionable) parametric and distributional assumptions.
- Permutation tests are easy to apply, even though they can be (moderately) computationally intensive.
- Permutation tests are “exact”, in the sense that achieve precisely the significance level that we want them to. There is no “size distortion”.
- Usually, we have lots of legitimate, and simple, choices for the test statistic in any given problem. In each case, the test will be exact, though its power may vary depending on our choice.
- A permutation test will still be exact, even if the model is mis-specified. This stands in stark contrast to the usual parametric tests that we use.
- A permutation test may be more, or less, powerful than its parametric counterparts, depending on the situation.
- If you plan to use permutation (randomization) tests in the context of the
*multiple*regression model (or its extensions), then their are some pitfalls that you need to be aware of. Be sure to look at the references that I’ve supplied.

Other than that – Happy Testing!

**References**

*Australian and New Zealand Journal of Statistics*, 43, 75-88.

Anderson, M. J., & C. J. F. ter Braak, 2003. Permutation tests for multi-factorial analysis of variance. *Journal of Statistical Computation and Simulation*, 73, 85–113.

Edgington, E. S., 1987. *Randomization Tests*. Marcel Dekker, New York.

Freedman, D. & D. Lane, 1983. A nonstochastic interpretation of reported significance levels. *Journal of Business and Economic Statistics*, 1, 292-298.

Huh, M-H. & M. Jhun, 2001. Random permutation testing in multiple linear regression. *Communications in Statistics – Theory Methods*, 30, 2023–2032.

*Journal of Business and Economic Statistics*, 13, 85-94.

Kim, H.-J., M. P. Fay, E. J. Feuer, & D. Midthune, 2000. Permutation tests for jointpoint regression with applications to cancer rates. *Statistics in Medicine,* 19, 335–351.

LePage, R. & K. Podgórski, 1996. Resampling permutations in regression without second moments. *Journal of Multivariate Analysis*, 57, 119–141.

Noreen, E. W., 1989. *Computer Intensive Methods for Testing Hypotheses: An Introduction*. Wiley, New York.

Nyblom, J., 2015. Permutation tests in linear regression. Chapter 5 in K. Nordhausen & S. Taskinen (eds.), *Modern Nonparametric, Robust and Multivariate Methods*. Springer International, Switzerland.

Oja, H., 1987. On permutation tests in multiple regression and analysis of covariance problems. *Australian Journal of Statistics*, 29, 91–100.

Schmoyer, R. L., 1994. Permutation tests for correlation in regression errors. *Journal of the American Statistical Association*, 89, 1507–1516.

**leave a comment**for the author, please follow the link and comment on their blog:

**Econometrics Beat: Dave Giles' Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.