Generating sets of permutations

Posted on October 21, 2011 by ucfagls in R bloggers | 0 Comments

[This article was first published on From the bottom of the heap » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In previous posts I discussed how to generate a single permutation from a fully-randomised or restricted permutation design using shuffle(). Here I want to briefly mention the shuffleSet() function and illustrate it’s usage.

Every time you call shuffle() it has to interpret the control list to identify the type of permutation required. Whilst the overhead of this interpretation is not too high, there is no reason that it need be incurred just to generate a set of permutations. This is where shuffleSet() comes in. It works exactly like shuffle() taking the number of observations and a control object but in addition it takes an extra argument nset which is the number of permutations required for the set.

> args(shuffleSet)
function (n, nset = 1, control = permControl())
NULL

To generate 10 random permutations of ten observations you would use

> set.seed(2)
> shuffleSet(10, 10)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    7    5   10    6    8    1    3    4     9
 [2,]    6    3    7    2    9    5    4    1   10     8
 [3,]    7    4   10    2    3    6    1    8    5     9
 [4,]    1    2    7    8    4    6    5   10    9     3
 [5,]   10    3    1    2    6    4    5    7    9     8
 [6,]    1   10    6    7    2    5    4    3    8     9
 [7,]    8   10    6    2    9    3    7    4    1     5
 [8,]    3   10    1    2    7    4    6    9    8     5
 [9,]    4    7    1    3    2    5   10    8    6     9
[10,]   10    4    9    8    3    1    2    5    6     7

If those 10 observations were collected as a time series and we wanted 10 restricted permutations you would use

> set.seed(2)
> shuffleSet(10, 10, control = permControl(within = Within(type = "series")))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    4    5    6    7    8    9   10    1     2
 [2,]    9   10    1    2    3    4    5    6    7     8
 [3,]    7    8    9   10    1    2    3    4    5     6
 [4,]    3    4    5    6    7    8    9   10    1     2
 [5,]    1    2    3    4    5    6    7    8    9    10
 [6,]    1    2    3    4    5    6    7    8    9    10
 [7,]    3    4    5    6    7    8    9   10    1     2
 [8,]   10    1    2    3    4    5    6    7    8     9
 [9,]    6    7    8    9   10    1    2    3    4     5
[10,]    7    8    9   10    1    2    3    4    5     6

From the above set of permutations, the cyclic shifts employed in the "series" permutation type is clear. One problem with the set we just produced is that the same permutation was returned more than once. In fact, there were only six unique permutations in the set requested. This is due to there only being 10 possible permutations of the numbers 1, 2, …, 10 if we allow cyclic shifts in a single direction

> numPerms(10, control = permControl(within = Within(type = "series")))
[1] 10

shuffle() and shuffleSet() know nothing of these limits, but there are functions in the permute package that can tell you the number of possible permutations (numPerms()) and generate the entire set of permutations for a stated design (allPerms()). I’ll take a look at allPerms() in a future posting.

I return now to the Golden Jackal mandible length example I used in an earlier post but update the example to make use of shuffleSet() instead of shuffle(). I will just show the code and output for the permutation test, refer to the previous post for details:

> data(jackal) ## load the data
> ## function to compute the difference of means
> meanDif <- function(x, grp) {
+     mean(x[grp == "Male"]) - mean(x[grp == "Female"])
+ }
> N <- nrow(jackal)
> set.seed(42)
> ## generate the set of 4999 random permutations
> pSet <- shuffleSet(N, 4999)
> ## iterate over the set
> Djackal <- apply(pSet, 1, function(i, data) with(data, meanDif(Length, Sex[i])), data = jackal)
> Djackal <- c(Djackal, with(jackal, meanDif(Length, Sex)))
> (Dbig <- sum(Djackal >= Djackal[5000]))
[1] 12
> Dbig/length(Djackal)
[1] 0.0024

The last two lines of R code compute the number of observations in the Null distribution with differences in mean mandible length as great or greater than the observed difference, and the resulting permutation p-value. These are the same as those computed in the previous post.

Generating entire sets of permutations is useful for several reasons. One recent example that we came across is with the new parallel processing capabilities in the forthcoming version of R. We are able to generate a set of permutations and then distribute the process of the permutation test over a number of CPUs or worker threads, each dealing with a subset of the permutations we generated. This can greatly reduce the compute time needed for the permutation test, especially where the objective function is computationally complex, but allows us to not worry about controlling the random number generator in each separate process — this is all done within the main function and only the relevant subset of permutations is passed to each worker process. An additional reason for generating a set of permutation to work with rather than individual permutations is that it is easy to switch between using a set of randomly generated permutations or the set of all possible permutations where that set is not overly large. allPerms() returns the set of permutations in the same way that shuffleSet() does, so we can simplify our code if we write the test to iterate over a set of permutations.

The full script for the Golden Jackal permutation test is shown below:

data(jackal) ## load the data
## function to compute the difference of means
meanDif <- function(x, grp) {
    mean(x[grp == "Male"]) - mean(x[grp == "Female"])
}
N <- nrow(jackal)
set.seed(42)
## generate the set of 4999 random permutations
pSet <- shuffleSet(N, 4999)
## iterate over the set
Djackal <- apply(pSet, 1, function(i, data) with(data, meanDif(Length, Sex[i])), data = jackal)
Djackal <- c(Djackal, with(jackal, meanDif(Length, Sex)))
(Dbig <- sum(Djackal >= Djackal[5000]))
Dbig/length(Djackal)