**From the bottom of the heap » R**, and kindly contributed to R-bloggers)

In previous posts I discussed how to generate a single permutation from a fully-randomised or restricted permutation design using `shuffle()`

. Here I want to briefly mention the `shuffleSet()`

function and illustrate it’s usage.

Every time you call `shuffle()`

it has to interpret the `control`

list to identify the type of permutation required. Whilst the overhead of this interpretation is not too high, there is no reason that it need be incurred just to generate a set of permutations. This is where `shuffleSet()`

comes in. It works exactly like `shuffle()`

taking the number of observations and a `control`

object but in addition it takes an extra argument `nset`

which is the number of permutations required for the set.

> args(shuffleSet) function (n, nset = 1, control = permControl()) NULL

To generate 10 random permutations of ten observations you would use

> set.seed(2) > shuffleSet(10, 10) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 2 7 5 10 6 8 1 3 4 9 [2,] 6 3 7 2 9 5 4 1 10 8 [3,] 7 4 10 2 3 6 1 8 5 9 [4,] 1 2 7 8 4 6 5 10 9 3 [5,] 10 3 1 2 6 4 5 7 9 8 [6,] 1 10 6 7 2 5 4 3 8 9 [7,] 8 10 6 2 9 3 7 4 1 5 [8,] 3 10 1 2 7 4 6 9 8 5 [9,] 4 7 1 3 2 5 10 8 6 9 [10,] 10 4 9 8 3 1 2 5 6 7

If those 10 observations were collected as a time series and we wanted 10 restricted permutations you would use

> set.seed(2) > shuffleSet(10, 10, control = permControl(within = Within(type = "series"))) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 3 4 5 6 7 8 9 10 1 2 [2,] 9 10 1 2 3 4 5 6 7 8 [3,] 7 8 9 10 1 2 3 4 5 6 [4,] 3 4 5 6 7 8 9 10 1 2 [5,] 1 2 3 4 5 6 7 8 9 10 [6,] 1 2 3 4 5 6 7 8 9 10 [7,] 3 4 5 6 7 8 9 10 1 2 [8,] 10 1 2 3 4 5 6 7 8 9 [9,] 6 7 8 9 10 1 2 3 4 5 [10,] 7 8 9 10 1 2 3 4 5 6

From the above set of permutations, the cyclic shifts employed in the `"series"`

permutation type is clear. One problem with the set we just produced is that the same permutation was returned more than once. In fact, there were only six unique permutations in the set requested. This is due to there only being 10 possible permutations of the numbers 1, 2, …, 10 if we allow cyclic shifts in a single direction

> numPerms(10, control = permControl(within = Within(type = "series"))) [1] 10

`shuffle()`

and `shuffleSet()`

know nothing of these limits, but there are functions in the **permute** package that can tell you the number of possible permutations (`numPerms()`

) and generate the entire set of permutations for a stated design (`allPerms()`

). I’ll take a look at `allPerms()`

in a future posting.

I return now to the Golden Jackal mandible length example I used in an earlier post but update the example to make use of `shuffleSet()`

instead of `shuffle()`

. I will just show the code and output for the permutation test, refer to the previous post for details:

> data(jackal) ## load the data > ## function to compute the difference of means > meanDif <- function(x, grp) { + mean(x[grp == "Male"]) - mean(x[grp == "Female"]) + } > N <- nrow(jackal) > set.seed(42) > ## generate the set of 4999 random permutations > pSet <- shuffleSet(N, 4999) > ## iterate over the set > Djackal <- apply(pSet, 1, function(i, data) with(data, meanDif(Length, Sex[i])), data = jackal) > Djackal <- c(Djackal, with(jackal, meanDif(Length, Sex))) > (Dbig <- sum(Djackal >= Djackal[5000])) [1] 12 > Dbig/length(Djackal) [1] 0.0024

The last two lines of R code compute the number of observations in the Null distribution with differences in mean mandible length as great or greater than the observed difference, and the resulting permutation *p*-value. These are the same as those computed in the previous post.

Generating entire sets of permutations is useful for several reasons. One recent example that we came across is with the new parallel processing capabilities in the forthcoming version of R. We are able to generate a set of permutations and then distribute the process of the permutation test over a number of CPUs or worker threads, each dealing with a subset of the permutations we generated. This can greatly reduce the compute time needed for the permutation test, especially where the objective function is computationally complex, but allows us to not worry about controlling the random number generator in each separate process — this is all done within the main function and only the relevant subset of permutations is passed to each worker process. An additional reason for generating a set of permutation to work with rather than individual permutations is that it is easy to switch between using a set of randomly generated permutations or the set of all possible permutations where that set is not overly large. `allPerms()`

returns the set of permutations in the same way that `shuffleSet()`

does, so we can simplify our code if we write the test to iterate over a set of permutations.

The full script for the Golden Jackal permutation test is shown below:

data(jackal) ## load the data ## function to compute the difference of means meanDif <- function(x, grp) { mean(x[grp == "Male"]) - mean(x[grp == "Female"]) } N <- nrow(jackal) set.seed(42) ## generate the set of 4999 random permutations pSet <- shuffleSet(N, 4999) ## iterate over the set Djackal <- apply(pSet, 1, function(i, data) with(data, meanDif(Length, Sex[i])), data = jackal) Djackal <- c(Djackal, with(jackal, meanDif(Length, Sex))) (Dbig <- sum(Djackal >= Djackal[5000])) Dbig/length(Djackal)

**leave a comment**for the author, please follow the link and comment on their blog:

**From the bottom of the heap » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...