# Permutation Theory In Action

September 2, 2017
By

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

While working on a large client project using `Sparklyr` and multinomial regression we recently ran into a problem: `Apache Spark` chooses the order of multinomial regression outcome targets, whereas `R` users are used to choosing the order of the targets (please see here for some details). So to make things more like `R` users expect, we need a way to translate one order to another.

Providing good solutions to gaps like this is one of the thing Win-Vector LLC does both in our consulting and training practices.

Let’s take a look at an example. Suppose our two orderings are `o1` (the ordering `Spark ML` chooses) and `o2` (the order the `R` user chooses).

``````set.seed(326346)
symbols <- letters[1:7]

o1 <- sample(symbols, length(symbols), replace = FALSE)
o1``````
``##  "e" "a" "b" "f" "d" "c" "g"``
``````o2 <- sample(symbols, length(symbols), replace = FALSE)
o2``````
``##  "d" "g" "f" "e" "b" "c" "a"``

To translate `Spark` results into `R` results we need a permutation that takes `o1` to `o2`. The idea is: if we had a permeation that takes `o1` to `o2` we could use it to re-map predictions that are in `o1` order to be predictions in `o2` order.

To solve this we crack open our article on the algebra of permutations.

We are going to use the fact that the `R` command `base::order(x)` builds a permutation `p` such that `x[p]` is in order.

Given this the solution is: we find permutations `p1` and `p2` such that `o1[p1]` is ordered and `o2[p2]` is ordered. Then build a permutation `perm` such that `o1[perm] = (o1[p1])[inverse_permutation(p2)]`. I.e., to get from `o1` to `o2` move `o1` to sorted order and then move from the sorted order to `o2`‘s order (by using the reverse of the process that sorts `o2`). Again, the tools to solve this are in our article on the relation between permutations and indexing.

Below is the complete solution (including combining the two steps into a single permutation):

``````p1 <- order(o1)
p2 <- order(o2)

# invert p2
# see: http://www.win-vector.com/blog/2017/05/on-indexing-operators-and-composition/
p2inv <- seq_len(length(p2))
p2inv[p2] <- seq_len(length(p2))

(o1[p1])[p2inv]``````
``##  "d" "g" "f" "e" "b" "c" "a"``
``````# composition rule: (o1[p1])[p2inv] == o1[p1[p2inv]]
# see: http://www.win-vector.com/blog/2017/05/on-indexing-operators-and-composition/
perm <- p1[p2inv]
o1[perm]``````
``##  "d" "g" "f" "e" "b" "c" "a"``

The equivilence "`(o1[p1])[p2inv] == o1[p1[p2inv]]`" is frankly magic (though also quickly follows "by definition"), and studying it is the topic of our original article on permutations.

The above application is a good example of why it is nice to have a little theory worked out, even before you think you need it.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...