**R-Bloggers – Learning Machines**, and kindly contributed to R-bloggers)

Sorting values is one of the bread and butter tasks in computer science: this post uses it as a use case to learn what recursion is all about. It starts with some nerd humour… and ends with some more, so read on!

Have a look at the following code and try to understand it:

bogosort <- function(x) { while(is.unsorted(x)) x <- sample(x) x } set.seed(123) (n <- sample(1:100, 10)) ## [1] 29 79 41 86 91 5 50 83 51 42 bogosort(n) ## [1] 5 29 41 42 50 51 79 83 86 91

Obviously bogosort did its job… but in a ludicrously inefficient way! It just shuffles all values over and over until they are sorted *by chance*! In the worst case it just shuffles forever (i.e. until the sun swallows the earth or your partner switches the computer off… whatever comes first).

To create a more efficient sorting algorithm one could use the following strategy:

- Draw the smallest number, put it into a vector and
- Sort the remaining numbers by going back to 1.

The algorithm (which is called selection sort, see below) effectively sorts by calling itself! This is the main idea behind recursion.

To dive deeper into how recursion works watch the following video: What on Earth is Recursion? – Computerphile

So, basically

Recursion means defining a function in terms of itself!

Additionally you need some stopping criterion or base case, otherwise you would create an infinite loop.

The recursive version of the factorial function is:

In the video this formula is written in C, here is the R version:

fact <- function(n) { if (n == 1) 1 else n * fact(n-1) } fact(4) ## [1] 24 factorial(4) #inbuilt factorial function ## [1] 24

After we have seen how recursion actually works, let us implement the above selection sort algorithm in R. First watch this, admittedly unconventional, video: Select-sort with Gypsy folk dance.

We now implement the main idea recursively:

selsort <- function(x) { if(length(x) > 1) { mini <- which.min(x) c(x[mini], selsort(x[-mini])) } else x } set.seed(123) (v <- sample(1:100, 20)) ## [1] 29 79 41 86 91 5 50 83 51 42 87 98 60 94 9 77 21 4 27 78 selsort(v) ## [1] 4 5 9 21 27 29 41 42 50 51 60 77 78 79 83 86 87 91 94 98

Another, even more efficient sorting algorithm, quicksort, also works recursively. Watch the following video to get the idea: Quick-sort with Hungarian folk dance.

So, the main idea behind the quicksort algorithm is to find a pivot for each unsorted subset of numbers which splits it into two similarly sized subsets. Then the sorting function calls itself for each subset.

Have a look at the following code:

qsort <- function(v) { if (length(v) > 1) { pivot <- (min(v) + max(v)) / 2 c(qsort(v[v < pivot]), v[v == pivot], qsort(v[v > pivot])) } else v } qsort(v) ## [1] 4 5 9 21 27 29 41 42 50 51 60 77 78 79 83 86 87 91 94 98

Now for some speed comparisons between selection sort, quicksort and R’s sort function (on my lame computer…):

options(expressions = 7500) N <- 7000 set.seed(123) vs <- runif(N) system.time(u <- selsort(vs)) ## user system elapsed ## 0.59 0.14 0.73 system.time(u <- qsort(vs)) ## user system elapsed ## 0.01 0.00 0.01 system.time(u <- sort(vs)) ## user system elapsed ## 0 0 0

Wow, although quicksort was implemented in R (and not in C as the `sort`

function) it is impressively fast. Why? Because each subset of unsorted numbers is again split into two subsets only about half as big. This pushes down the sizes of subsets that still have to be sorted down pretty fast. In the case of selection sort the subset only gets smaller by one number (the smallest one) in each step.

Let us end this post with a little easter egg from google – do you get it?

**leave a comment**for the author, please follow the link and comment on their blog:

**R-Bloggers – Learning Machines**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...