Bubble sort implemented in pure R

[This article was first published on NumberTheory » R stuff, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Please note that this is programming I purely did for the learning experience. The pure R bubble sort implemented in this post is veeeeery slow for two reasons:

  1. Interpreted code with lots of iteration is very slow.
  2. Bubble sort is one of the slowest sorting algorithms (O(N^2))

The bubble sort sorting algorithm works by iterating over the unsorted vector and comparing pairs of numbers. Let’s say the first point pair is c(61, 3), here the numbers need to be swapped as the 3 should be earlier in the sorted vector. The following function returns TRUE if the numbers should be swapped, and returns FALSE otherwise:

larger = function(pair) {
   if(pair[1] > pair[2]) return(TRUE) else return(FALSE)
}

This function is used by the following function:

swap_if_larger = function(pair) {
    if(larger(pair)) {
        return(rev(pair)) 
    } else {
        return(pair)
    }
}

which returns the swapped version of the pair if appropriate, or the original pair if the order is ok. For each point pair (element1-element2, element2-element3, etc) swap_if_larger is called:

swap_pass = function(vec) { 
    for(i in seq(1, length(vec)-1)) {
        vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
    }
    return(vec)
}

One pass of this function performs a comparison on all pairs, swapping if necessary. To fully sort the vector, we need to perform multiple passes until no swaps are needed anymore. I chose to implement this using recursion:

bubble_sort = function(vec) {
    new_vec = swap_pass(vec)
    if(isTRUE(all.equal(vec, new_vec))) { 
        return(new_vec) 
    } else {
        return(bubble_sort(new_vec))
    }
}

The function starts by perform a swapping pass over the vector. If the new vector is equal to the old vector, no swaps where needed, i.e. the vector is already sorted. The function than returns the vector. Alternatively, if the vectors are different, the vector is not yet fully sorted, and we need to perform more passes. This is accomplished by recursively calling bubble_sort again on the vector. An example of the function in action:

> test_vec = round(runif(100, 0, 100))
> bubble_sort(test_vec)
  [1]  1  1  6  6  9 10 10 10 13 14 14 15 19 19 20 21 23 24 24 24 26 26 26 26 27
 [26] 28 28 30 31 32 34 35 35 36 36 37 39 39 40 40 40 41 41 41 41 43 43 43 45 46
 [51] 47 51 56 56 57 57 57 58 58 59 61 61 62 63 64 65 68 68 69 70 71 71 72 73 74
 [76] 75 75 75 78 79 82 82 84 85 88 88 89 90 91 91 91 92 92 92 92 93 93 96 96 99
>

The full sorting process is nicely illustrated by the following animated gif (linked from wikipedia):

enter image description here

This implementation is horribly slow:

> system.time(bubble_sort(test_vec))
   user  system elapsed
  0.076   0.000   0.077
> system.time(sort(test_vec))
   user  system elapsed
  0.001   0.000   0.001

Probably implementing the relatively slow bubble sort in a compiled language pose a dramatic increase in speed. Maybe a nice first testcase for Rcpp

To leave a comment for the author, please follow the link and comment on their blog: NumberTheory » R stuff.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)