**NumberTheory » R stuff**, and kindly contributed to R-bloggers)

In the past few months I have written posts about implementing the bubble sort algorithm in different languages. In the mean while I have gotten some feedback and suggestions regarding improvements to the implementation I made, see the end of the post for the new source code of the algorithms. These often had a quite profound effect on performance.

One of the best tips was to use the R compiler, i.e. the `compiler`

package which is now part of the standard R distribution. This works by simply calling `cmpfun`

:

function_compiled = cmpfun(function)

The following table presents the timings in microseconds of the different algorithms:

min median max bubble_sort 1506985.326 1571561.4795 1667417.009 bubble_sort_compiled 308573.445 322742.5950 350686.185 bubble_sort_recursive 1524428.777 1579060.6400 1692169.993 bubble_sort_recursive_compiled 1536280.034 1589214.5010 1660944.670 bubble_sort_cpp 791.354 821.7165 1170.818 sort_julia 958.000 1009.000 1092.000 sort 105.771 170.8530 454.844

The following things I find striking:

- Compiling the for/while loop based R solution benefits massively form the compiler package, increasing the speed almost 5 fold.
- Compiling the recursive R based solution does not yield any improvements.
- The C++ solution is obviously much faster than any R based solution, increasing the speed between 1900 and 400 times.
- The Julia based solution is almost as fast as the C++ solution, which is very impressive for a high-level interpreted language.
- The native R
`sort`

is almost 8 times faster than the fastest bubble sort in C++ and Julia, but`sort`

might use a different algorithm.

R (recursive implementation). No improvements.

larger = function(pair) { if(pair[1] > pair[2]) return(TRUE) else return(FALSE) } swap_if_larger = function(pair) { if(larger(pair)) { return(rev(pair)) } else { return(pair) } } swap_pass = function(vec) { for(i in seq(1, length(vec)-1)) { vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)]) } return(vec) } bubble_sort_recursive = function(vec) { new_vec = swap_pass(vec) if(isTRUE(all.equal(vec, new_vec))) { return(new_vec) } else { return(bubble_sort(new_vec)) } }

R (for/while loop implementation). This implementation was previously not present.

bubble_sort = function(vec) { no_passes = 0 while(1) { no_swaps = 0 for (j in 1 : (length(vec) - 1 - no_passes)) { if (vec[j] > vec[j + 1]) { s = vec[j] vec[j] = vec[j+1] vec[j+1] = s no_swaps = no_swaps + 1 } } no_passes = no_passes + 1 if(no_swaps == 0) break } vec }

C++ (linked into R using Rcpp/inline). Precomputing `vec.size()`

outside the loop improved performance by a factor of 2.

require(inline) ## for cxxfunction() src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in); double tmp = 0; int n = vec.size(); int no_swaps; int passes; passes = 0; while(true) { no_swaps = 0; for (int i = 0; i < n - 1 - passes; ++i) { if(vec[i] > vec[i+1]) { no_swaps++; tmp = vec[i]; vec[i] = vec[i+1]; vec[i+1] = tmp; }; }; if(no_swaps == 0) break; passes++; }; return(vec);' bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")

Julia. Subtly changing the definition of the for loop (`1:(length(vec_in) - 1 - passes)`

vs `[1:(length(vec_in) - 1 - passes)]`

) improved performance two fold.

function swap_vector(vec_in, passes) no_swaps = 0 for vec_index in 1:(length(vec_in) - 1 - passes) if vec_in[vec_index] > vec_in[vec_index + 1] no_swaps += 1 tmp = vec_in[vec_index] vec_in[vec_index] = vec_in[vec_index + 1] vec_in[vec_index + 1] = tmp end end return no_swaps end function sort_julia(vec_in) passes = 0 while(true) no_swaps = swap_vector(vec_in, passes) if no_swaps == 0 break end passes += 1 end return vec_in end

**leave a comment**for the author, please follow the link and comment on his blog:

**NumberTheory » R stuff**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...