Much more efficient bubble sort in R using the Rcpp and inline packages

May 14, 2013
By

(This article was first published on NumberTheory » R stuff, and kindly contributed to R-bloggers)

Recently I wrote a blogpost showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that is was awfully slow. And by slow, I mean really slow, as in “a 100 element vector takes 7 seconds to sort”-slow. One of the major opportunities for a speed is to start using a compiled language. I chose to use C++ as this is really easy to integrate into R using the Rcpp package. In addition to the Rcpp package I use the inline package which allows one to use C++ code and R code in a seamless fashion. The following code creates an R function bubble_sort_cpp:

require(inline)  ## for cxxfunction()                                                       
                                                                                            
src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in);                               
       double tmp = 0;                                                                      
       int no_swaps;                                                                        
       while(true) {                                                                        
           no_swaps = 0;                                                                    
           for (int i = 0; i < vec.size()-1; ++i) {                                         
               if(vec[i] > vec[i+1]) {                                                      
                   no_swaps++;                                                              
                   tmp = vec[i];                                                            
                   vec[i] = vec[i+1];                                                       
                   vec[i+1] = tmp;                                                          
               };                                                                           
           };                                                                               
           if(no_swaps == 0) break;                                                         
       };                                                                                   
       return(vec);'                                                                        
bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")

Quite amazing how easy it is to integrate R code and C++ code. inline compiles and links the C++ code on-the-fly, creating an R function that delivers the functionality. Of course the most important question is now how fast this is. I use the microbenchmark package to run the bubble sort I implemented in pure R (here), the bubble sort implemented in C++ (see above), and the standard R sorting algorithm:

library(microbenchmark) 
vector_size = 100                                                                           
print(microbenchmark(bubble_sort(sample(1:vector_size)),                                    
               bubble_sort_cpp(sample(1:vector_size)),                                      
               sort(sample(1:vector_size))))  
                                   expr       min         lq     median
     bubble_sort(sample(1:vector_size)) 67397.546 74358.9495 78143.0710
 bubble_sort_cpp(sample(1:vector_size))    44.895    55.9340    60.4930
            sort(sample(1:vector_size))    44.173    48.1315    62.3785
         uq        max neval
 81285.0215 105626.483   100 
    63.7715     74.643   100 
    67.2375    138.069   100

These results speak for itself, the C++ version is more than 1300 times faster when looking at the median speed, even faster than the built-in sort function. These differences will only get more pronounced when the size of the vector grows.

To leave a comment for the author, please follow the link and comment on his blog: NumberTheory » R stuff.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.