Bubble sorting in R, C++ and Julia: code improvements and the R compiler

December 28, 2013
By

(This article was first published on NumberTheory » R stuff, and kindly contributed to R-bloggers)

In the past few months I have written posts about implementing the bubble sort algorithm in different languages. In the mean while I have gotten some feedback and suggestions regarding improvements to the implementation I made, see the end of the post for the new source code of the algorithms. These often had a quite profound effect on performance.

One of the best tips was to use the R compiler, i.e. the compiler package which is now part of the standard R distribution. This works by simply calling cmpfun:

function_compiled = cmpfun(function)

The following table presents the timings in microseconds of the different algorithms:

min         median          max 
                    bubble_sort 1506985.326   1571561.4795  1667417.009 
           bubble_sort_compiled  308573.445    322742.5950   350686.185 
          bubble_sort_recursive 1524428.777   1579060.6400  1692169.993 
 bubble_sort_recursive_compiled 1536280.034   1589214.5010  1660944.670 
                bubble_sort_cpp     791.354       821.7165     1170.818 
                     sort_julia     958.000       1009.000     1092.000 
                           sort     105.771       170.8530      454.844

The following things I find striking:

  1. Compiling the for/while loop based R solution benefits massively form the compiler package, increasing the speed almost 5 fold.
  2. Compiling the recursive R based solution does not yield any improvements.
  3. The C++ solution is obviously much faster than any R based solution, increasing the speed between 1900 and 400 times.
  4. The Julia based solution is almost as fast as the C++ solution, which is very impressive for a high-level interpreted language.
  5. The native R sort is almost 8 times faster than the fastest bubble sort in C++ and Julia, but sort might use a different algorithm.

R (recursive implementation). No improvements.

larger = function(pair) {
   if(pair[1] > pair[2]) return(TRUE) else return(FALSE)
}

swap_if_larger = function(pair) {
    if(larger(pair)) {
        return(rev(pair)) 
    } else {
        return(pair)
    }
}

swap_pass = function(vec) { 
    for(i in seq(1, length(vec)-1)) {
        vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
    }
    return(vec)
}

bubble_sort_recursive = function(vec) {
    new_vec = swap_pass(vec)
    if(isTRUE(all.equal(vec, new_vec))) { 
        return(new_vec) 
    } else {
        return(bubble_sort(new_vec))
    }
}

R (for/while loop implementation). This implementation was previously not present.

bubble_sort = function(vec) {
    no_passes = 0
    while(1) {
        no_swaps = 0
        for (j in 1 : (length(vec) - 1 - no_passes)) {
            if (vec[j] > vec[j + 1]) {
                s = vec[j]
                vec[j] = vec[j+1]
                vec[j+1] = s
                no_swaps = no_swaps + 1
            }
        }
        no_passes = no_passes + 1
        if(no_swaps == 0) break
    }
    vec
}

C++ (linked into R using Rcpp/inline). Precomputing vec.size() outside the loop improved performance by a factor of 2.

require(inline)  ## for cxxfunction()

src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in);
       double tmp = 0;
       int n = vec.size();
       int no_swaps;
       int passes;
       passes = 0;
       while(true) {
           no_swaps = 0;
           for (int i = 0; i < n - 1 - passes; ++i) {
               if(vec[i] > vec[i+1]) {
                   no_swaps++;
                   tmp = vec[i];
                   vec[i] = vec[i+1];
                   vec[i+1] = tmp;
               };
           };
           if(no_swaps == 0) break;
           passes++;
       };
       return(vec);'
bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")

Julia. Subtly changing the definition of the for loop (1:(length(vec_in) - 1 - passes) vs [1:(length(vec_in) - 1 - passes)]) improved performance two fold.

function swap_vector(vec_in, passes)
    no_swaps = 0
    for vec_index in 1:(length(vec_in) - 1 - passes)
        if vec_in[vec_index] > vec_in[vec_index + 1]
            no_swaps += 1
            tmp = vec_in[vec_index]
            vec_in[vec_index] = vec_in[vec_index + 1]
            vec_in[vec_index + 1] = tmp
        end
    end
    return no_swaps
end

function sort_julia(vec_in)
    passes = 0
    while(true)
        no_swaps = swap_vector(vec_in, passes)
        if no_swaps == 0 
            break
        end
        passes += 1
    end
    return vec_in
end

To leave a comment for the author, please follow the link and comment on his blog: NumberTheory » R stuff.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.