Bubble sorting in R, C++ and Julia: code improvements and the R compiler
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the past few months I have written posts about implementing the bubble sort algorithm in different languages. In the mean while I have gotten some feedback and suggestions regarding improvements to the implementation I made, see the end of the post for the new source code of the algorithms. These often had a quite profound effect on performance.
One of the best tips was to use the R compiler, i.e. the compiler package which is now part of the standard R distribution. This works by simply calling cmpfun:
function_compiled = cmpfun(function)
The following table presents the timings in microseconds of the different algorithms:
min median max
bubble_sort 1506985.326 1571561.4795 1667417.009
bubble_sort_compiled 308573.445 322742.5950 350686.185
bubble_sort_recursive 1524428.777 1579060.6400 1692169.993
bubble_sort_recursive_compiled 1536280.034 1589214.5010 1660944.670
bubble_sort_cpp 791.354 821.7165 1170.818
sort_julia 958.000 1009.000 1092.000
sort 105.771 170.8530 454.844
The following things I find striking:
- Compiling the for/while loop based R solution benefits massively form the compiler package, increasing the speed almost 5 fold.
- Compiling the recursive R based solution does not yield any improvements.
- The C++ solution is obviously much faster than any R based solution, increasing the speed between 1900 and 400 times.
- The Julia based solution is almost as fast as the C++ solution, which is very impressive for a high-level interpreted language.
- The native R
sortis almost 8 times faster than the fastest bubble sort in C++ and Julia, butsortmight use a different algorithm.
R (recursive implementation). No improvements.
larger = function(pair) {
if(pair[1] > pair[2]) return(TRUE) else return(FALSE)
}
swap_if_larger = function(pair) {
if(larger(pair)) {
return(rev(pair))
} else {
return(pair)
}
}
swap_pass = function(vec) {
for(i in seq(1, length(vec)-1)) {
vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
}
return(vec)
}
bubble_sort_recursive = function(vec) {
new_vec = swap_pass(vec)
if(isTRUE(all.equal(vec, new_vec))) {
return(new_vec)
} else {
return(bubble_sort(new_vec))
}
}
R (for/while loop implementation). This implementation was previously not present.
bubble_sort = function(vec) {
no_passes = 0
while(1) {
no_swaps = 0
for (j in 1 : (length(vec) - 1 - no_passes)) {
if (vec[j] > vec[j + 1]) {
s = vec[j]
vec[j] = vec[j+1]
vec[j+1] = s
no_swaps = no_swaps + 1
}
}
no_passes = no_passes + 1
if(no_swaps == 0) break
}
vec
}
C++ (linked into R using Rcpp/inline). Precomputing vec.size() outside the loop improved performance by a factor of 2.
require(inline) ## for cxxfunction()
src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in);
double tmp = 0;
int n = vec.size();
int no_swaps;
int passes;
passes = 0;
while(true) {
no_swaps = 0;
for (int i = 0; i < n - 1 - passes; ++i) {
if(vec[i] > vec[i+1]) {
no_swaps++;
tmp = vec[i];
vec[i] = vec[i+1];
vec[i+1] = tmp;
};
};
if(no_swaps == 0) break;
passes++;
};
return(vec);'
bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")
Julia. Subtly changing the definition of the for loop (1:(length(vec_in) - 1 - passes) vs [1:(length(vec_in) - 1 - passes)]) improved performance two fold.
function swap_vector(vec_in, passes)
no_swaps = 0
for vec_index in 1:(length(vec_in) - 1 - passes)
if vec_in[vec_index] > vec_in[vec_index + 1]
no_swaps += 1
tmp = vec_in[vec_index]
vec_in[vec_index] = vec_in[vec_index + 1]
vec_in[vec_index + 1] = tmp
end
end
return no_swaps
end
function sort_julia(vec_in)
passes = 0
while(true)
no_swaps = swap_vector(vec_in, passes)
if no_swaps == 0
break
end
passes += 1
end
return vec_in
end
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.