More bubble sort tuning
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
After last week’s post bubble sort tuning I got an email from Berend Hasselman noting that my ‘best’ function did not protect against cases n<=2 and a speed improvement was possible. That made me realize that I should have been profiling the functions rather than just look for inspiration in the code. So, that's what I wanted to do this week. Unfortunately the examples were a bit too simple to see good power of profiling. So, after wring a too long post, removed a lot again.
Simple Improvements
The first thing I did was compare Berend’s code, bubble_sort6, with mine own, bubble_sort5. The difference is, I thought, in the indexing, the double [ ][ ]. And it was 10% to 20% faster.
library(microbenchmark)
bubble_sort5 = function(vec) {
wm <- which.max(vec)
vec <- c(vec[-wm],vec[wm])
for (iend in ((length(vec)-1):2)) {
wm <- which.max(vec[1:iend])
vec <- c(vec[1:iend][-wm],vec[1:iend][wm],vec[(iend+1):length(vec)])
}
vec
}
bubble_sort6 = function(vec) {
wm <- which.max(vec)
vec <- c(vec[-wm],vec[wm])
for (iend in ((length(vec)-1):2)) {
tmp <- vec[1:iend]
wm <- which.max(tmp)
vec <- c(tmp[-wm],tmp[wm],vec[(iend+1):length(vec)])
}
vec
}
So how about removing more indices? I can copy the vector, push the maximum in the correct space and eliminate from the original vector.
bubble_sort7 = function(vec) {
final <- vec
for (iend in (length(vec):1)) {
wm <- which.max(vec)
final[iend] <- vec[wm]
vec <- vec[-wm]
}
final
}
Why not eliminate the removal from the vector and place a low number there, so which.max skips those?
bubble_sort8 = function(vec) {
final <- vec
smallest <- min(vec)
for (iend in (length(vec):1)) {
wm <- which.max(vec)
final[iend] <- vec[wm]
vec[wm] <- smallest
}
final
}
Profiling
The classic way to profile is using rprof(). I checked R-bloggers and found improved R profiling summaries by Noam Ross, from which I concluded that the only improvements were on output. I will use the classic output. This seems to work best, maybe that’s because I work from Eclipse, maybe because I still use R 2.15.3, but other summaries did not work so well. It also seems 17% of the time is consumed by which.max and 80% by processes which are not registered by the profiler. Not very informative, but that may be because the example is too simple.
Rprof(“file.out”,interval=.002)
for(i in 1:100) bubble_sort7(sample(1000))
Rprof(NULL)
summaryRprof(“file.out”)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.