More bubble sort tuning

May 26, 2013
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

After last week's post bubble sort tuning I got an email from Berend Hasselman noting that my 'best' function did not protect against cases n<=2 and a speed improvement was possible. That made me realize that I should have been profiling the functions rather than just look for inspiration in the code. So, that's what I wanted to do this week. Unfortunately the examples were a bit too simple to see good power of profiling. So, after wring a too long post, removed a lot again.

Simple Improvements 

The first thing I did was compare Berend's code, bubble_sort6, with mine own, bubble_sort5. The difference is, I thought, in the indexing, the double [ ][ ].  And it was 10% to 20% faster.
library(microbenchmark)

bubble_sort5 = function(vec) {
  wm <- which.max(vec)
  vec <- c(vec[-wm],vec[wm])
  for (iend in ((length(vec)-1):2)) {
    wm <- which.max(vec[1:iend])
    vec <- c(vec[1:iend][-wm],vec[1:iend][wm],vec[(iend+1):length(vec)])
  }
  vec
}

bubble_sort6 = function(vec) {
  wm <- which.max(vec)
  vec <- c(vec[-wm],vec[wm])
  for (iend in ((length(vec)-1):2)) {
    tmp <- vec[1:iend]
    wm <- which.max(tmp)
    vec <- c(tmp[-wm],tmp[wm],vec[(iend+1):length(vec)])
  }
  vec
}
So how about removing more indices? I can copy the vector, push the maximum in the correct space and eliminate from the original vector.
bubble_sort7 = function(vec) {
  final <- vec
  for (iend in (length(vec):1)) {
    wm <- which.max(vec)
    final[iend] <- vec[wm]
    vec <- vec[-wm]
  }
  final
}
Why not eliminate the removal from the vector and place a low number there, so which.max skips those?
bubble_sort8 = function(vec) {
  final <- vec
  smallest <- min(vec)
  for (iend in (length(vec):1)) {
    wm <- which.max(vec)
    final[iend] <- vec[wm]
    vec[wm] <- smallest
  }
  final
}
The speed of number 7 and 8 are highest. Number 8 at vector_size = 1000. (To make the display compacted, the expression is slightly shortened).                                                                             
vector_size = 1000
mbm <- microbenchmark(bubble_sort5(sample(vector_size)),
    bubble_sort6(sample(vector_size)),
    bubble_sort7(sample(vector_size)),
    bubble_sort8(sample(vector_size)),
    sort(sample(vector_size)),
    sample(vector_size))  
su <- summary(mbm)
levels(su$expr) <- gsub('sample(vector_size)','',levels(su$expr),fixed=TRUE)
su

            expr       min         lq     median         uq        max neval
1 bubble_sort5() 58382.570 69967.8920 70629.8475 71088.0120  85883.413   100
2 bubble_sort6() 42090.254 43020.5110 54414.1360 55058.8650  74350.140   100
3 bubble_sort7() 27909.892 28862.5065 29183.9545 39707.0685 241396.777   100
4 bubble_sort8() 24635.301 25061.5775 25356.6345 36344.5100  40485.581   100
5         sort()   321.081   503.2475   522.6735   534.7695    641.430   100
6                   31.521    38.8525    48.3830    50.5815     53.514   100
but number 7 is fastest at vector.size=100. 
            expr      min       lq    median       uq       max neval
1 bubble_sort5() 1792.338 1836.322 1880.6720 1928.322  2313.546   100
2 bubble_sort6() 1392.819 1416.644 1454.7630 1536.499 28460.422   100
3 bubble_sort7() 1364.230 1408.579 1433.5045 1507.177  1894.967   100
4 bubble_sort8() 1724.896 1773.279 1808.8320 1881.038 26898.999   100
5         sort()  177.401  258.405  336.8425  364.332   599.645   100
6                   5.864    9.530   10.9960   13.562    20.526   100

Profiling

The classic way to profile is using rprof(). I checked R-bloggers and found improved R profiling summaries by Noam Ross, from which I concluded that the only improvements were on output. I will use the classic output. This seems to work best, maybe that's because I work from Eclipse, maybe because I still use R 2.15.3, but other summaries did not work so well. It also seems 17% of the time is consumed by which.max and 80% by processes which are not registered by the profiler. Not very informative, but that may be because the example is too simple.
Rprof("file.out",interval=.002)
for(i in 1:100) bubble_sort7(sample(1000))
Rprof(NULL)
summaryRprof("file.out")
$by.self
             self.time self.pct total.time total.pct
bubble_sort7     2.704    81.30      3.322     99.88
which.max        0.594    17.86      0.594     17.86
-                0.020     0.60      0.020      0.60
sample           0.004     0.12      0.004      0.12
$<-              0.002     0.06      0.002      0.06
parse            0.002     0.06      0.002      0.06

$by.total
             total.time total.pct self.time self.pct
<Anonymous>       3.326    100.00     0.000     0.00
bubble_sort7      3.322     99.88     2.704    81.30
eval              3.322     99.88     0.000     0.00
which.max         0.594     17.86     0.594    17.86
-                 0.020      0.60     0.020     0.60
sample            0.004      0.12     0.004     0.12
doTryCatch        0.004      0.12     0.000     0.00
tryCatchList      0.004      0.12     0.000     0.00
tryCatchOne       0.004      0.12     0.000     0.00
$<-               0.002      0.06     0.002     0.06
parse             0.002      0.06     0.002     0.06
srcfilecopy       0.002      0.06     0.000     0.00

$sample.interval
[1] 0.002

$sampling.time
[1] 3.326

To leave a comment for the author, please follow the link and comment on his blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.