
Quicksort speed, just in time compiling and vectorizing

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers.]
I was reading the Julia documentation the other day. They do speed comparisons against other languages, and obviously R does not come out very well. The R code for quicksort is here and I noticed it was not vectorized at all, so I wondered if it could be improved. A quick check on Wikipedia showed that the algorithm displayed there is the one used by the Julia team. By abandoning this structure and using the just-in-time compiler some extra speed can be achieved; as an additional bonus the algorithm actually gets simpler. It should be noted (Wikipedia) that 'Quicksort can be implemented with an in-place partitioning algorithm, so the entire sort can be done with only O(log n) additional space'. Such an implementation was used by the Julia team; however, profiling suggests that this limited additional space is not achieved by the R implementation.

Algorithm

Wikipedia describes the basic scheme: pick a pivot, put everything smaller than the pivot before it and everything larger after it, then sort the two parts recursively.

Or, roughly, in R:
wfqs1 <- function(x) {
  if (length(x)<2) return(x)
  # random pivot; recurse on the parts below and above it, ties stay in the middle
  pivot <- x[sample(length(x),1)]
  c(wfqs1(x[x<pivot]),x[x==pivot],wfqs1(x[x>pivot]))
}
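For example (not in the original post):

wfqs1(c(3, 1, 4, 1, 5, 9, 2, 6))
# [1] 1 1 2 3 4 5 6 9
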
The implementation by the Julia team does the in-place reordering:
qsort = function(a) {
  qsort_kernel = function(lo, hi) {
    i = lo
    j = hi
    while (i < hi) {
      # pivot is the middle element; move smaller values left, larger values right
      pivot = a[floor((lo+hi)/2)]
      while (i <= j) {
        while (a[i] < pivot) i = i + 1
        while (a[j] > pivot) j = j - 1
        if (i <= j) {
          # swap a[i] and a[j]; <<- assigns into the a of the enclosing function
          t = a[i]
          a[i] <<- a[j]
          a[j] <<- t
          i = i + 1
          j = j - 1
        }
      }
      if (lo < j) qsort_kernel(lo, j)
      lo = i
      j = hi
    }
  }
  qsort_kernel(1, length(a))
  return(a)
}
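The profiling section below shows qsort() allocating a lot of memory, which is consistent with R's copy-on-modify behaviour: the assignments via <<- replace a rather than changing it in place. A minimal sketch, not from the original post, that makes this behaviour visible with tracemem():

a <- runif(10)
tracemem(a)   # report whenever a gets duplicated
b <- a        # no copy yet, just a second reference
b[1] <- 0     # modifying triggers a duplication, which tracemem reports
untracemem(a)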

Variations in implementation

After coding the first version, I tried to make faster variations. The first three versions are intended to use fewer comparisons than wfqs1(). The last one is intended to build a bit more vectorization into qsort().

# wfqs2: place the pivot element itself in the middle and only scan for >= on the rest
wfqs2 <- function(x) {
  if (length(x)<2) return(x)
  ipivot <- sample(length(x),1)
  pivot <- x[ipivot]
  c(wfqs2(x[x<pivot]),pivot,wfqs2(x[-ipivot][x[-ipivot]>=pivot]))
}
# wfqs3: compute the < split once and derive the other side by negation
wfqs3 <- function(x) {
  if (length(x)<2) return(x)
  ipivot <- sample(length(x),1)
  pivot <- x[ipivot]
  split <- x<pivot
  split2 <- !split
  split2[ipivot] <- FALSE
  c(wfqs3(x[split]),pivot,wfqs3(x[split2]))
}
# wfqs4: use the first element as pivot, so no call to sample() at all
wfqs4 <- function(x) {
  if (length(x)<2) return(x)
  split <- x<x[1]
  c(wfqs4(x[split]),x[1],wfqs4(x[!split][-1]))
}
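A quick check, not in the original post, that the variants also handle ties; every row of the result below should be identical:

x <- sample(rep(1:5, 3))
rbind(sort(x), wfqs1(x), wfqs2(x), wfqs3(x), wfqs4(x))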

wfqsx = function(a) {
  qsort_kernel = function(lo, hi) {
    if(lo>=hi) return()
    if(hi-lo==1) {
      # only two elements left: swap if needed
      if(a[lo]>a[hi]) {
        t <- a[lo]
        a[lo] <<- a[hi]
        a[hi] <<- t
      }
      return()
    }
    # a[lo] is the pivot; find which of the remaining elements belong above it
    goUp <- a[(lo+1):hi]>a[lo]
    up <- which(goUp)
    do <- which(!goUp)
    pivottarget <- lo+length(do)
    # swap the misplaced pairs in one vectorized assignment
    up <- up[up<=length(do)]+lo
    do <- do[do>length(do)]+lo
    t <- a[do]
    a[do] <<- a[up]
    a[up] <<- t
    # put the pivot in its final position and recurse on both sides
    t <- a[pivottarget]
    a[pivottarget] <<- a[lo]
    a[lo] <<- t
    qsort_kernel(lo,pivottarget-1)
    qsort_kernel(pivottarget+1, hi)
  }
  qsort_kernel(1, length(a))
  return(a)
}
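And the same kind of check (again not in the original post) for the two in-place style versions:

set.seed(2)
x <- runif(100)
identical(sort(x), qsort(x))   # should be TRUE
identical(sort(x), wfqsx(x))   # should be TRUE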

Analysis

Speed without JIT

The first figure shows that sorting time is roughly proportional to the number of items to be sorted. The exception is base::sort(), which shows hardly any size effect up to around 100 items. Even though it is not very pronounced on this scale, qsort() is the slowest and wfqs4() is the fastest after base::sort().

The second figure shows speed relative to base::sort(). It shows that qsort() takes approximately six times as much time as wfqs4().

Profiling

Profiling is not very informative about where the time goes: most of it is not spent in any function that Rprof() registers separately, but in the recursive kernels themselves. Regarding memory usage, if mem.total is anything to go by, qsort() uses considerably more than wfqs4(). The calls that produced these profiles are in the Additional code section below.

#####     qsort    #############################################
$by.self
               self.time self.pct total.time total.pct mem.total
"qsort_kernel"      6.72    86.15       7.80    100.00    1671.4
"<GC>"              0.68     8.72       0.68      8.72     112.2
"+"                 0.16     2.05       0.16      2.05      31.5
"<"                 0.12     1.54       0.12      1.54      21.5
"-"                 0.06     0.77       0.06      0.77       5.4
"floor"             0.04     0.51       0.04      0.51      10.2
"<="                0.02     0.26       0.02      0.26       5.3

$by.total
               total.time total.pct mem.total self.time self.pct
"qsort_kernel"       7.80    100.00    1671.4      6.72    86.15
"<Anonymous>"        7.80    100.00    1671.4      0.00     0.00
"eval"               7.80    100.00    1671.4      0.00     0.00
"qsort"              7.80    100.00    1671.4      0.00     0.00
"<GC>"               0.68      8.72     112.2      0.68     8.72
"+"                  0.16      2.05      31.5      0.16     2.05
"<"                  0.12      1.54      21.5      0.12     1.54
"-"                  0.06      0.77       5.4      0.06     0.77
"floor"              0.04      0.51      10.2      0.04     0.51
"<="                 0.02      0.26       5.3      0.02     0.26

$sample.interval
[1] 0.02

$sampling.time
[1] 7.8

#####    wfqs4    ###########################################
$by.self
            self.time self.pct total.time total.pct mem.total
"wfqs4"          0.98    75.38       1.30    100.00     258.0
"<"              0.10     7.69       0.12      9.23      26.2
"<GC>"           0.08     6.15       0.08      6.15      11.8
"c"              0.06     4.62       0.08      6.15      12.0
"!"              0.04     3.08       0.04      3.08       9.0
"-"              0.02     1.54       0.02      1.54       4.6
".External"      0.02     1.54       0.02      1.54       0.0

$by.total
              total.time total.pct mem.total self.time self.pct
"wfqs4"             1.30    100.00     258.0      0.98    75.38
"<Anonymous>"       1.30    100.00     258.0      0.00     0.00
"eval"              1.30    100.00     258.0      0.00     0.00
"<"                 0.12      9.23      26.2      0.10     7.69
"<GC>"              0.08      6.15      11.8      0.08     6.15
"c"                 0.08      6.15      12.0      0.06     4.62
"!"                 0.04      3.08       9.0      0.04     3.08
"-"                 0.02      1.54       4.6      0.02     1.54
".External"         0.02      1.54       0.0      0.02     1.54
"runif"             0.02      1.54       0.0      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 1.3

JIT compiling

The JIT is a great equalizer. Everything profits, with the exception of base::sort(). That is no surprise: in base::sort() the part that does the work is already machine code. Among the R implementations the differences become much smaller; the small refinements are apparently swamped by the JIT. Nevertheless, wfqs4() is still the fastest after base::sort(); it takes about half the time of qsort().
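For a single function the same effect can be had without switching the JIT on globally (the benchmark below uses enableJIT(3), see the Additional code): compiler::cmpfun() byte-compiles just that function. A minimal sketch, not part of the original benchmark:

library(compiler)
library(microbenchmark)
wfqs4c <- cmpfun(wfqs4)    # byte-compiled copy of wfqs4
x <- runif(1e4)
microbenchmark(wfqs4(x), wfqs4c(x), times = 10)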




Discussion

It cannot be stated enough: when programming in R, vectorize. Vectors make for compact code, and vectors are efficient; a vectorized calculation that does more actual computations can beat a complex loop that avoids computations. The speed effect is reduced when the JIT is used, so enabling the JIT should be the first step when an R algorithm is too slow. The second step is starting the profiler and looking for (vectorized) optimizations. When these two fail, it is time to roll out the big guns: (inline) C code, a faster computer, or switching to a faster language altogether.
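As an illustration of that last step (not used in this post, and assuming the Rcpp package is available), a small sort can be compiled from within R; the sketch below simply wraps std::sort rather than a hand-written quicksort:

library(Rcpp)
cppFunction('
NumericVector cppsort(NumericVector x) {
  NumericVector y = clone(x);    // work on a copy, like base R semantics
  std::sort(y.begin(), y.end());
  return y;
}')
cppsort(runif(5))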

Additional code

library(microbenchmark)
library(ggplot2)

la <- lapply(1:15,function(i) {
      n <- 10*2^i
      rr <- runif(n)
      mb <- microbenchmark(
          wfqs1(rr),
          wfqs2(rr),
          wfqs3(rr),
          wfqs4(rr),
          wfqsx(rr),
          sort(rr),
          qsort(rr),
          times=10)
      ag <- aggregate(mb$time,list(method=mb$expr),median)
      ag$x <- ag$x/1000
      ag$n <- n
      ag
    })
res <- do.call(rbind,la)
png('qs1.nc.png')
p <- ggplot(res, aes(y=x, x=n,col=method))
p +geom_line() +
    scale_y_log10('Median sorting time (ms)') +
    scale_x_log10() +
    labs(x='Number of items',title='No compiling' )
dev.off()

resref <- subset(res,res$method=='sort(rr)',-method)
names(resref)[names(resref)=='x'] <- 'tsort'
res2 <- merge(res,resref)
res2$reltime <- res2$x/res2$tsort
png('qs2.nc.png')
p <- ggplot(res2, aes(y=reltime, x=n,col=method))
p +geom_line() +
    scale_x_log10() +
    labs(x='Number of items',
        y='Median relative sorting time',
        title='No compiling' )
dev.off()

Rprof("Rprofqs.out",memory.profiling = TRUE, gc.profiling = TRUE)
y = qsort(runif(100000))
Rprof(NULL)
Rprof("Rprofw4.out",memory.profiling = TRUE, gc.profiling = TRUE)
y = wfqs4(runif(100000))
Rprof(NULL)

summaryRprof(filename = "Rprofqs.out",memory='both')
summaryRprof(filename = "Rprofw4.out",memory='both')

require(compiler)
enableJIT(3)
la <- lapply(1:15,function(i) {
      n <- 10*2^i
      rr <- runif(n)
      mb <- microbenchmark(
          wfqs1(rr),
          wfqs2(rr),
          wfqs3(rr),
          wfqs4(rr),
          wfqsx(rr),
          sort(rr),
          qsort(rr),
          times=10)
      ag <- aggregate(mb$time,list(method=mb$expr),median)
      ag$x <- ag$x/1000
      ag$n <- n
      ag
    })
res <- do.call(rbind,la)
enableJIT(0)
png('qs1.jit3.png')
p <- ggplot(res, aes(y=x, x=n,col=method))
p +geom_line() +
    scale_y_log10('Median sorting time (ms)') +
    scale_x_log10() +
    labs(x='Number of items',title='Enable JIT 3' )
dev.off()

resref <- subset(res,res$method=='sort(rr)',-method)
names(resref)[names(resref)=='x'] <- 'tsort'
res2 <- merge(res,resref)
res2$reltime <- res2$x/res2$tsort
png('qs2.jit3.png')
p <- ggplot(res2, aes(y=reltime, x=n,col=method))
p +geom_line() +
    scale_x_log10() +
    labs(x='Number of items',
        y='Median relative sorting time',
        title='Enable JIT 3' )
dev.off()
