R Inferno-ism: order is not rank

July 26, 2012

(This article was first published on Portfolio Probe » R language, and kindly contributed to R-bloggers)

Do not use order when you want rank.

Background

The update of “A comparison of some heuristic optimization methods” was prompted by a bug that Luca Scrucca spotted.

Actually, it is two bugs:

    • I used order when I meant rank
    • This somehow escaped being in The R Inferno


Problem

What I said in my code was (essentially):

ord <- order(x)

Now, what I wanted was the rank of each value in x, that is, where each value falls once x is sorted. What I got was the permutation of indices that would put x into sorted order. Only under the rarest of circumstances (x already being sorted, for example) are these the same. But they sound oh so similar.
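A tiny illustration, with values made up for the purpose, shows how differently the two behave. order(x) lists the indices of x from smallest value to largest, which is what you want for x[order(x)]. rank(x) gives, for each element, its position in the sorted order. For an already sorted vector the two coincide.

```r
x <- c(30, 10, 20)

order(x)     # indices that sort x: x[2] is smallest, then x[3], then x[1]
# [1] 2 3 1
rank(x)      # position of each element of x in sorted order
# [1] 3 1 2
x[order(x)]  # order() is meant for indexing: this sorts x
# [1] 10 20 30

y <- c(10, 20, 30)  # already sorted, so order and rank agree
identical(order(y), as.integer(rank(y)))
# [1] TRUE
```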

What I really wanted to say was:

ord <- rank(x, ties.method="first")

(But see below.)
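The ties.method argument only matters when x contains duplicates. In this sketch (values chosen arbitrarily), the default "average" gives tied values a shared fractional rank, while "first" breaks ties by position in x, so its result is always a permutation of 1:length(x).

```r
x <- c(5, 2, 5, 1)

rank(x)                         # default ties.method = "average"
# [1] 3.5 2.0 3.5 1.0
rank(x, ties.method = "first")  # ties broken by position in x
# [1] 3 2 4 1
rank(x, ties.method = "min")    # tied values share the lowest rank
# [1] 3 2 3 1
```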

Timing

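Using order in this case doesn’t get us where we want to go. The advantage is that it gets us there really fast. The rank function is much slower: in the timings below it takes roughly ten times as long as order for vectors of length 10 and 100, and about three times as long for length 1000. (Timings in R version 2.15.0.)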

> x10 <- runif(10)
> system.time(for(i in 1:1e4) order(x10))
   user  system elapsed 
   0.11    0.00    0.11 
> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.22    0.00    1.34 
> x100 <- runif(100)
> system.time(for(i in 1:1e4) order(x100))
   user  system elapsed 
   0.14    0.00    0.17 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.61    0.00    1.64 
> x1000 <- runif(1000)
> system.time(for(i in 1:1e4) order(x1000))
   user  system elapsed 
   1.14    0.02    1.15 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.76    0.00    3.82

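rank is clearly slower than order. The whole point, though, is that these two commands give us different things. The command order(order(x)) is another way to get what our rank command gives us: order(x) is the permutation that sorts x, and applying order to a permutation returns its inverse, which is the vector of ranks. Because order leaves any unresolved ties in their original order, the result matches ties.method="first". Even though it is a bit kludgy, it can be significantly faster, especially for short vectors: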

> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.39    0.00    1.39 
> system.time(for(i in 1:1e4) order(order(x10)))
   user  system elapsed 
   0.23    0.00    0.24 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.56    0.00    1.56 
> system.time(for(i in 1:1e4) order(order(x100)))
   user  system elapsed 
   0.36    0.00    0.38 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.94    0.00    4.00 
> system.time(for(i in 1:1e4) order(order(x1000)))
   user  system elapsed 
   2.17    0.00    2.17 
> x10000 <- runif(10000)
> system.time(for(i in 1:1e4) rank(x10000, ties.method="first"))
   user  system elapsed 
  34.88    0.00   35.01
> system.time(for(i in 1:1e4) order(order(x10000)))
   user  system elapsed 
  29.51    0.00   29.94
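As a quick check (a sketch, using made-up integer data with many ties), order(order(x)) can be compared against rank(x, ties.method = "first") directly. The third variant, which writes seq_along(x) into the positions given by order(x), is a common R idiom for inverting a permutation with a single sort. It is not from the original post, and its speed relative to the timings above has not been measured here.

```r
set.seed(1)
x <- sample(1:20, 1000, replace = TRUE)  # integer data with many ties

r_rank  <- rank(x, ties.method = "first")
r_order <- order(order(x))

# Invert the sorting permutation directly: the element that lands at
# sorted position k gets rank k. This needs one sort instead of two.
r_scatter <- integer(length(x))
r_scatter[order(x)] <- seq_along(x)

all(r_rank == r_order)    # TRUE
all(r_rank == r_scatter)  # TRUE
```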
