Going over the speed limit

April 17, 2011

(This article was first published on eKonometrics, and kindly contributed to R-bloggers)

In an earlier post [Speeding tickets for R and Stata], I reported on how R compared with Stata for executing algorithms involving maximum likelihood estimation. This post offers the following updates:

  • Stata is in fact even faster than previously reported.
  • The 64-bit version of the newly released R 2.13.0 reports faster times than the 32-bit version of R 2.12.0.
  • Limdep/NLogit, an econometrics package popular amongst discrete choice modellers, reported slower execution times than R 2.13.0 (64-bit version).
  • Advice on speed from experienced R users.

    My data set (used for the test results reported below) comprised an ordinal dependent variable [5 categories] and categorical explanatory variables, with 63,122 observations. I used a computer running Windows 7 Professional on an Intel Core 2 Quad CPU Q9300 @ 2.5 GHz with 8 GB of RAM. Further details about the tests are listed in the following table.

    Software routines and execution times

    Commercial license price
        Stata 11 (dual core):  US$2,495
        R 2.12.0 [32-bit]:     Free
        R 2.13.0 [64-bit]:     Free
        NLogit/Limdep:         $1,395

    Multinomial logit
        Stata 11 (dual core):  mlogit, 9.06 sec (2.89 sec with the “quietly” option)
        R 2.12.0 [32-bit]:     multinom, 50.59 sec + 52.29 sec
                               zelig (mlogit), 77.89 sec
                               VGLM (multinomial), 64.4 sec
        R 2.13.0 [64-bit]:     multinom, 32.7 sec + 49.8 sec
                               zelig (mlogit), 69.92 sec
                               VGLM (multinomial), 63.76 sec
        NLogit/Limdep:         Logit, 36.72 sec

    Proportional odds model
        Stata 11 (dual core):  ologit, 1.69 sec (0.91 sec with the “quietly” option)
                               oprobit, 0.91 sec (with the “quietly” option)
        R 2.12.0 [32-bit]:     VGLM (parallel = T), 16.26 sec
                               polr, 22.62 sec [ordered logit]
        R 2.13.0 [64-bit]:     VGLM (parallel = T), 14.94 sec
                               polr, 13.49 sec [ordered logit]
                               polr, 14.94 sec [ordered probit]
        NLogit/Limdep:         Ordered [Logit], 18.50 sec
                               Ordered [Probit], 36.33 sec

    Generalized logit
        Stata 11 (dual core):  gologit2, 18.67 sec (15.1 sec with the “quietly” option)
        R 2.12.0 [32-bit]:     VGLM (parallel = F), 64.71 sec
        R 2.13.0 [64-bit]:     VGLM (parallel = F), 64.86 sec
        NLogit/Limdep:         (not reported)


    Stata is even faster

    When I reran the models in Stata using the quietly option (which suppresses terminal output), I obtained the actual algorithm convergence times. For the multinomial logit model, Stata took less than 3 seconds to converge, making it more than 10 times faster than R. Similar reductions in execution times for Stata were observed for the other algorithms reported in the table above.
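    The "10 times faster" comparison can be checked with a little arithmetic. The snippet below is only an illustrative sketch: the timings are copied from the multinomial logit row of the table above, while the names speed_ratio and R_MULTINOM_FIT are made up for this example.

```python
# Timings in seconds, copied from the multinomial logit row of the table.
STATA_MLOGIT_QUIETLY = 2.89
R_MULTINOM_FIT = {
    "R 2.12.0 (32-bit)": 50.59,
    "R 2.13.0 (64-bit)": 32.7,
}


def speed_ratio(slower_sec: float, faster_sec: float) -> float:
    """Return how many times longer the slower run took than the faster one."""
    return slower_sec / faster_sec


# Ratio of each R multinom fit time to Stata's mlogit time (quietly option).
ratios = {
    name: speed_ratio(seconds, STATA_MLOGIT_QUIETLY)
    for name, seconds in R_MULTINOM_FIT.items()
}

for name, ratio in ratios.items():
    print(f"{name}: multinom took {ratio:.1f}x as long as Stata's mlogit")
```

    Only the model-fitting time for multinom is used here; the extra time spent in summary() is left out of the comparison.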

    64-bit version of R is faster, sometimes

    The 64-bit version of R (2.13.0) reported faster execution times. The same was observed for the 64-bit version of R (2.12.0). Notice in the table above the dramatic reduction in the convergence times for the multinomial logit model (using multinom): R 2.13.0 [64-bit] took 35.4% less time to converge than R 2.12.0 [32-bit]. However, the Zelig- and VGLM-based algorithms reported only modest improvements for the same model (77.89 to 69.92 seconds and 64.4 to 63.76 seconds, respectively).

    The ordered logit and ordered probit models (executed using the polr algorithm) also reported significant improvements in execution times. The ordered logit model took about 40% less time to converge in R 2.13.0 [64-bit] than in R 2.12.0 [32-bit] (13.49 versus 22.62 seconds).
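    For readers who want to reproduce the percentage reductions quoted above, here is a small sketch. The timings are taken from the table; the helper percent_reduction and the dictionary layout are assumptions made for illustration only.

```python
# Timings in seconds from the table: (R 2.12.0 32-bit, R 2.13.0 64-bit).
TIMINGS = {
    "multinom (fit only)": (50.59, 32.7),
    "zelig (mlogit)": (77.89, 69.92),
    "VGLM (multinomial)": (64.4, 63.76),
    "polr (ordered logit)": (22.62, 13.49),
    "VGLM (parallel = T)": (16.26, 14.94),
    "VGLM (parallel = F)": (64.71, 64.86),
}


def percent_reduction(old_sec: float, new_sec: float) -> float:
    """Percentage by which new_sec is shorter than old_sec (negative if longer)."""
    return 100.0 * (old_sec - new_sec) / old_sec


reductions = {
    routine: percent_reduction(old_sec, new_sec)
    for routine, (old_sec, new_sec) in TIMINGS.items()
}

for routine, reduction in reductions.items():
    print(f"{routine}: {reduction:.1f}% less time on 64-bit R 2.13.0")
```

    A negative value means the routine ran slower on the 64-bit build, which is the case for VGLM with parallel = F in the table.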

    I still do not understand why summary() of the multinomial logit model takes an additional 49.8 seconds, on top of the 32.7 seconds needed to fit the model, to report the results. When I use coef() on the fitted model instead of summary(), I get instantaneous output.

    In summary, it appears that not all algorithms converge faster in the 64-bit version of R 2.13.0.

    R is faster than Limdep/NLogit

    In comparison, R [2.13.0] offered faster convergence times than NLogit for the multinomial logit, ordered logit, and ordered probit models. This places R between two popular econometrics packages: Stata is significantly faster than R, and R offers faster execution times than NLogit (see the difference for ordered logit in the table above).

    What R Pros are saying about my post

    If you scroll down to the comments section of my last post [Speeding tickets for R and Stata], you’ll notice some advice from experienced users of R. I have been advised to re-run the tests after first obtaining optimised versions of the BLAS and LAPACK libraries. I am not sure how much difference that would make. However, it would be a little difficult for ordinary users of R (such as myself) to determine which BLAS and LAPACK libraries are appropriate for their computer systems, and how to install them.

    If significant speed gains could be achieved by using optimised BLAS and LAPACK libraries, the R installation routines may then be improved so that these libraries are made available to the novice end users of R.

