In an earlier post [Speeding tickets for R and Stata], I reported on how R compared with Stata in executing algorithms involving maximum likelihood estimation. This post offers several updates to that post, covered in the sections below.
My data set (used for the test results reported below) comprised 63,122 observations of an ordinal dependent variable [5 categories] and categorical explanatory variables. I used a computer running Windows 7 Professional on an Intel Core 2 Quad CPU Q9300 @ 2.5 GHz with 8 GB of RAM. Further details about the tests are listed in the following table.
Software Routines | Stata 11 (duo core) | R (2.12.0) [32-bit] | R x64 2.13.0 | NLogit/Limdep |
Commercial license price | US$2,495 | Free | Free | $1,395 |
Multinomial Logit | mlogit, 9.06 sec (2.89 sec with the "quietly" option) | multinom, 50.59 sec + 52.29 sec; zelig (mlogit), 77.89 sec; VGLM (multinomial), 64.4 sec | multinom, 32.7 sec + 49.8 sec; zelig (mlogit), 69.92 sec; VGLM (multinomial), 63.76 sec | Logit, 36.72 sec |
Proportional odds model | ologit, 1.69 sec, 0.91 sec [quietly]; oprobit, 0.91 sec [quietly] | VGLM (parallel = T), 16.26 sec; polr, 22.62 sec [o.logit] | VGLM (parallel = T), 14.94 sec; polr, 13.49 sec [o.logit]; polr, 14.94 sec [o.probit] | Ordered [Logit], 18.50 sec; Ordered [Probit], 36.33 sec |
Generalized Logit | gologit2, 18.67 sec (15.1 sec with the "quietly" option) | VGLM (parallel = F), 64.71 sec | VGLM (parallel = F), 64.86 sec | |
Stata is even faster
When I reran the models using the quietly option (which suppresses terminal output) in Stata, I obtained the actual algorithm convergence times. For the multinomial logit model, Stata took less than 3 seconds to converge, making it more than 10 times faster than R. Similar reductions in execution times for Stata were observed for the other algorithms reported in the table above.
64-bit version of R is faster, sometimes
The 64-bit version of R (2.13.0) reported faster execution times. The same was observed for the 64-bit version of R (2.12.0). Notice in the table above the dramatic reduction in the convergence time for the multinomial logit model (using multinom): R 2.13.0 [64-bit] took 35.4% less time to converge than R 2.12.0 [32-bit]. However, the Zelig- and VGLM-based algorithms reported only very modest improvements in execution times.
The ordered logit and ordered probit models (executed using the polr algorithm) also reported significant improvements in execution times. The ordered logit model took 40.3% less time to converge in R 2.13.0 [64-bit] than in R 2.12.0 [32-bit].
I still do not understand why summary(multinomial logit model) takes an additional 49.8 seconds, on top of the 32.7 seconds needed to fit the model, to report the summary results. When I use coef(multinomial logit model) instead of summary(), I get instantaneous output.
In summary, it appears that not all algorithms converge faster in the updated 64-bit version of R 2.13.0.
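The percentage changes between the 32-bit and 64-bit runs can be rechecked from the table. The sketch below is only an illustration in Python (not part of the original tests); the timings are copied from the table, while the dictionary labels and the function name `pct_reduction` are my own assumptions. Because it works from the rounded table values, the polr figure comes out at 40.4% rather than the 40.3% quoted above, which is presumably a rounding difference.

```python
# Timings in seconds, copied from the table above:
# (R 2.12.0 [32-bit], R x64 2.13.0).
# The labels are illustrative, not identifiers used in the original tests.
TIMINGS = {
    "multinom (fit only)": (50.59, 32.70),
    "zelig (mlogit)": (77.89, 69.92),
    "VGLM (multinomial)": (64.40, 63.76),
    "VGLM (parallel = T)": (16.26, 14.94),
    "polr (ordered logit)": (22.62, 13.49),
    "VGLM (parallel = F)": (64.71, 64.86),
}


def pct_reduction(old_seconds, new_seconds):
    """Percentage drop in run time from old to new; negative means slower."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


if __name__ == "__main__":
    for routine, (r32, r64) in TIMINGS.items():
        print(f"{routine:<22} {pct_reduction(r32, r64):6.1f}%")
```

On these numbers, multinom and polr gain the most, the Zelig and VGLM routines change by roughly 10% or less, and the generalized logit fit via VGLM (parallel = F) is marginally slower in the 64-bit run.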
R is faster than Limdep/NLogit
In comparison, R [2.13.0] offered faster convergence times than NLogit for the multinomial logit, ordered logit, and ordered probit models. This puts R between two popular econometrics packages: Stata is significantly faster than R, and R offers faster execution times than NLogit (see the difference for ordered logit in the table above).
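To make the ordering concrete, here is a small sketch in Python (again only an illustration, not part of the original tests) that ranks the three packages for each model using the table timings. It assumes Stata's "quietly" times and, for R, the 64-bit fit times excluding the extra summary() time; the names `TIMES`, `rank_packages`, and `slowdown` are hypothetical.

```python
# Seconds per model, copied from the table above.
# Assumptions: Stata times use the "quietly" option; R times are from
# R x64 2.13.0 (multinom and polr), fit only, without the summary() step.
TIMES = {
    "multinomial logit": {"Stata": 2.89, "R": 32.70, "NLogit": 36.72},
    "ordered logit": {"Stata": 0.91, "R": 13.49, "NLogit": 18.50},
    "ordered probit": {"Stata": 0.91, "R": 14.94, "NLogit": 36.33},
}


def rank_packages(model_times):
    """Return package names ordered from fastest to slowest."""
    return sorted(model_times, key=model_times.get)


def slowdown(model_times, baseline="Stata"):
    """Run time of each package as a multiple of the baseline package."""
    base = model_times[baseline]
    return {pkg: secs / base for pkg, secs in model_times.items()}


if __name__ == "__main__":
    for model, model_times in TIMES.items():
        order = " < ".join(rank_packages(model_times))
        ratios = slowdown(model_times)
        print(f"{model}: {order} "
              f"(R is {ratios['R']:.1f}x Stata, NLogit is {ratios['NLogit']:.1f}x Stata)")
```

Under these assumptions the order is Stata, then R, then NLogit for all three models. If the summary() time were added to the R multinom figure, R would fall behind NLogit for the multinomial logit model.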
What R Pros are saying about my post
If you scroll down to the comments section of my last post [Speeding tickets for R and Stata], you'll notice some advice from experienced users of R. I have been advised to re-run the tests after first obtaining optimised versions of the BLAS and LAPACK libraries. I am not sure how much difference that would make. However, it would be a little difficult for ordinary users of R (such as myself) to determine which BLAS and LAPACK libraries are appropriate for their computer systems and how to install them.
If significant speed gains can be achieved by using optimised BLAS and LAPACK libraries, the R installation routines could be improved so that these libraries are made available to novice end users of R.