For me, speed has not been a concern in the past. I had used R with smaller data sets of roughly 5,000 to 10,000 observations and found it to be as fast as other statistical software. More recently, I have been working with a still relatively small data set of 63,122 observations. After realizing that R was very slow in executing the built-in routines for multinomial and ordinal logit models, I ran similar models in Stata with the same data set and found Stata to be much faster.
Before I go any further, I must confess that I did not try to determine ways to improve speed in R by, for instance, choosing faster-converging algorithms. I hope readers will send me comments on how to speed up execution for the routines I tested in R.
My data set comprised an ordinal dependent variable (5 categories) and categorical explanatory variables, with 63,122 observations. I used a computer running Windows 7 on an Intel Core 2 Quad CPU Q9300 @ 2.5 GHz with 8 GB of RAM. Further details about the test are listed in the table below.
| Model | Stata 11 (duo core) | R |
|---|---|---|
| Multinomial Logit | mlogit, 9.06 sec | multinom, 50.59 sec; zelig (mlogit), 77.89 sec; VGLM (multinomial), 64.4 sec |
| Proportional odds model | ologit, 1.69 sec | VGLM (parallel = T), 16.26 sec; polr, 22.62 sec |
| Generalized Logit | gologit2, 18.67 sec | VGLM (parallel = F), 64.71 sec |
I first estimated the standard multinomial logit model in R using the multinom routine. R took almost 51 seconds to return the results. The subsequent call to summarise the model took another 52.29 seconds, bringing the total execution time in R to roughly 103 seconds. Surprised at the slow speed, I tried other options in R to estimate the same model. I first tested the mlogit option in Zelig; the execution time was even slower, at 78 seconds. I followed up with the VGAM package, which returned a slightly better result at 64.4 seconds.
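For readers who want to try the R side of the multinomial comparison themselves, here is a minimal sketch on simulated data. My original data set is not public, so the variable names and formula below are placeholders at the same 63,122-row scale; the Zelig call is omitted because its interface has varied across versions.

```r
library(nnet)   # multinom
library(VGAM)   # vglm

## Simulated stand-in for the real data: a 5-category outcome and
## two categorical predictors (placeholder names x1, x2)
set.seed(1)
n <- 63122
d <- data.frame(
  y  = factor(sample(1:5, n, replace = TRUE)),
  x1 = factor(sample(letters[1:4], n, replace = TRUE)),
  x2 = factor(sample(letters[1:3], n, replace = TRUE))
)

## nnet::multinom -- the built-in routine timed first
system.time(fit_nnet <- multinom(y ~ x1 + x2, data = d, trace = FALSE))

## VGAM::vglm with the multinomial family
system.time(fit_vgam <- vglm(y ~ x1 + x2, family = multinomial(), data = d))
```

The timings will of course differ on other hardware and with other data, but `system.time()` makes the comparison straightforward to repeat.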
The other examples listed above show similarly slower times for R in comparison with Stata.
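The ordinal rows of the table can be reproduced in the same spirit. A sketch on simulated data (again with placeholder variable names, not my actual data set): polr from MASS for the proportional odds model, and VGAM's cumulative family with and without the parallel-slopes constraint, the latter being the analogue of Stata's gologit2.

```r
library(MASS)   # polr
library(VGAM)   # vglm

## Simulated ordinal outcome with two categorical predictors
set.seed(1)
n <- 63122
d <- data.frame(
  y  = factor(sample(1:5, n, replace = TRUE), ordered = TRUE),
  x1 = factor(sample(letters[1:4], n, replace = TRUE)),
  x2 = factor(sample(letters[1:3], n, replace = TRUE))
)

## Proportional odds model, two ways
system.time(fit_polr <- polr(y ~ x1 + x2, data = d))
system.time(fit_po   <- vglm(y ~ x1 + x2,
                             family = cumulative(parallel = TRUE), data = d))

## Generalized (non-parallel slopes) logit
system.time(fit_gen  <- vglm(y ~ x1 + x2,
                             family = cumulative(parallel = FALSE), data = d))
```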
What could be the reason for such an order-of-magnitude difference in speed between R and Stata? I unfortunately don't have the answer. I do know that Revolution Analytics offers similar performance benchmark comparisons between their souped-up version of R (Revolution R) and generic R; Revolution R was found to be five to eight times faster than regular R.
Other performance benchmarks revealed even greater speed differentials between Revolution R and the generic R.
There must be ways to make routines execute faster in R. A few weeks earlier, Professor John Fox (a long-time contributor to R and the programmer of the R GUI, R Commander) delivered a guest lecture at the Ted Rogers School of Management in Toronto at the GTA R Users' Group meeting. His talk focussed on how to program in R, using the binary logit model as an example. His code for binary logit turned out to be much faster than the glm routine that comes bundled with R.
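I don't have Fox's code from the talk, but the general idea can be sketched: a bare-bones Newton-Raphson (IRLS) binary logit that works directly on the model matrix, skipping the formula handling and generality of glm(), can be noticeably faster. The following is my own reconstruction of that kind of routine, not the code presented in the lecture.

```r
## Minimal Newton-Raphson binary logit on a numeric model matrix X
## (including the intercept column) and a 0/1 response y
fast_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxit)) {
    p <- 1 / (1 + exp(-X %*% beta))      # fitted probabilities
    w <- as.vector(p * (1 - p))          # IRLS weights
    ## Newton step: solve (X'WX) step = X'(y - p)
    step <- solve(crossprod(X, w * X), crossprod(X, y - p))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  drop(beta)
}

## Check against glm() on simulated data
set.seed(1)
X <- cbind(1, matrix(rnorm(2e5), ncol = 2))
y <- rbinom(nrow(X), 1, 1 / (1 + exp(-X %*% c(-0.5, 1, -1))))
b_fast <- fast_logit(X, y)
b_glm  <- coef(glm(y ~ X - 1, family = binomial))
```

The coefficients should agree with glm() to several decimal places; the savings come from avoiding glm()'s formula parsing, dispatch, and bookkeeping, not from a different estimator.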
This makes me wonder: are there ways to make the generic R run faster?