Speeding tickets for R and Stata

April 10, 2011

(This article was first published on eKonometrics, and kindly contributed to R-bloggers)

How fast is R? Is it as fast at executing routines as other off-the-shelf software, such as Stata? After some comparative experimentation, I found Stata to be 5 to 8 times faster than R.

Speed had not been a concern for me in the past. I had used R with smaller datasets of roughly 5,000 to 10,000 observations and found it as fast as other statistical software. More recently, I have been working with a data set of 63,122 observations, which is still relatively small. After noticing that R was very slow at executing its built-in routines for multinomial and ordinal logit models, I ran similar models in Stata on the same data set and found Stata to be much faster.

Before I go any further, I must confess that I did not try to improve speed in R by, for instance, choosing faster-converging algorithms. I hope readers will send me comments on how to speed up execution of the routines I tested in R.

My data set comprised an ordinal dependent variable (5 categories) and categorical explanatory variables, with 63,122 observations. I used a computer running Windows 7 on an Intel Core 2 Quad CPU Q9300 @ 2.5 GHz with 8 GB of RAM. Further details about the test are listed in the following table.

Model                      Stata 11 (dual core)     R (2.12.0)

Multinomial logit          mlogit: 9.06 sec         multinom: 50.59 sec
                                                    zelig (mlogit): 77.89 sec
                                                    VGLM (multinomial): 64.4 sec
Proportional odds model    ologit: 1.69 sec         VGLM (parallel = T): 16.26 sec
                                                    polr: 22.62 sec
Generalized logit          gologit2: 18.67 sec      VGLM (parallel = F): 64.71 sec
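To see how the headline ratio relates to the individual timings, here is a short sketch that divides each R timing by the Stata timing for the same model. The seconds are copied from the table above; pairing each R routine with the Stata routine in the same row is an assumption of the sketch, and the names `STATA`, `R` and `slowdowns` are made up for illustration.

```python
# Timings (seconds) copied from the table; each R routine is paired
# with the Stata routine that estimates the same model.
STATA = {
    "multinomial logit": ("mlogit", 9.06),
    "proportional odds": ("ologit", 1.69),
    "generalized logit": ("gologit2", 18.67),
}
R = {
    "multinomial logit": [("multinom", 50.59), ("zelig (mlogit)", 77.89),
                          ("VGLM (multinomial)", 64.4)],
    "proportional odds": [("VGLM (parallel = T)", 16.26), ("polr", 22.62)],
    "generalized logit": [("VGLM (parallel = F)", 64.71)],
}


def slowdowns():
    """Return (model, R routine, Stata routine, R time / Stata time) tuples."""
    rows = []
    for model, (stata_name, stata_sec) in STATA.items():
        for r_name, r_sec in R[model]:
            rows.append((model, r_name, stata_name, round(r_sec / stata_sec, 1)))
    return rows


if __name__ == "__main__":
    for model, r_name, stata_name, ratio in slowdowns():
        print(f"{model:18} {r_name:20} vs {stata_name:9} {ratio:5.1f}x")
```

Worked through by hand, the multinomial logit comparisons come out at about 5.6x to 8.6x. The proportional odds comparisons are larger (about 9.6x and 13.4x), and the generalized logit comparison is smaller (about 3.5x).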

I first estimated the standard multinomial logit model in R using the multinom routine. R took almost 51 seconds to return the results. The subsequent call to summarise the model took another 52.29 seconds, bringing the total execution time in R to about 103 seconds. Surprised at the slow speed, I tried other ways to estimate the same model in R. I first tested the mlogit option in Zelig, which was even slower at 78 seconds. I followed up with the VGAM package, which did slightly better at 64.4 seconds.
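The multinom timings report the fitting step and the summary step separately and then add them. Single wall-clock runs can vary from one run to the next, so here is a minimal sketch of a harness that times each stage separately and keeps the fastest of several repeats. The stage functions below are hypothetical placeholders, not R or Stata calls, and `best_time` and `time_stages` are illustrative names.

```python
import time
from typing import Callable, Dict


def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` calls of fn."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def time_stages(stages: Dict[str, Callable[[], object]],
                repeats: int = 3) -> Dict[str, float]:
    """Time each named stage separately, then add a 'total' entry."""
    result = {name: best_time(fn, repeats) for name, fn in stages.items()}
    result["total"] = sum(result.values())  # summed before 'total' is added
    return result


if __name__ == "__main__":
    # Hypothetical stand-ins for "fit the model" and "summarise the model".
    def fake_fit():
        return sum(i * i for i in range(200_000))

    def fake_summary():
        return sorted(range(100_000), reverse=True)

    for name, sec in time_stages({"fit": fake_fit, "summary": fake_summary}).items():
        print(f"{name:8} {sec:.4f} s")
```

Taking the minimum rather than the mean is one common choice, because it filters out interference from other processes. Reporting all repeats would show the variability as well.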

The other models listed in the table show similarly slower times for R compared with Stata.

What could be the reason for such an order-of-magnitude difference in speed between R and Stata? Unfortunately, I don't have the answer. I do know that Revolution Analytics offers similar performance benchmark comparisons between their souped-up version of R (Revolution R) and the generic R. Revolution R was found to be five to eight times faster than regular R.


Other performance benchmarks revealed even greater speed differentials between Revolution R and the generic R.


There must be ways to make routines execute faster in R. A few weeks earlier, Professor John Fox (a long-time contributor to R and the programmer of the R GUI, R Commander) delivered a guest lecture at the GTA R Users' Group meeting at the Ted Rogers School of Management in Toronto. His talk focused on how to program in R, using the binary logit model as an example. His code for the binary logit was much faster than the glm routine that comes bundled with R.
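Fox's code is not reproduced here. As a rough sketch of what a hand-written binary logit estimator involves, the following fits an intercept and one slope by Newton-Raphson on the log-likelihood, in plain Python. R's glm fits the same model by iteratively reweighted least squares, which for the canonical logit link amounts to the same Newton steps. The function name `fit_logit` and the example data are hypothetical, and a real estimator would handle many covariates with matrix algebra.

```python
import math
from typing import List, Tuple


def fit_logit(x: List[float], y: List[int], tol: float = 1e-10,
              max_iter: int = 50) -> Tuple[float, float]:
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (gradient of the log-likelihood) and the
        # information matrix X'WX, accumulated observation by observation.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [0, 0, 0, 1, 1, 1, 1, 0]
    print(fit_logit(x, y))
```

With a single binary covariate the model is saturated. The estimates therefore equal the group log-odds: b0 = ln(1/3) ≈ -1.099 and b1 = ln 3 - ln(1/3) ≈ 2.197. This gives an easy check of the code.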

This makes me wonder: are there ways to make the generic R run faster?
