How slow is R really?

[This article was first published on jacobsimmering.com, and kindly contributed to R-bloggers].

One thing you always hear about R is how slow it is, especially when the code is not well vectorized or includes loops. R is an interpreted language, and its strong suit really isn’t speed; its comparative advantage is the 4,284 packages on CRAN. We accept the slower speed for the time saved from not having to re-invent the wheel every time we want to do something new.

That doesn’t mean it isn’t sometimes worth wondering how slow R is relative to other languages, especially with new tools like pandas in Python. I happened to be working on a Project Euler problem with the objective of calculating the first 10,001 prime numbers, so I decided to see how R performed relative to my other primary languages, Python and C. I also wanted to see how R’s performance changed when I used sapply() and when I used the new(ish) compiler package.

I took the same basic approach in each language by writing two functions. The first determines whether a number is prime or composite by trial division with the set {2, 3, 5, …, round(sqrt(number))}, stopping as soon as a trial division leaves a remainder of 0 or when all possible divisors are exhausted. The second function walks through the odd numbers, counts the primes it finds, and returns the prime at the supplied index. I wrote the code in C, Python and R (with and without sapply()).
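For readers who want to see the shape of the approach, here is a minimal Python sketch of the two functions described above. It is an illustration, not necessarily the exact script that was timed: the names is_prime and nth_prime are made up for this sketch, and it uses math.isqrt (Python 3.8+) for the upper bound on the divisors rather than round(sqrt(number)).

```python
import math


def is_prime(n):
    """Return True if n is prime, using trial division by 2 and odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Try odd divisors 3, 5, 7, ... and stop at the first one that divides n.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


def nth_prime(index):
    """Return the prime at the given 1-based index by scanning the odd numbers."""
    if index < 1:
        raise ValueError("index must be at least 1")
    if index == 1:
        return 2  # the only even prime, handled separately
    count = 1      # 2 has already been counted
    candidate = 1
    while count < index:
        candidate += 2  # only odd candidates are considered
        if is_prime(candidate):
            count += 1
    return candidate


print(nth_prime(6))  # 13, since the first six primes are 2, 3, 5, 7, 11, 13
```

Calling nth_prime(10001) gives the value the Project Euler problem asks for.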

The results were mostly as expected:

time ./euler7
real    0m0.026s
user    0m0.024s
sys     0m0.000s

time python euler7.py 
real    0m0.409s
user    0m0.396s
sys     0m0.004s

time R CMD BATCH euler7.R
real    0m7.058s
user    0m6.268s
sys     0m0.028s

C, the only compiled language, was really fast. It was nearly 16 times faster than Python and over 270 times faster than R. Relative to R, Python was a 17-fold performance increase. To paraphrase the SAT, C is to Python as Python is to R (for this problem).
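The ratios quoted above come straight from the "real" (wall-clock) times in the output. A small Python sketch of the arithmetic, assuming those three timings and a hypothetical helper called speedup:

```python
# Wall-clock ("real") times in seconds, copied from the timing output above.
timings = {"C": 0.026, "Python": 0.409, "R": 7.058}


def speedup(slow, fast):
    """How many times faster `fast` ran than `slow`."""
    return timings[slow] / timings[fast]


for slow, fast in [("Python", "C"), ("R", "C"), ("R", "Python")]:
    print(f"{fast} vs {slow}: {speedup(slow, fast):.1f}x faster")
```

The C-to-Python and Python-to-R ratios land close to each other (roughly 16 and 17), which is where the SAT analogy comes from.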

What about using sapply() and taking advantage of R’s functional programming? That was dreadful. Relative to the loops, using sapply() actually increased runtime to 10.470 seconds.

R isn’t looking so hot here. The CRAN packages are still worth it, but Python has a relative performance advantage and increasing analytical support, even if it is still largely confined to programmers who do stats. There is some hope with the byte code compiler for R. We get a massive performance increase in this case when we compile the functions before using them. Using cmpfun() reduced runtime to 2.408 seconds, down from the 7.058 seconds for the loop version and 10.470 seconds for the sapply() version. While still much slower than Python or C, this represents a significant performance increase for R relative to its state just a year ago. Maybe we won’t have to depend on the incredible packages on CRAN for our comparative advantage forever.
