R, I Love You
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It is easier to critique than it is to create. I write this post with much gratitude for R, the R community, and particularly R-Core, who are paid $0 to bring us R. I’d like to offer an idea, and I’m wondering if people are interested in rallying around it.
Julia, I’m in a committed relationship
You might have caught the post titled “Julia, I Love You”. It’s the top article on R-bloggers. Perhaps you had the same reaction I did. I read the material, repeated “wow” a few times, and slipped into a contemplative space. Am I betting on an outdated technology? I slapped myself (figuratively) and snapped back to reality. I vaguely remember that when Revolution Analytics released side-by-side performance figures, there was pushback about an apples-to-oranges comparison. Some tests had to be reworked (or am I just making that up?). People have also added comments to the Julia post with performance fixes for the R code that was benchmarked against Julia. And in the end, languages come and go, but R has withstood the test of time.
I use R | Julia because. . .
Why do people use R? In my (informal, anecdotal, not rigorous, no medals of honor conferred) survey, the reasons people use R are:
- It’s free
- There are lots of packages on CRAN
- It’s easy to code
Food for thought: Julia has #1 and #3 covered, and #2 is just a matter of time if its adoption curve keeps sloping upward. All things being equal…
Performance IS an issue
Something is bugging me: how defeated I feel whenever I see R benchmarked against anything. I see the figures and I think, “it is what it is.” I’m sure the R defenders will point to all sorts of stats and tell me that I’m wrong. I’ll still shrug. I use R on a daily basis, and it consistently feels an order of magnitude slower than my internal benchmark (lots of hand-waving there). Yeah, I know I can use C++. It’s not my cup of tea. If it were, I would just use C++ and wouldn’t try to shim it into R. I’m using R because I want to write in R. I do byte-compile all packages, which helps a little.
R is also lacking on multicore. On Windows it’s not possible to run tasks in parallel in-process (if you have gotten this to work, please let me know). Out-of-process communication is slow, and there’s the memory tax of maintaining multiple instances of R. Most of the time, all cores but one are idle. Tech moves fast. If we don’t challenge ourselves, we decay.
R, I Love You, so here’s my idea
Stop writing new features. R has enough features. Let’s make 2.16.0 a performance-fix release. This is not an uncommon practice. Instead of treating performance as something we tackle incrementally, let’s steer our open source mindshare toward the single goal of performance for a release or two. Let’s excite developers who get their jollies from optimization.
Heck, let’s pay for it if we have to. There’s an R Foundation, right? Kickstarter has been enormously successful in raising funds for projects that people care about. If a team on Kickstarter can raise $1M for a video game with a limited shelf life, surely we can raise capital for a tool that some of us practically live in. People will donate because they stand to benefit.
We can organize around 3 broad categories:
- High performance, multi-core math libraries
- JIT
- Lightweight, in-process task parallelization on all major platforms
So that’s my idea. Thoughts?