How R will turn into SQL

July 16, 2013

(This article was first published on 4D Pie Charts » R, and kindly contributed to R-bloggers)

Up until very recently the only way of running R code was through the standard R distribution. Of course you could use another IDE, but somewhere underneath it all you would be running the same, standard R engine from the R-core team.

This is no longer your only option. A couple of weeks ago Radford Neal released pqR, a research project that reimplements pieces of the R engine to make it “pretty quick”. Much of pqR is likely to be folded back into the main R engine, but that isn’t the only new player in town.

Tibco, led by Michael Sannella, have rewritten the R engine from the ground up for TERR, a commercial R distribution. This leads to some very interesting possibilities. Maybe in the few years we could see many more R engines appearing. Oracle have recently invested heavily in R, and it’s certainly imaginable that they could create a high performance R that is tightly coupled to their database products. Google are also big R investors, and I could easily see them creating an R engine that has parallelism built in.

After that, perhaps even Microsoft might notice R and fix the fact that its integration with .NET is rubbish. IronR, anyone?

This situation isn’t as far fetched as you may think: there is a very obvious precedent in the data analysis world. There are dozens of different database vendors that use a common interface langauge: SQL. In the world of databases, the engine is separate from the programming language.

This is a good thing – multiple vendors provides competition, differentiation and innovation. You have a full spectrum of products from big corporate databases (Oracle again) down to miniature data stores like sqlite.

The fact that SQL has led the way means that some of the potential traps are visible before we embark down this road. My biggest bugbear with SQL is that it isn’t quite universal: different vendors have their own non-standard extensions, which means that not all SQL code is portable. A situation like HTML or C++ or Fortran, where a standards committee defines an official version of the language would be preferable. Whether R-core or another body would set this standard is a matter to be decided. (I suspect that R-core would not welcome the additional administrative burden, and commercial vendors may want more control in defining the spec, so a separate R-standards group is more likely.)

These are interesting times for R, and I look forward to seeing how the separation of language and engine progresses.

Tagged: future-tech, pqR, r, sql, TERR, tibco

To leave a comment for the author, please follow the link and comment on their blog: 4D Pie Charts » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training



CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)