Innovation in Statistical Computing

[This article was first published on Jeffrey Horner, and kindly contributed to R-bloggers.]

In "A Capitalist’s Dilemma, Whoever Wins on Tuesday," Clayton Christensen lays out three kinds of innovation through which an industry cycles:

  • Empowering Innovations – those that offer products and services to a new customer base. The classic empowering (or disruptive) innovation is Ford Motor Company’s introduction of the low-cost Model T, a car that Ford’s own workers could afford.
  • Sustaining Innovations – those that improve on the value of current products and services by replacing them with newer and better ones. Christensen offers the Toyota Prius hybrid as an example.
  • and Efficiency Innovations – those that reduce the cost of making and distributing current products and services, such as steel minimills and low-cost car insurers like GEICO.

Today, I see this cycle coming full circle in the field of statistical computing, and specifically with R.

There is no question that John Chambers’s S system has been an empowering innovation. S was remarkable in that it pioneered the use of data visualization and interactive computing. Before S, statisticians wrote single programs to perform single tasks, or they bundled these programs together into algorithmic collections or subprogram libraries.

Without a doubt, the open source R project (a close descendant of S) can be viewed as a sustaining innovation. It improves on S in many ways, preserving and enhancing the interactive environment, the language, data visualization, and more. More importantly, it makes it easy to download and use packages hosted on CRAN (the Comprehensive R Archive Network).

Finally, there have been many efficiency innovations in R, mainly through new R packages. There are too many to list, but Paul Murrell’s grid package gave birth to lattice and ggplot2, improving data visualization, and Hadley Wickham’s devtools package made it easy to create and distribute packages.

But the biggest efficiency innovation to alter statistical computing in R has been the creation of RStudio, an open source IDE for R. No other IDE, commercial or open source, can match the feature set, or even the quality, of RStudio’s products.

Two observations about RStudio have brought me to this conclusion:

  • its complete IDE can run in a web browser, which makes it possible to harness supercomputing facilities and big data from a laptop, and which eases system administration for many R users, since only one R installation has to be managed.
  • and it makes it quick to create packages and share them with others. This video shows the bare minimum steps needed to bundle your code and share it with millions, in under two minutes!

Truth be told, RStudio leverages a great deal of good work by others. For instance, Wickham’s devtools package is under the hood, driving RStudio’s packaging feature. Yihui Xie’s knitr package, along with Sweave, makes writing R documentation in RStudio such a pleasure. But it is the engineering, the stitching together of all these packages, that creates an innovative experience. It is too soon to tell, but we may look back on this period and say that RStudio was more than an efficiency innovation; it might just have been disruptive, too.
