Why has R, despite quirks, been so successful?

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

I was on a panel back in 2009 where Bo Cowgill said, “The best thing about R is that it was written by statisticians. The worst thing about R is that it was written by statisticians.” R is undeniably quirky, especially to computer scientists, and yet it has attracted a huge following for a domain-specific language, with more than two million users worldwide.

So why has R become so successful, despite being outside the mainstream of programming languages? John Cook adeptly tackles that question in a 2013 lecture, “The R Language: The Good The Bad And The Ugly”. His insight is that to understand a domain-specific language, you have to understand the domain, and statistical data analysis is a very different domain from systems programming.

I think R sometimes gets an unfairly bad rap for its quirks, but in fact these design decisions, made in the interest of making R extensible rather than fast, have enabled some truly important innovations in statistical computing:

  • R's lazy evaluation made possible the formula syntax, which is so useful for statistical modeling of all kinds.
  • R's support for missing values (NA) as a core data value let R handle real-world, messy data sources without resorting to dangerous hacks (like using zeroes to represent missing data).
  • R's package system, a simple method of encapsulating user-contributed functions, enabled the CRAN system to flourish. Pass-by-value semantics and named function arguments also made it easy for R programmers to write functions that others could readily use.
  • R's graphics system was designed to be extensible, which allowed the ggplot2 system to be built on top of the “grid” framework (and ggplot2 has in turn influenced the look of statistical graphics everywhere).
  • R is dynamically typed, allows functions to “reach outside” their own scope, and treats everything as an object, including expressions in the R language itself. These language-level programming features made possible the reactive programming framework underlying Shiny.
  • Every action in R is a function call, including operators, which allowed for the development of new syntax models like the %>% pipe operator in magrittr.
  • R gives programmers control over the REPL, which allowed for the development of IDEs like ESS and RStudio.
  • “for” loops can be slow in R, which … well, I can't really think of an upside for that one, except that it encouraged the development of high-performance extension frameworks like Rcpp.
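Several of the features in the list above are easy to see at the R console. The snippet below is a minimal sketch in base R only (no packages needed). The names `show_expr`, `bump`, and the `%then%` operator are hypothetical, invented purely for illustration; `%then%` is a toy stand-in that shows how a user-defined infix operator works, not magrittr's actual `%>%` implementation.

```r
# Lazy evaluation: arguments are promises, so a function can capture the
# expression it was given without ever evaluating it (no error here).
show_expr <- function(x) deparse(substitute(x))
show_expr(undefined_variable + 1)    # "undefined_variable + 1"

# A formula is stored unevaluated; modeling functions interpret it later.
fm <- y ~ a + b
all.vars(fm)                         # "y" "a" "b"

# Missing values are a core data value and propagate unless handled explicitly.
v <- c(3, NA, 5)
mean(v)                              # NA
mean(v, na.rm = TRUE)                # 4

# Pass-by-value (copy-on-modify): changing an argument inside a function
# does not alter the caller's object. `bump` is a hypothetical helper.
bump <- function(vec) { vec[1] <- 100; vec }
orig <- c(1, 2, 3)
changed <- bump(orig)
orig                                 # 1 2 3 (unchanged)

# Expressions are objects that can be stored and evaluated later.
e <- quote(a * b)
eval(e, list(a = 2, b = 21))         # 42

# Operators are functions, so they can be called directly or newly defined.
`+`(2, 3)                            # 5
`%then%` <- function(lhs, f) f(lhs)  # hypothetical toy pipe, not magrittr's %>%
c(1, 4, 9) %then% sqrt               # 1 2 3
```

The formula line only shows that `y ~ a + b` is kept as an unevaluated object; a modeling function such as `lm()` then decides how to look up `y`, `a`, and `b` in the data it is given.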

Some languages have some of these features, but I don't know of any language that has all of them, probably with good reason. But there's no doubt that without these qualities, R would not have been able to advance the state of the art in statistical computing in so many ways, or to attract such a loyal following in the process.