This is Part 3 of a five-part article series, with new parts published each Thursday. You can download the complete article from the Revolution Analytics website.
Power from Elegance
If the R movement has a genuine rock star, it’s probably Hadley Wickham. He’s an assistant professor and the Dobelman Family Junior Chair in Statistics at Rice University. He’s written and contributed to more than 20 R packages, and he’s won the John Chambers Award for Statistical Computing.
Most of Wickham’s research focuses on making data analysis better, faster and easier. He is especially interested in using visualization techniques to improve how data and models are understood. In other words, he’s all about making it easy to use R.
“R was designed from the ground up to deal with common data problems,” says Wickham. “Compared to other programming languages, it’s designed to help you do the kinds of things that you do most often when you’re performing data analysis. For example, R has data frames built into the core language. It’s such a natural structure, and it makes working with data much easier. But very few other languages have data frames built in.”
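To make Wickham's point concrete, here is a minimal sketch of a data frame in base R. The `patients` table and its values are invented purely for illustration; only `data.frame()` and `mean()` come from R itself.

```r
# Hypothetical patient measurements, invented for illustration only.
patients <- data.frame(
  id     = c(1, 2, 3, 4),
  group  = c("control", "treated", "control", "treated"),
  weight = c(70.5, 82.1, 65.0, 90.3)
)

# Columns are addressed by name, and rows can be filtered with a logical condition.
treated <- patients[patients$group == "treated", ]

# Mean weight of the treated rows, with no add-on packages required.
treated_mean <- mean(treated$weight)
print(treated_mean)
```

Nothing here needs to be installed or imported: the table structure, the filtering and the summary are all part of the core language.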
Because R was created by statisticians for statisticians, it’s already loaded with many of the crucial features required to accomplish the everyday tasks of statistical analysis. The very design of the R language is often described as “elegant” – in other words, R is in tune with the way statisticians think and work.
For example, says Wickham, “In statistics, it’s really critical to keep track of missing values. That’s when you don’t know what a value is, but you need some way of indicating it. R keeps track of that for you, so that if you add a number to a missing number, you still don’t know what that number is and R will keep track of it. That’s important.”
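The behavior Wickham describes can be seen in a few lines of base R. This is a small sketch with made-up numbers; the `readings` vector is hypothetical.

```r
# Adding a number to a missing value yields a missing value.
x <- NA + 1
print(is.na(x))

# Hypothetical readings with one unknown measurement.
readings <- c(4.2, NA, 5.1)

# By default the unknown value propagates, so the mean is also unknown.
mean_with_na <- mean(readings)
print(mean_with_na)

# The analyst must ask explicitly for missing values to be dropped.
mean_without_na <- mean(readings, na.rm = TRUE)
print(mean_without_na)
```

The design choice is that R does not silently discard missing data. The result stays `NA` until you state, with `na.rm = TRUE`, that dropping the unknown values is what you intend.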
No Need to Reinvent the Wheel
Precisely because R is a programming language – as opposed to being a pre-fabricated piece of software – new analytic techniques that are written in R can be saved and re-used. So when R users discover something fresh and exciting, they have two options that are not generally available to users of pre-fab software:
- They can share the new techniques with other R users, inside their organizations and all over the world.
- They can reproduce and re-use the new techniques they have discovered.
These are not trivial or minor advantages – they represent enormous potential value. The ability to save and re-use improvised functions means that you’re not forced to reinvent the wheel each time that you run an analytic operation. Try doing that in SAS or SPSS and you’re in for a long haul.
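As a rough illustration of what saving and re-using a technique looks like, the sketch below wraps a small calculation in a function and applies it to two unrelated data sets. The function name `standardize` and the sample vectors are hypothetical, chosen only for this example.

```r
# Hypothetical helper: rescale a numeric vector to mean 0 and
# standard deviation 1, ignoring missing values when computing both.
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Two made-up data sets that need the same treatment.
heights <- c(150, 160, 170, 180, 190)
scores  <- c(55, NA, 70, 85)

# The same function is re-used without rewriting the calculation.
z_heights <- standardize(heights)
z_scores  <- standardize(scores)

print(z_heights)
print(z_scores)
```

Once written, a function like this could be kept in a script file (say, a hypothetical `helpers.R`), loaded into later sessions with `source()`, or handed to a colleague, which is the kind of sharing and re-use described above.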
The ability to share new R code through CRAN (the Comprehensive R Archive Network) and other community channels ensures a state of continuous evolution. Bluntly put, the world of R never sits still.
“New methods show up in R before they show up in other packages,” says Michael Elashoff of CardioDx, a molecular diagnostics company that collects data from multiple sources and builds predictive models in R that help physicians detect cardiovascular diseases in their patients.
“We do a lot of predictive model development on complex data sets, so the ability to use and evaluate new statistical methods is important to us. Especially in the last couple of years, many of these newer methods have been showing up as R packages first. R is definitely on the cutting edge,” says Elashoff.
Zubin Dowlaty, VP / Head of Innovation & Development at Mu Sigma, has a similar take on the value of R. Headquartered in Chicago, Mu Sigma is a global analytics services company providing business decision support services to clients in data-intensive industries such as pharma, insurance, financial services, CPG/retail, healthcare and technology. All of that means that Mu Sigma is in the business of analyzing data – big time.
“The large ecosystem of statisticians all over the world adding new functions and packages to the R system is a huge benefit,” says Dowlaty. “State-of-the-art algorithms are available quickly through the R platform.”
The R platform has become so comprehensive that it now represents a “one-stop shop” for analytical techniques, says Dowlaty. “Most of the techniques you need to drive analytics into the business are available through R – everything from statistical to machine learning and optimization techniques. Unlike other vendors, like SAS or SPSS, R provides everything in one go-round.”