by Joseph Rickert
Data Week 2013 is being held this week in sunny San Francisco at the Fort Mason conference center overlooking the Bay. Holding a Bay Area R User Group (BARUG) meeting at Data Week helped to raise R consciousness among the hip conference crowd attracted by the intoxicating mix of blue skies, big data hype, startups and visionaries. The BARUG members, on the other hand, came mostly for the free beer and lightning talks. There were six 12-minute talks, with themes that ranged from basic R applications to using R to replace SAS in a big-league manufacturing process.
Timothy Sweetser began the evening by showing the regression model he used to analyze BART fares. This was an elementary but clever analysis of an everyday kind of question, the sort that briefly floats through your mind while you are buying a ticket: “How come this trip costs this much, but I paid a different amount last week for what seemed like a similar trip?” The plot below shows the strata in fares by distance as well as Timothy’s regression model.
Utham Kamath described Mathpak, a new cloud-based platform for building collaborative analytical applications and then marketing and monetizing them, and showed how R-based applications would fit nicely into this scheme. It seemed to me that Utham and his fellow developers are envisioning a new “pick up game” kind of collaboration, where developers from around the world will undertake serious projects that any one of them alone would not have the resources to even contemplate.
Clark Fitzgerald spoke about the favorable economics of running R on virtual machines in Amazon's cloud (EC2). He compared serious computational hardware to tractors: most people just rent a tractor when they need to do the heavy lifting. He went on to make the case that the economics of cloud-based computing are favorable even for relatively small projects involving teaching and automation. You don’t necessarily have to be working on some high performance computing project to see the benefits.
Elaine Jones showed how her IBM tape storage manufacturing group achieved some serious cost cutting by replacing an expensive ($150K) SAS group license with R for a number of ETL tasks that are fundamental to the production workflow. Critical tasks that used to take 30 or so SAS programs, such as extracting raw data from DB2, summarizing it, formatting it and loading it into a different DB2 database, are now handled by R scripts. The following graph shows the production workflow and where R replaced SAS.
For someone who blogs about R, it was really encouraging to hear that Elaine first heard about R from reading the 2009 NY Times article about R, which was published on an internal IBM webpage.
Mathias Brandewinder talked about the new F# to R type provider, a kind of “bridge mechanism” for sharing data and resources between the two languages. Types enable R to be expressed as an F# resource. Now, F# users can call R from within the F# environment, and R developers can make use of F# in production code. Mathias gave a very convincing live demo in which, working from his F# IDE, he seemed to be mixing F# and R code on the fly to achieve an impressive level of integration. It was like watching a musician switch between instruments.
Harrison Decker finished up the evening by describing how reproducible research tools in R are evolving to meet the needs of scientists and researchers. Reproducible research:
- Allows authors to reproduce the results and figures in their research publications
- Aids verification of results by other researchers
- Allows researchers to learn from and build on the work of others
- Builds community
Harrison very eloquently articulated one of the major strengths of R when he said, almost in passing: “R grows because people are building and sharing”. The slides from all of the presenters will be posted on the BARUG meetup website.
Other R-related activities include a well-attended R Bootcamp that was held on Tuesday; “The R Summit”, a series of talks by Tess Nesbit of Data Song, Uday Tennety of Revolution Analytics, Ryan Walker of Blue Shield of California and Ryan White of A9; and a panel discussion, “R means: Business”, led by David Smith. The talks and panel discussion are taking place today.