By guest blogger Joseph Rickert.
I was very happy to be a part of the ACM Data Mining camp held last Saturday (November 13th) at eBay. It was a big day for discussing hot topics in data mining, Mahout, parallel SVMs, etc., and also a pretty big day for R. Because Revolution Analytics was a sponsor for the camp, I got to give a three-minute company pitch and was very pleased to have people applaud my “I ‘heart’ R” slide.
(A cynic might point out that “I heart R” was my last slide and people were just relieved that I was finished speaking, three minutes of listening to me being a significant time investment. However, since the I ♥ R stickers went flying off the table where sponsors were allowed to leave their collateral material, I do think that at least a few data miners are developing some affection for R.)
In addition to my brief presentation, I led a session on manipulating large data sets in R that was attended by maybe 100 people. I was expecting to run some code in real time for a group of about twenty or so, but found myself instead up at the podium in the large conference room. Some adjustment was necessary for the larger audience, but I did show Revolution’s RevoScaleR package running cubes (crosstabs) and regressions, plotting histograms, etc., while working directly with the 123 million row airlines data set that was used in the 2009 ASA competition. I was very happy to show off Revolution R, and I was doubly pleased to find out later that, concurrent with my session, Tricia Hoffman and Mike Bowles were leading a session on R programming. Nothing could be more encouraging for an R enthusiast than the apparent necessity to have competing R sessions. Please have a look at the examples Tricia and Mike presented, which are posted on the conference website.
I also attended the session where Anup Parikh presented Red-R and led a discussion on user interfaces. Red-R has some very nice features. At the very least, I can see R developers using it to document the flow of their analyses. Finally, I spoke briefly with Stanford’s Susan Holmes, who mentioned that basic proficiency in R was a requirement for the data mining course she is teaching. As part of an answer to a question that came up during the expert panel discussion, Susan compared learning data mining skills to learning to play the piano. I am paraphrasing here, but Susan made the point that you could learn to play music by studying music theory first, or, with a little help, you could sit at the piano and try to pick out a tune. Continuing this theme, I think that playing with R might be a natural way to tune your data mining skills.