Blog Archives

Resampling data in Hadoop with RHadoop

February 27, 2013

On Revolution Analytics partner Cloudera's blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He provides the complete map-reduce code in the R...
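Uri's post contains the full RHadoop code in R. To show the general shape of the idea, here is a minimal Python sketch of one common way to resample in a single map-reduce pass, the Poisson bootstrap. This is an assumption on our part and may not be the exact method in Uri's post. Each mapper emits every record k times per bootstrap replicate, with k drawn from Poisson(1), and each reducer computes a statistic (here, the mean) for one replicate. The names `poisson1`, `map_phase` and `reduce_phase` are hypothetical, and plain Python generators stand in for Hadoop.

```python
import math
import random
from collections import defaultdict


def poisson1(rng):
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k = 0
    p = rng.random()
    # Keep multiplying uniforms until the product drops to e^-1 or below.
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def map_phase(records, n_replicates, seed=0):
    """Hypothetical mapper: emit (replicate_id, record) pairs.

    Each record appears k ~ Poisson(1) times in each replicate, which
    approximates the multinomial counts of a classic bootstrap.
    """
    rng = random.Random(seed)
    for record in records:
        for replicate in range(n_replicates):
            for _ in range(poisson1(rng)):
                yield replicate, record


def reduce_phase(pairs):
    """Hypothetical reducer: group by replicate and compute each mean."""
    groups = defaultdict(list)
    for replicate, value in pairs:
        groups[replicate].append(value)
    # Every key in groups has at least one value, so no division by zero.
    return {rep: sum(vals) / len(vals) for rep, vals in groups.items()}


if __name__ == "__main__":
    data = list(range(1, 101))
    means = reduce_phase(map_phase(data, n_replicates=5, seed=42))
    for replicate, mean in sorted(means.items()):
        print(replicate, round(mean, 2))
```

Because each record's counts are drawn independently, a mapper never needs to know the total size of the data set, which is what makes this approach fit map-reduce. On a real cluster each mapper would need its own random seed; the single seeded generator here just keeps the sketch reproducible.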

New ways to Hadoop with R

February 26, 2013

Today, there are two main ways to use Hadoop with R and big data: (1) use the open-source rmr package to write map-reduce tasks in R, running within the Hadoop cluster (great for data distillation!); (2) import data from Hadoop to a server running Revolution R Enterprise, via HBase, ODBC (for high-performance Hadoop/SQL interfaces), or streaming data direct...

What is Revolution R Enterprise?

February 25, 2013

Let us explain, in 90 seconds: Want a more in-depth introduction to R and Revolution R Enterprise? I'll be giving the webinar Revolution R Enterprise: 100% R and More on March 14. Just follow the link below to secure your seat for the live presentation, and to receive notification of the replay. Revolution Analytics webinars: Revolution R Enterprise: 100%...

Revolution Newsletter: February 2013

February 25, 2013

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full February edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Case study: Real-Time Marketing Analytics. Online advertising technology company Exelate uses predictive models to...

Free e-book on Data Science with R

February 22, 2013

A new book by Jeffrey Stanton from Syracuse University's School of Information Studies, An Introduction to Data Science, is now available for free download. The book, developed for Syracuse's Certificate for Data Science, is available under a Creative Commons License as a PDF (20 MB) or as an interactive eBook from iTunes. The book begins with the following clear definition...

Video: IBM Opinionated Infrastructure Hangout

February 22, 2013

Had a great time earlier this week on a Google Hangout as part of the IBM Opinionated Infrastructure series. Moderator James Governor (analyst from RedMonk) kept the conversation lively, with topics ranging from the value of information to the benefits of predictive analytics and the evolution of Hadoop. R gets a mention at several points in the conversation, which...

R in the news: Interviews with Revolution Analytics execs

February 22, 2013

Here are three recent news articles that feature interviews with members of the Revolution Analytics team talking about the importance of the R language: In Forbes, CEO Dave Rich talks to Gil Press about the business landscape for Big Data. In the article, Dave says: SAS and SPSS remind me of Cobol and Fortran circa 1995. The scientific and...

Quandl: A Wikipedia for Time Series Data

February 20, 2013

This guest post is by Tammer Kamel, founder of Quandl. Finding and formatting numerical data for analysis in R or Excel or indeed any application is a pain that all real-world data analysts know all too well. In aggregate, I have probably spent weeks of my life trying to find data on the web. And several more weeks...

Visualize major league pitching data with pitchRx

February 19, 2013

Anyone interested in playing around with the data generated by the PITCHf/x cameras at major league baseball games should definitely check out the pitchRx package from Carson Sievert. Major League Baseball Advanced Media makes the data available for download, and this package provides an interface from R to the speed, position and pitcher data for just about every MLB...

10 R packages every data scientist should know about

February 18, 2013

The yhat blog lists 10 R packages they wish they'd known about earlier. Drew Conway calls them "10 reasons to always start your analysis in R". They're all very useful R packages that every data scientist should be aware of. They are: sqldf (for selecting from data frames using SQL), forecast (for easy forecasting of time series), plyr (data...