Analyzing birth rates from census data with RevoScaleR

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

In yesterday's webinar, “New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis”, Sue Ranney demonstrated the features of the RevoScaleR big data analysis package included with Revolution R Enterprise. In the webinar, she showed how to use the rxImport function to import big data sets from SAS, SPSS or ODBC, how to use the rxDataStep function to pre-process the data using R functions, and how to scale the analysis from a desktop to an entire cluster without changing code, simply by setting a new compute context.

Sue also presented a novel data analysis, based on the US birth-data files mentioned in Joe Adler's R in a Nutshell:

The natality files are gigantic: they're approximately 3.1 Gb uncompressed [and that's for just one year of data — ed]. That's a little larger than R can easily process …

(You can download the data files from the CDC.) Sue showed how to read all 22 years of data (about 70 GB of raw data) into R with RevoScaleR (yielding a 16 GB XDF file, after selecting the relevant columns in the data step), and then fit a linear model to look at the difference in male/female birth rates for different declared races:

[Figure: RRE5_Birth_Plot]

I had no idea that there was a significant difference between boy/girl birth rates at the population level, let alone between the various sub-populations. I guess that's why I'm not a demographer, but I sure found it interesting. (If you'd like to see a more in-depth presentation about this analysis, check out Sue's presentation at useR! 2011.)
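For readers wondering how a model like this can be fit when the data is far larger than memory, here is a minimal sketch of the general idea. It is not RevoScaleR's implementation and does not reproduce Sue's analysis. It relies on one fact: when the response is a 0/1 indicator (male or not) and the only predictor is a categorical variable, the least-squares fitted values are simply the per-group means, so running counts accumulated one chunk at a time are all that is needed. The sketch is written in Python using only the standard library so it is self-contained; the function names, the group labels and the toy records are all hypothetical.

```python
from collections import defaultdict


def iter_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def male_share_by_group(records, chunk_size=2):
    """Return {group: share of male births}, visiting one chunk at a time.

    Only two small tables of counts stay in memory, no matter how many
    records are streamed through.
    """
    births = defaultdict(int)  # total births seen per group
    males = defaultdict(int)   # male births seen per group
    for chunk in iter_chunks(records, chunk_size):
        for group, sex in chunk:
            births[group] += 1
            if sex == "M":
                males[group] += 1
    return {group: males[group] / births[group] for group in births}


# Toy records (made up for illustration, not CDC data): (group label, sex)
toy = [
    ("A", "M"), ("A", "F"), ("A", "M"), ("B", "F"),
    ("B", "M"), ("B", "F"), ("A", "F"), ("B", "F"),
]
shares = male_share_by_group(toy, chunk_size=3)
print(shares)
```

The chunk size changes only how much data is held at once, not the result, which is what makes this kind of computation easy to spread across files or machines.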

Sue's slides from the webinar presentation are below, and you can also download a full replay of the presentation from the Revolution Analytics webinar archives.

Revolution Analytics Webinars: New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
