Analyzing birth rates from census data with RevoScaleR

November 18, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

In yesterday's webinar, "New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis", Sue Ranney demonstrated the features of the RevoScaleR big data analysis package included with Revolution R Enterprise. In the webinar, she showed how to use the rxImport function to import big data sets from SAS, SPSS or ODBC, how to use the rxDataStep function to pre-process the data using R functions, and how to scale the analysis from a desktop to an entire cluster without changing code, simply by setting a new compute context.
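To make the import and data-step idea concrete, here is a rough sketch of chunked pre-processing. It is written in Python rather than R and is not RevoScaleR's API or implementation: the function name data_step, the column names and the toy rows are all made up for illustration. It streams a CSV a few rows at a time, keeps only the selected columns, and applies a user-supplied transform to each chunk, so the whole file never has to sit in memory.

```python
import csv
import io
from itertools import islice


def data_step(src, dst, keep, transform, chunk_size=2):
    """Stream rows from src to dst in chunks (hypothetical helper).

    Only the columns named in `keep` are passed to `transform`, which
    receives a list of row dicts and returns a list of row dicts.
    Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = None
    written = 0
    while True:
        # Pull at most chunk_size rows and drop the unwanted columns.
        chunk = [{k: row[k] for k in keep} for row in islice(reader, chunk_size)]
        if not chunk:
            break
        for row in transform(chunk):
            if writer is None:
                # The header is taken from the first transformed row.
                writer = csv.DictWriter(dst, fieldnames=list(row), lineterminator="\n")
                writer.writeheader()
            writer.writerow(row)
            written += 1
    return written


def add_is_male(rows):
    """Example transform: derive a 0/1 indicator from the sex column."""
    for row in rows:
        row["is_male"] = 1 if row["sex"] == "M" else 0
    return rows


# Made-up toy input; the real natality files have many more columns.
src = io.StringIO("sex,race,weight\nM,A,3200\nF,B,3100\nM,B,3400\n")
dst = io.StringIO()
n = data_step(src, dst, keep=["sex", "race"], transform=add_is_male)
out = dst.getvalue()
print(n)
print(out)
```

Selecting columns during the step is what shrinks the output: in this toy case the weight column is never written.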

Sue also presented a novel data analysis, based on the US birth-data files mentioned in Joe Adler's R in a Nutshell:

The natality files are gigantic: they're approximately 3.1 Gb uncompressed [and that's for just one year of data — ed]. That's a little larger than R can easily process …

(You can download the data files from the CDC.) Sue showed how to read all 22 years of data (about 70 GB of raw data) into R with RevoScaleR (yielding a 16 GB XDF file, after selecting the relevant columns in the data step), and then fit a linear model to look at the difference in male/female birth rates for different declared races.


I had no idea that there was a significant difference between boy/girl birth rates at the population level, let alone between the various sub-populations. I guess that's why I'm not a demographer, but I sure found it interesting. (If you'd like to see a more in-depth presentation about this analysis, check out Sue's presentation at useR! 2011.)
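For readers wondering how a linear model can be fit to data that does not fit in memory, here is a small sketch of one way it can work. This is an assumption about the general technique, not a description of Sue's model or of RevoScaleR's internals, and it is in Python rather than R. It assumes the simplest possible model: a 0/1 "male birth" indicator regressed on a single categorical variable with no intercept. In that case the least-squares coefficients are just the per-group means, so only a running count and sum per group need to be kept while streaming through the data. The group labels and records are invented.

```python
from collections import defaultdict
from itertools import islice


def iter_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def fit_cell_means(records, chunk_size=2):
    """Fit y ~ 0 + group by least squares, one chunk at a time.

    With one dummy column per group and no intercept, X'X is diagonal
    (the group counts) and X'y holds the group sums, so each OLS
    coefficient is sum / count for its group.
    """
    counts = defaultdict(int)    # diagonal of X'X
    sums = defaultdict(float)    # entries of X'y
    for chunk in iter_chunks(records, chunk_size):
        for group, y in chunk:
            counts[group] += 1
            sums[group] += y
    return {g: sums[g] / counts[g] for g in counts}


# Made-up toy records: (hypothetical group label, 1 if male birth else 0).
toy_records = [
    ("A", 1), ("A", 0), ("A", 1), ("A", 1),
    ("B", 0), ("B", 1), ("B", 0), ("B", 1),
]

coefficients = fit_cell_means(toy_records, chunk_size=3)
for group in sorted(coefficients):
    print(group, coefficients[group])
```

The fitted values do not depend on the chunk size, which is the point: the same answer comes out whether the data arrives in one piece or in thousands of pieces.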

Sue's slides from the webinar presentation are below, and you can also download a full replay of the presentation from the Revolution Analytics webinar archives.

Revolution Analytics Webinars: New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis
