Big Data Generalized Linear Models with Revolution R Enterprise

June 28, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R''s glm function for generalized linear modeling is very powerful and flexible: it supports all of the standard model types (binomial/logistic, Gamma, Poisson, etc.) and in fact you can fit any distribution in the exponential family (with the family argument). But if you want to use it on a data set with millions or rows, and especially with more than a couple of dozen variables (or even just a few categorical variables with many levels), this is a big computational task that quickly grows in time as the data gets larger, or even exhaust the available memory.


The rxGlm function included in the RevoScaleR package in Revolution R Enterprise 6 has the same capabilities as R's glm, but is designed to work with big data, and to speed up the computation using the power of multiple processors and nodes in a distributed grid. In the analysis of census data in the video below, fitting a Tweedie model on 5M observations and 265 variables takes around 25 seconds on a laptop. A similar analysis, using 14 million observations on a 5-node Windows HPC Server cluster takes just 20 seconds.


This demonstration was part of last week's webinar on Revolution R Enterprise 6. If you're not familiar with Revolution R Enterprise, the first 10 minutes is an overview of the differences from open-source R, and the remaining 20 minutes describes the new features in version 6. Follow the link below to check out the replay.

Revolution Analytics webinars: 100% R and More: Plus What's New in Revolution R Enterprise 6.0

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training




CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)