(This article was first published on **Revolutions**, and kindly contributed to R-bloggers.)

R's glm function for generalized linear modeling is very powerful and flexible: it supports all of the standard model types (binomial/logistic, Gamma, Poisson, etc.), and in fact, via the family argument, you can fit any distribution in the exponential family. But if you want to use it on a data set with millions of rows, and especially with more than a couple of dozen variables (or even just a few categorical variables with many levels), fitting the model becomes a heavy computational task: the running time grows quickly as the data gets larger, and the fit can even exhaust the available memory.
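To make the role of the family argument concrete, here is a minimal sketch in base R. It uses synthetic simulated data (the variable names, sample size, and true coefficients are invented for illustration; this is not the census analysis from the webinar). The same formula is fitted once as a Poisson model and once as a logistic model by changing only the family, and the last line shows how a categorical predictor expands into several model-matrix columns.

```r
# Synthetic data for illustration only (hypothetical variables and coefficients)
set.seed(42)
n <- 10000
x <- runif(n)
group <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

# True model: log(mu) = 0.5 + 1.2 * x + 0.3 * [group == "b"] - 0.4 * [group == "c"]
eta <- 0.5 + 1.2 * x + ifelse(group == "b", 0.3, 0) + ifelse(group == "c", -0.4, 0)
y <- rpois(n, lambda = exp(eta))
dat <- data.frame(y, x, group)

# The family argument selects the error distribution and the link function
fit_pois <- glm(y ~ x + group, family = poisson(link = "log"), data = dat)
print(coef(fit_pois))

# Same formula interface, different family: logistic regression on a binary outcome
dat$z <- as.integer(dat$y > 2)
fit_logit <- glm(z ~ x + group, family = binomial(link = "logit"), data = dat)
print(coef(fit_logit))

# A factor with k levels contributes k - 1 columns to the model matrix
# (intercept, x, groupb, groupc here), so many-level factors widen the fit quickly
print(ncol(model.matrix(y ~ x + group, data = dat)))
```

Because glm builds the full rows-by-columns model matrix in memory before fitting, both the number of rows and the number of factor levels feed directly into the time and memory costs described above.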

The rxGlm function included in the RevoScaleR package in Revolution R Enterprise 6 has the same capabilities as R's glm, but is designed to work with big data and to speed up the computation using the power of multiple processors and nodes in a distributed grid. In the analysis of census data in the video below, fitting a Tweedie model on 5 million observations and 265 variables takes around 25 seconds on a laptop. A similar analysis using 14 million observations on a 5-node Windows HPC Server cluster takes just 20 seconds.

This demonstration was part of last week's webinar on Revolution R Enterprise 6. If you're not familiar with Revolution R Enterprise, the first 10 minutes give an overview of how it differs from open-source R, and the remaining 20 minutes describe the new features in version 6. Follow the link below to watch the replay.

Revolution Analytics webinars: 100% R and More: Plus What's New in Revolution R Enterprise 6.0
