What’s New in 6.2: Stepwise Regression for Big Data

March 26, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Thomas Dinsmore

This is the third in a series of posts highlighting new features in Revolution R Enterprise Release 6.2, which is scheduled for General Availability April 22.  This week's post features our new Stepwise Regression capability.

The Stepwise process starts with a specified model. At each step it adds or removes the single variable that most improves the fit, as measured by a selection criterion. It stops when no further improvement is possible or when it reaches a specified model boundary.  By automating the selection of candidate features for a predictive model, Stepwise Regression significantly accelerates the model building process.
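The loop described above can be sketched in a few lines of open source R. This is a minimal illustration of forward selection, not the Revolution R Enterprise implementation: the built-in mtcars data, the candidate variable list, and the choice of AIC as the criterion are all assumptions made for the example.

```r
# Forward selection by AIC on the built-in mtcars data (illustrative only).
# The candidate list and the response variable are assumptions for this sketch.
candidates <- c("wt", "hp", "qsec", "disp", "drat")
selected   <- character(0)
current    <- lm(mpg ~ 1, data = mtcars)   # the specified starting model

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break        # model boundary reached

  # Fit one trial model per remaining variable and record its AIC.
  trial_aic <- sapply(remaining, function(v) {
    AIC(lm(reformulate(c(selected, v), response = "mpg"), data = mtcars))
  })

  # Stop when even the best addition does not lower the AIC.
  if (min(trial_aic) >= AIC(current)) break

  best     <- names(which.min(trial_aic))
  selected <- c(selected, best)
  current  <- lm(reformulate(selected, response = "mpg"), data = mtcars)
}

print(selected)   # variables in the order they entered the model
```

Each pass refits one model per remaining candidate, which is why the cost grows quickly with the number of variables and the size of the data.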

One of our customers, for example, builds more than a thousand models every week for targeted marketing.  At that scale of activity, traditional model-fitting techniques are simply too slow.  Starting with a feature set of more than 500 candidate variables, this customer runs fast feature selection techniques to reduce the number of variables, then runs Stepwise Regression to finalize the model.

In designing the Stepwise Regression capability, we relied on customer feedback and also reviewed similar capabilities in open source R, such as stepAIC() in the MASS package.  Since many of our customers are actively converting from SAS, we looked at the Stepwise capabilities in SAS as well.

In Release 6.2, we support the following Stepwise methods for Linear Regression: 

  • Forward selection
  • Backward elimination
  • Bidirectional search

We support three user-specifiable selection criteria:

  • AIC
  • BIC
  • Mallows' Cp
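For readers who want to experiment with these methods and criteria before Release 6.2 is available, base R's step() function offers an open source analogue. The sketch below is not the Revolution R Enterprise interface; the mtcars data and the variable set are assumptions chosen for illustration. In step(), the direction argument picks the search method, k sets the penalty per parameter (2 for AIC, log(n) for BIC), and supplying a fixed error-variance estimate through scale makes the criterion for a linear model equivalent to Mallows' Cp.

```r
# Base R analogue of the three methods and three criteria (illustrative only).
n          <- nrow(mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ wt + hp + qsec + disp + drat, data = mtcars)
full_scope <- formula(full_model)   # upper bound on the models considered

# Forward selection with AIC (k = 2 is the default penalty).
fwd_aic <- step(null_model, scope = full_scope,
                direction = "forward", trace = 0)

# Backward elimination with BIC (penalty k = log(n)).
bwd_bic <- step(full_model, direction = "backward",
                k = log(n), trace = 0)

# Bidirectional search with Mallows' Cp: the residual variance of the
# full model is passed as a known scale.
both_cp <- step(null_model, scope = full_scope, direction = "both",
                scale = summary(full_model)$sigma^2, trace = 0)

# Compare the formulas each search settled on.
print(formula(fwd_aic))
print(formula(bwd_bic))
print(formula(both_cp))
```

The three searches need not agree: BIC penalizes extra terms more heavily than AIC once n exceeds about 7, so it tends to return smaller models.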

Coming up later this year in our Release 7.0, we plan to expand the Stepwise capabilities to Logistic Regression and General Linear Models.

Your comments and suggestions are welcome. If you like to use Stepwise Regression and you are interested in a feature that you don't see mentioned in this post, let us know what you think in the Comments section below.
