Stepwise Regression – What’s not to like?

[This article was first published on rstats – DataDrumstick, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

 

Plenty, apparently.

Besides encouraging you not to think, it doesn’t exactly do a great job at what it claims to do. Given a set of predictors, there is no guarantee that stepwise regression will find the optimal combination. Many of my statistician buddies, whom I consult from time to time, have a gripe with it because it’s not sensitive to the context of the research. Seems fair.
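
To make the "no guarantee" point concrete, here is a minimal sketch that compares the model picked by backward elimination with a brute-force search over every predictor subset. It assumes R's built-in `state.x77` data, which is only a guess based on the variable names that appear in this post, and it uses AIC as the yardstick because that is what `step()` minimises by default. The helper names (`states`, `all_aic`, `stepwise`) are made up for illustration.

```r
# Assumption: the built-in state.x77 data; the post does not name its dataset.
states <- as.data.frame(state.x77)
names(states) <- make.names(names(states))  # "Life Exp" -> "Life.Exp"
predictors <- setdiff(names(states), "Life.Exp")

# AIC of every predictor subset (2^7 = 128 models, including intercept-only).
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
all_aic <- apply(subsets, 1, function(keep) {
  rhs <- if (any(keep)) predictors[keep] else "1"
  AIC(lm(reformulate(rhs, response = "Life.Exp"), data = states))
})

# Backward elimination starting from the full model.
stepwise <- step(lm(Life.Exp ~ ., data = states),
                 direction = "backward", trace = 0)

best <- which.min(all_aic)
cat("Best subset by exhaustive search:",
    paste(predictors[unlist(subsets[best, ])], collapse = " + "), "\n")
cat("Exhaustive-search AIC:", round(all_aic[best], 2), "\n")
cat("Stepwise model:", deparse(formula(stepwise)), "\n")
cat("Stepwise AIC:", round(AIC(stepwise), 2), "\n")
```

With only seven predictors the exhaustive search is cheap, and on a small, well-behaved dataset the two answers may well coincide. The point is that nothing forces them to: backward elimination only ever looks at one path through the 128 candidates.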

 

I built an interactive Shiny app to evaluate results from stepwise regression (direction = “backward”) when applied to different predictors and datasets. What I observed during model building and cross-validation was that the model performed well on the data it was fitted to but much worse when subjected to cross-validation. After a lot of different random selections and testing, I eventually did find a model that worked well on both the fitted dataset and the cross-validation set, but it performed poorly when applied to new data. Therein lies most of the problem.
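
The gap between fitted and cross-validated performance can be reproduced outside the app with a few lines of base R. This is a sketch under assumptions, not the app's actual code: it assumes the built-in `state.x77` data, a 5-fold split, and RMSE as the error measure, and the helpers `rmse()` and `fit_backward()` are hypothetical names. The variable selection is repeated inside every fold, so the held-out rows never influence which predictors survive.

```r
# Assumption: the built-in state.x77 data; the post does not name its dataset.
states <- as.data.frame(state.x77)
names(states) <- make.names(names(states))  # "Life Exp" -> "Life.Exp"

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Backward elimination from the full model; trace = 0 silences the step log.
fit_backward <- function(data) {
  full <- lm(Life.Exp ~ ., data = data)
  step(full, direction = "backward", trace = 0)
}

# Error on the same rows the model was selected and fitted on.
final_model <- fit_backward(states)
in_sample_rmse <- rmse(states$Life.Exp, fitted(final_model))

# 5-fold cross-validation, redoing the selection within each training fold.
set.seed(42)
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(states)))
cv_pred <- numeric(nrow(states))
for (i in seq_len(k)) {
  train <- states[fold != i, ]
  test <- states[fold == i, ]
  cv_pred[fold == i] <- predict(fit_backward(train), newdata = test)
}
cv_rmse <- rmse(states$Life.Exp, cv_pred)

cat("Selected model:", deparse(formula(final_model)), "\n")
cat("In-sample RMSE:", round(in_sample_rmse, 3), "\n")
cat("5-fold CV RMSE:", round(cv_rmse, 3), "\n")
```

Trying several seeds until the two numbers look close is exactly the kind of "random selections and testing" that can produce a model which then disappoints on genuinely new data.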

 

The Shiny app can be found here.

The initial model was built to predict ‘Life Expectancy’ and it does, to a certain extent, do its job. But when generalized, it turned out to be a bit of an uncertainty-ridden damp squib. For example, predictions for other variables from the same dataset, such as ‘Population’, ‘Frost’ and ‘Area’, are nowhere close to the observed values. At the same time, the model did okay for variables such as ‘Illiteracy’ and ‘HS.Grad’.
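
One way to check how the procedure fares across different targets is to run the same backward selection once per response variable and score each with out-of-sample R². The sketch below is an illustration under assumptions rather than what the app does: it assumes the built-in `state.x77` data, uses leave-one-out cross-validation, and the helper `loo_stepwise()` is a hypothetical name.

```r
# Assumption: the built-in state.x77 data; the post does not name its dataset.
states <- as.data.frame(state.x77)
names(states) <- make.names(names(states))  # "HS Grad" -> "HS.Grad"

# Leave-one-out predictions from a backward-stepwise model for one response.
loo_stepwise <- function(response, data) {
  pred <- numeric(nrow(data))
  for (i in seq_len(nrow(data))) {
    train <- data[-i, ]
    full <- lm(as.formula(paste(response, "~ .")), data = train)
    chosen <- step(full, direction = "backward", trace = 0)
    pred[i] <- predict(chosen, newdata = data[i, , drop = FALSE])
  }
  pred
}

responses <- c("Life.Exp", "Population", "Frost", "Area",
               "Illiteracy", "HS.Grad")

# Out-of-sample R^2: 1 means perfect; 0 or below means no better than
# predicting the overall mean of the response.
cv_r2 <- sapply(responses, function(v) {
  obs <- states[[v]]
  pred <- loo_stepwise(v, states)
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
})

print(round(cv_r2, 2))
```

Because every prediction is made for a row the model never saw, a low or negative value here flags a response that the remaining variables simply do not explain well, whatever the fitted R² may say.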

Despite all these drawbacks (and more!), people still find the motivation to use stepwise regression to produce a simpler model with fewer coefficients. It does not necessarily find the optimal model, but it does give a hint about which combinations of predictors are plausible.

While no one would conclude a statistical study based on stepwise results or publish a paper with it, some might find uses for it, say, to verify models already created by software systems, or as an easy-to-use tool for initial exploratory data analysis (with all the necessary caveats in place!).

You win some, you lose some.

What do you think? Leave a comment!

P.S. You can find the (needs-to-be-cleaned-up) code for the Shiny app here.

 

 

 


To leave a comment for the author, please follow the link and comment on their blog: rstats – DataDrumstick.
