Advanced Regression Analysis: How to Print All Best Models?

[This article was first published on R Programming Blog, and kindly contributed to R-bloggers].

In this post, I explain a simple way to find as many of the best regression models as you want from any given dataset of predictors.

I am going to show you a method, along with code, that exports the summary statistics of all the best models to a separate text file, reports the essential regression statistics, and prints the fitted values from lm() and rlm() (robust regression), along with deviations and plots.
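To make the export step concrete, here is a minimal sketch of how the summaries of several best models could be written to one text file. It is not the full script from this post: the built-in mtcars data, the response mpg, the nbest and nvmax values, and the file name best_models.txt are all assumptions chosen for illustration, and the file goes to a temporary directory rather than the working directory.

```r
library(leaps)

# Assumption: mtcars stands in for your data, with mpg as the response.
search <- regsubsets(mpg ~ ., data = mtcars, nbest = 2, nvmax = 3)

# Logical matrix: one row per candidate model, one column per predictor.
which_mat <- summary(search)$which[, -1, drop = FALSE]  # drop the intercept column

# Turn each row of the selection matrix into a model formula string.
formulas <- unname(apply(which_mat, 1, function(keep) {
  paste("mpg ~", paste(colnames(which_mat)[keep], collapse = " + "))
}))

# Fit each model with lm() and collect its printed summary.
# The "Model:" line is added because the printed call only shows as.formula(f).
report <- unlist(lapply(formulas, function(f) {
  fit <- lm(as.formula(f), data = mtcars)
  c(paste("Model:", f), capture.output(summary(fit)), "")
}))

# Hypothetical output file, written to a temporary directory.
out_file <- file.path(tempdir(), "best_models.txt")
writeLines(report, out_file)
```

Opening the file at out_file should show one "Model:" header per candidate, each followed by the usual lm() summary. Pointing file.path() at getwd() instead of tempdir() would put the file in the working directory.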

You can also choose your best models based on the number of variables in each model and on selection criteria such as adjusted R-squared and Mallows' Cp. All in one piece of code.
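As a rough illustration of that selection step, the sketch below builds one table of candidate models and then filters it by model size and ranks it by each criterion. The data set (mtcars), the response (mpg), the sizes kept (2 or 3 predictors) and the names candidates, shortlist, by_adjr2 and by_cp are assumptions of mine, not part of the original script.

```r
library(leaps)

# Assumption: mtcars stands in for your data, with mpg as the response.
search <- regsubsets(mpg ~ ., data = mtcars, nbest = 3, nvmax = 4)
s <- summary(search)

# One row per candidate model: size, adjusted R-squared and Mallows' Cp.
candidates <- data.frame(
  n_vars = unname(rowSums(s$which)) - 1,  # minus the intercept column
  adjr2  = s$adjr2,
  cp     = s$cp
)

# Readable list of the predictors in each model.
candidates$terms <- unname(apply(s$which[, -1, drop = FALSE], 1, function(keep) {
  paste(names(keep)[keep], collapse = " + ")
}))

# Keep only models with a chosen number of predictors (here 2 or 3).
shortlist <- candidates[candidates$n_vars %in% c(2, 3), ]

# Rank the shortlist: higher adjusted R-squared is better, lower Cp is better.
by_adjr2 <- shortlist[order(-shortlist$adjr2), ]
by_cp    <- shortlist[order(shortlist$cp), ]
```

The two rankings will not always agree, so it is worth looking at the top few rows of both before settling on a model.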

We have all had our shots at regression analysis. Though there is nothing as exciting as the moment you lay your hands on freshly prepared data, it can get frustrating when you need to deliver results regularly and under time pressure. The code I show in this post should help alleviate the issues caused by the routine nature of the regression modelling process. In other words, it is for statistical analysts with recurring deadlines.

Best subsets regression with leaps


I have seen people coming from other platforms where they typically use a procedure built into the software to run a forecast or regression model, or simply click through a GUI to build their models. Doing this in a procedural manner leads to routine and boredom when you have to produce the results repeatedly.

It would be a grave mistake for R programmers to take the same route and repeat the mistakes of GUI analysts and procedural statisticians. There is so much amateurish R code out there that people underestimate the potential of R as a programming language, often comparing it only with other statistical software. This view later becomes a benchmark for newcomers to the language, who tend to learn it in parts and end up with an incomplete idea of its potential, a fate JavaScript suffered for a long while. Who are we, after all, if we don't use the excellent algorithmic capabilities that R generously offers? So remember: R is not just statistical software, it is a good programming language too.

Now, coming back to the discussion. Let's load the ‘leaps’, ‘car’ and ‘MASS’ packages. The steps below should not be considered a holy grail; rather, you should already have completed the variable reduction step before feeding the selected variables into the procedure below.
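For a single chosen model, the comparison of lm() and rlm() fitted values and their deviations might be sketched as follows. This is only an illustration under my own assumptions: the mtcars data, the formula mpg ~ wt + hp, and the column names in comparison are hypothetical, and only MASS is needed here (car and leaps are not used in this fragment).

```r
library(MASS)

# Hypothetical model, standing in for one picked from the best-subsets search.
model_formula <- mpg ~ wt + hp

ols_fit <- lm(model_formula, data = mtcars)   # ordinary least squares
rob_fit <- rlm(model_formula, data = mtcars)  # robust regression (Huber M-estimator by default)

# Actual values next to the fitted values from both fits.
comparison <- data.frame(
  actual     = mtcars$mpg,
  lm_fitted  = unname(fitted(ols_fit)),
  rlm_fitted = unname(fitted(rob_fit)),
  row.names  = rownames(mtcars)
)

# Deviations of the actual values from each set of fitted values.
comparison$lm_dev  <- comparison$actual - comparison$lm_fitted
comparison$rlm_dev <- comparison$actual - comparison$rlm_fitted

# Rows where the two fits disagree most are the ones worth inspecting.
gap <- abs(comparison$lm_fitted - comparison$rlm_fitted)
head(comparison[order(-gap), ], 5)
```

Large gaps between the two fitted columns usually point to observations that the robust fit is down-weighting.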

This script will generate the following outputs in the working directory:

Continued on the next page.

