Coefplot: New Package for Plotting Model Coefficients

January 3, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

By Joseph Rickert

Even to the practiced eye, looking at coefficients in R model summaries can be tedious. Capturing information about the significance of coefficients from scores or even hundreds of models, in a way that makes writing the final report a bit easier, is a time-consuming and thankless task. Of course, once you know what you are looking for, it only takes a few lines of code to select coefficients and plot them. Nevertheless, it would be nice to have a function that just plots the coefficients with error bars. Coefplot, a relatively recent package by Jared Lander, does exactly this and has the potential to become a very useful tool. Built on top of ggplot2 graphics, coefplot plots coefficients from lm and glm models as well as from the big data models generated by RevoScaleR's rxLinMod and rxLogit functions.

A small example from Revolution Analytics’ Saar Golde illustrates the use of coefplot. It works with credit data stored in 10 separate CSV files; sample rows are shown in the table below.

creditScore  houseAge  yearsEmploy  ccDebt  year  default
        691        16            9    6725  2000        0
        691         4            4    5077  2000        0
        743        18            3    3080  2000        0
        728        22            1    4345  2000        0
        745        17            3    2969  2000        0
        539        15            3    4588  2000        0

The R code reads in the data, concatenates the files into a single file, and uses RevoScaleR’s rxLinMod function to perform the linear regression:

default ~ F(year) + yearsEmploy + ccDebt + creditScore

Note that the F function makes year a factor on the fly so that the regression will produce a coefficient for each year. Running coefplot on the model object produces the graph of the coefficients.
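For readers without RevoScaleR, here is a minimal sketch of the same idea in open-source R. It is not the code from the example. It substitutes lm() and factor() for rxLinMod and F(), and it simulates a small data set with the same column names instead of reading the CSV files. The simulated values, the seed, the assumption of one year per file (2000 to 2009), and the object names are all made up for illustration.

```r
# Simulated stand-in for the credit data (all values are invented).
set.seed(42)
n <- 2000
credit <- data.frame(
  creditScore = round(rnorm(n, mean = 700, sd = 50)),
  yearsEmploy = sample(0:15, n, replace = TRUE),
  ccDebt      = round(runif(n, min = 0, max = 10000)),
  year        = sample(2000:2009, n, replace = TRUE)  # assumed: one year per file
)

# Assumed relationship: default risk rises with debt, falls with credit score.
p <- plogis(-2 + 0.0003 * credit$ccDebt - 0.01 * (credit$creditScore - 700))
credit$default <- rbinom(n, size = 1, prob = p)

# factor(year) plays the role of F(year): year becomes a factor on the fly,
# so the fit has one coefficient per year level except the baseline.
fit <- lm(default ~ factor(year) + yearsEmploy + ccDebt + creditScore,
          data = credit)

# Point estimates with 95% confidence intervals: the numbers a
# coefficient plot displays.
est <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
print(round(est, 5))

# Draw the plot only if the coefplot package is installed.
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(fit))
}
```

The table printed by est is the "few lines of code" route mentioned earlier; coefplot turns the same estimates and intervals into a single ggplot2 graphic.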

[Figure: CoefPlot, the coefficient plot for the model]

This is slick, but to be really useful, coefplot should be able to handle models with thousands of coefficients. I spoke with Jared about this. He said that he is well aware of the problem and is working on it:

“The big issue is identifying levels that belong to factors, which I solved, even for interactions. But how do people specify levels that might belong to different factors, or how to handle a specified level and its interactions, etc....”
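To make the first part of that problem concrete, here is one way to map each coefficient back to the model term it came from using base R. This is only a sketch and is not taken from coefplot's source; it relies on the "assign" attribute that model.matrix() attaches to the design matrix, and uses the built-in warpbreaks data set purely as a convenient example with two factors and an interaction.

```r
# warpbreaks ships with R; 'wool' and 'tension' are both factors.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

mm          <- model.matrix(fit)
term_labels <- attr(terms(fit), "term.labels")  # "wool", "tension", "wool:tension"
assign_idx  <- attr(mm, "assign")               # 0 = intercept, k = k-th term

# One row per coefficient, labelled with the term that generated it.
coef_map <- data.frame(
  coefficient = colnames(mm),
  term        = c("(Intercept)", term_labels)[assign_idx + 1],
  stringsAsFactors = FALSE
)
print(coef_map)
```

A lookup like this groups coefficients such as tensionM and tensionH under the single term tension. The harder questions Jared raises, such as how a user should ask for one level together with its interactions, are about interface design and are not addressed by this sketch.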

It is difficult to build useful tools, and an amazing feature of the open source R project is that so many people are willing to try. Jared also said that he is open to suggestions.
