Coefplot: New Package for Plotting Model Coefficients

January 3, 2012
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

By Joseph Rickert

Even to the practiced eye, reading coefficients in R model summaries can be tedious. Capturing the significance of coefficients from scores, or even hundreds, of models in a way that makes the final report easier to write is a time-consuming and thankless task. Of course, once you know what you are looking for, it takes only a few lines of code to select coefficients and plot them. Nevertheless, it would be nice to have a function that just plots the coefficients with error bars. The coefplot package, a relatively recent one by Jared Lander, does exactly this and has the potential to become a very useful tool. Built on top of ggplot2 graphics, coefplot plots coefficients from lm and glm models as well as from the big data models generated by RevoScaleR's rxLinMod and rxLogit functions.

A small example from Revolution Analytics' Saar Golde illustrates the use of coefplot. The R code reads in credit data (see table) from 10 separate csv files, concatenates them into a single file,

creditScore  houseAge  yearsEmploy  ccDebt  year  default
691          16        9            6725    2000  0
691          4         4            5077    2000  0
743          18        3            3080    2000  0
728          22        1            4345    2000  0
745          17        3            2969    2000  0
539          15        3            4588    2000  0

and uses RevoScaleR’s rxLinMod function to perform the linear regression:

default ~ F(year) + yearsEmploy + ccDebt + creditScore

Note that the F function makes year a factor on the fly so that the regression will produce a coefficient for each year. Running coefplot on the model object produces the graph of the coefficients.
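For readers without RevoScaleR, the workflow described above can be approximated with open-source R. The following is a minimal sketch, not the original code from the example: the data are simulated, the file names and the temporary directory are hypothetical, the one-file-per-year layout is an assumption, and lm with factor(year) stands in for rxLinMod with F(year). The coefplot call is guarded so the script still runs if the package is not installed.

```r
# Sketch only: simulated data standing in for the credit files in the article.
set.seed(42)
tmp_dir <- file.path(tempdir(), "credit_demo")  # hypothetical location
dir.create(tmp_dir, showWarnings = FALSE)

# Write 10 small csv files (assumption: one file per year, 2000 to 2009).
for (yr in 2000:2009) {
  n <- 200
  part <- data.frame(
    creditScore = round(rnorm(n, mean = 700, sd = 50)),
    houseAge    = sample(1:30, n, replace = TRUE),
    yearsEmploy = sample(0:15, n, replace = TRUE),
    ccDebt      = round(runif(n, min = 1000, max = 9000)),
    year        = yr
  )
  part$default <- rbinom(n, size = 1, prob = 0.1)
  write.csv(part, file.path(tmp_dir, paste0("credit_", yr, ".csv")),
            row.names = FALSE)
}

# Read the files back and concatenate them into a single data frame.
files  <- list.files(tmp_dir, pattern = "^credit_[0-9]{4}[.]csv$",
                     full.names = TRUE)
credit <- do.call(rbind, lapply(files, read.csv))

# factor(year) plays the role of F(year): one coefficient per year level
# (minus the reference level).
fit <- lm(default ~ factor(year) + yearsEmploy + ccDebt + creditScore,
          data = credit)

# Plot the coefficients if coefplot is available.
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(fit))
}
```

Because the outcome here is simulated independently of the predictors, the plotted intervals should mostly straddle zero; the point of the sketch is the shape of the workflow, not the estimates.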

[Figure: coefplot graph of the model coefficients]
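To make concrete what "a few lines of code to select coefficients and plot them" might look like without the package, here is a rough base R sketch. It uses simulated data and a made-up model, and it draws intervals of plus or minus two standard errors as an approximate 95% range; it is not a reproduction of coefplot's internals.

```r
# Sketch only: simulated data and a hypothetical model.
set.seed(1)
n <- 300
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + g, data = d)

# coef(summary(fit)) is a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
est <- coef(summary(fit))
coefs <- data.frame(
  term     = rownames(est),
  estimate = est[, "Estimate"],
  lower    = est[, "Estimate"] - 2 * est[, "Std. Error"],
  upper    = est[, "Estimate"] + 2 * est[, "Std. Error"]
)

# One row per coefficient: a point for the estimate, a segment for the interval.
pos <- seq_len(nrow(coefs))
plot(coefs$estimate, pos,
     xlim = range(coefs$lower, coefs$upper),
     yaxt = "n", pch = 16, xlab = "Estimate", ylab = "")
segments(coefs$lower, pos, coefs$upper, pos)
abline(v = 0, lty = 2)  # reference line at zero
axis(2, at = pos, labels = coefs$term, las = 1)
```

This is manageable for one model, but repeating it across many models is exactly the chore a dedicated plotting function removes.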

This is slick, but to be really useful coefplot should be able to handle models with thousands of coefficients. I spoke with Jared about this. He said that he is well aware of the problem and is working on it:

“The big issue is identifying levels that belong to factors, which I solved, even for interactions. But how do people specify levels that might belong to different factors, or how to handle a specified level and its interactions, etc….”
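As an illustration of the problem Jared describes, base R already records which model term each coefficient column came from, through the "assign" attribute of the model matrix. The sketch below is one possible way to map coefficients back to factors and interactions; it uses simulated data and is not a description of how coefplot itself solves the problem.

```r
# Sketch only: simulated data with two factors and an interaction.
set.seed(7)
n <- 200
d <- data.frame(
  year   = factor(sample(2000:2002, n, replace = TRUE)),
  region = factor(sample(c("east", "west"), n, replace = TRUE)),
  x      = rnorm(n)
)
d$y <- rnorm(n)
fit <- lm(y ~ year * region + x, data = d)

mm          <- model.matrix(fit)
term_labels <- attr(terms(fit), "term.labels")
assign_idx  <- attr(mm, "assign")  # 0 = intercept, k = k-th term label

# Pair every coefficient with the term (factor or interaction) it belongs to.
coef_terms <- data.frame(
  coefficient = colnames(mm),
  term        = c("(Intercept)", term_labels)[assign_idx + 1]
)
print(coef_terms)
```

With a table like this, selecting every level of a given factor, or every coefficient involved in a given interaction, becomes a simple subset on the term column.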

It is difficult to build useful tools, and an amazing feature of the open source R project is that so many people are willing to try. Jared also said that he is open to suggestions.
