By Joseph Rickert
Even to the practiced eye, reading coefficients in R model summaries can be tedious. Capturing information about the significance of coefficients from scores, or even hundreds, of models in a form that makes writing the final report easier is a time-consuming and thankless task. Of course, once you know what you are looking for, it only takes a few lines of code to select coefficients and plot them. Nevertheless, it would be nice to have a function that simply plots the coefficients with error bars. The coefplot package, a relatively recent contribution from Jared Lander, does exactly this and has the potential to become a very useful tool. Built on top of ggplot2 graphics, coefplot plots coefficients from lm and glm models, as well as from the big data models produced by RevoScaleR's rxLinMod and rxLogit functions.
A small example from Revolution Analytics' Saar Golde illustrates how coefplot is used. The R code reads in credit data (see table) from 10 separate csv files, concatenates them into a single file, and uses RevoScaleR's rxLinMod function to fit the linear regression:
default ~ F(year) + yearsEmploy + ccDebt + creditScore
Note that the F() function turns year into a factor on the fly, so the regression produces a separate coefficient for each year. Running coefplot on the model object produces the graph of the coefficients, each with its error bars.
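RevoScaleR is not on CRAN, so the following is only a rough base R sketch of the same workflow under stated assumptions. The credit data are simulated (the column names follow the formula above, but the values, the per-year file names and the temporary directory are hypothetical), lm() stands in for rxLinMod, and factor(year) plays the role of F(year). The last part shows the "few lines of code" route to a coefficient plot: pull the estimates and standard errors from summary() and draw each estimate with a +/- 2 standard error bar.

```r
# Hypothetical stand-in for the credit data: column names follow the formula
# in the post, but every value here is simulated, not the real data.
set.seed(42)
n <- 5000
credit <- data.frame(
  year        = sample(2000:2009, n, replace = TRUE),
  yearsEmploy = sample(0:20, n, replace = TRUE),
  ccDebt      = round(runif(n, 0, 10000)),
  creditScore = round(rnorm(n, 700, 50))
)
# 0/1 outcome from an arbitrary linear rule plus noise (assumption)
credit$default <- as.integer(
  0.00005 * credit$ccDebt - 0.004 * (credit$creditScore - 700) +
    rnorm(n, 0, 0.3) > 0.5
)

# Write one csv file per year to a temporary directory (10 files in all)
csv_dir <- file.path(tempdir(), "credit_csv")
dir.create(csv_dir, showWarnings = FALSE)
for (y in unique(credit$year)) {
  write.csv(credit[credit$year == y, ],
            file.path(csv_dir, paste0("credit_", y, ".csv")),
            row.names = FALSE)
}

# Read the files back and concatenate them into a single data frame
files    <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))

# factor(year) does what F(year) does in RevoScaleR: one coefficient per year level
fit <- lm(default ~ factor(year) + yearsEmploy + ccDebt + creditScore,
          data = combined)

# Coefficient table: estimate, standard error and a +/- 2 SE interval
est <- coef(summary(fit))
coefs <- data.frame(
  term      = rownames(est),
  estimate  = est[, "Estimate"],
  se        = est[, "Std. Error"],
  row.names = NULL
)
coefs <- coefs[coefs$term != "(Intercept)", ]  # drop the intercept for readability
coefs$lo2 <- coefs$estimate - 2 * coefs$se
coefs$hi2 <- coefs$estimate + 2 * coefs$se

# Dot plot with error bars in base graphics, written to a pdf file
# so the sketch also runs non-interactively
pdf(file.path(tempdir(), "coef_plot.pdf"), width = 7, height = 5)
par(mar = c(4, 10, 1, 1))  # wide left margin for the coefficient names
k <- nrow(coefs)
plot(coefs$estimate, seq_len(k), xlim = range(coefs$lo2, coefs$hi2),
     yaxt = "n", pch = 19, xlab = "Estimate", ylab = "")
axis(2, at = seq_len(k), labels = coefs$term, las = 1, cex.axis = 0.7)
segments(coefs$lo2, seq_len(k), coefs$hi2, seq_len(k))
abline(v = 0, lty = 2)  # reference line at zero
dev.off()
```

If the coefplot package is installed, calling coefplot(fit) on the same lm object draws a comparable ggplot2 chart without any of the manual table building.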
This is slick, but to be truly useful, coefplot should be able to handle models with thousands of coefficients. When I spoke with Jared about this, he said that he is well aware of the problem and is working on it:
“The big issue is identifying levels that belong to factors, which I solved, even for interactions. But how do people specify levels that might belong to different factors, or how to handle a specified level and its interactions, etc....”
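To make the problem Jared describes concrete, here is a small base R sketch of one common way to find out which coefficients belong to which factor, including interactions. It is not how coefplot does it; it simply relies on the "assign" attribute of model.matrix(), which tags each model-matrix column with the index of the model term it came from. The data set and the variable names region, plan and income are hypothetical.

```r
# Small simulated data set (hypothetical): two factors and a numeric predictor
set.seed(1)
n <- 200
d <- data.frame(
  region = factor(sample(c("North", "South", "West"), n, replace = TRUE)),
  plan   = factor(sample(c("Basic", "Plus"), n, replace = TRUE)),
  income = rnorm(n, 50, 10)
)
d$y <- rnorm(n) + 0.1 * d$income

# Model with a factor-by-factor interaction
fit <- lm(y ~ region * plan + income, data = d)

# Each model-matrix column carries the index of its term (0 = intercept)
mm          <- model.matrix(fit)
term_idx    <- attr(mm, "assign")
term_labels <- c("(Intercept)", attr(terms(fit), "term.labels"))

# Map every coefficient name to the term (main effect or interaction) it belongs to
coef_terms <- data.frame(
  coefficient = colnames(mm),
  term        = term_labels[term_idx + 1],
  estimate    = unname(coef(fit)[colnames(mm)])
)
print(coef_terms)

# All coefficients that involve the factor 'region', including its interactions
involves_region <- vapply(
  strsplit(coef_terms$term, ":", fixed = TRUE),
  function(parts) "region" %in% parts,
  logical(1)
)
print(coef_terms[involves_region, ])
```

Grouping by term rather than by parsing coefficient names avoids guessing where a factor name ends and a level name begins, which is exactly what gets hard once interactions multiply the number of coefficients.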
Building useful tools is difficult, and one of the remarkable things about the open source R project is how many people are willing to try. Jared also said that he is open to suggestions.