Visually weighted regression in R (à la Solomon Hsiang)

August 30, 2012
By

(This article was first published on Nicebread » R, and kindly contributed to R-bloggers)

[Update 1: Sep 5, 2012: Explore the Magical Data Enhancer by IRES, using this visualization technique]

[Update 2: Sep 6, 2012: See new improved plots, and new R code!

Solomon Hsiang proposed an appealing method for visually displaying the uncertainty in regressions (see his blog [1][2], and also the discussions on the Statistical Modeling, Causal Inference, and Social Science Blog [1][2]).

I implemented the method in R (using ggplot2), and used an additional method of determining the shading (especially concerning Andrew Gelman’s comment that traditional statistical summaries (such as 95% intervals) give too much weight to the edges. In the following I will show how to produce plots like that:

I used following procedure:

  1. Compute smoothers from 1000 bootstrap samples of the original sample (this results in a spaghetti plot)
  2. Calculate a density estimate for each vertical cut through the bootstrapped smoothers. The area under the density curve always is 1, so the ink is constant for each y-slice.
  3. Shade the figure according to these density estimates.

 

Now let’s construct some plots!

The basic scatter plot:

 

 

 

No we show the bootstrapped smoothers (a “spaghetti plot”). Each spaghetti has a low alpha. That means that overlapping spaghettis produce a darker color and already give weight to highly populated regions.

Here is the shading according to the smoother’s density:

 

Now, we can overplot the median smoother estimate for each x value (the “median smoother”):

Or, a visually weighted smoother:

 

 

Finally, we can add the plain linear regression line (which obviously does not refelct the data points very well):

 

At the end of this post is the function that produces all of these plots. The function returns a ggplot object, so you can modify it afterwards, e.g.:

?View Code RSPLUS
vwReg(y~x, df, shade=FALSE, spag=TRUE) + xlab("Implicit power motive") + ylab("Corrugator activity during preparation")

 

Here are two plots with actual data I am working on:

The correlation of both variables is .22 (p = .003).

A) As a heat map (note: the vertical breaks at the left and right end occur due to single data points that get either sampled or not during the bootstrap):

 

 

B) As a spaghetti plot:

 

 

 

Finally, here’s the code (sometimes the code box is collapsed – click the arrow on the top right of the box to open it). Comments and additions are welcome.

?View Code RSPLUS
[Update: I removed the code, as an updated version has been published <a title="Visually weighted/ Watercolor Plots, new variants: Please vote!" href="http://www.nicebread.de/visually-weighted-watercolor-plots-new-variants-please-vote/">here</a> (see at the end of the post)]

To leave a comment for the author, please follow the link and comment on his blog: Nicebread » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.