November 15, 2011

(This article was first published on James Keirstead » R, and kindly contributed to R-bloggers)

I’ve been working through Gelman et al.’s otherwise excellent Bayesian Data Analysis and it’s going reasonably well. My statistics is a little bit rusty so it’s taken time to work through all of the exercises and really understand what’s going on. But I say “otherwise excellent” because yesterday I spent ages trying to figure out a problem, only to discover that the data published in the book don’t correspond to the text discussion.

The troublemaker is the SAT problem in section 5.5. The authors give values for two variables y_j and \sigma_j for j=1\ldots8 schools, rounded to integer values. Using the formulas given in the text, I then calculated estimates for the complete pooling statistics and came up with a posterior treatment effect, y, of 7.7 and a variance of 16.6. However the text says these values should be 7.9 and 17.4 respectively.

I puzzled over this for quite a while, thinking that maybe I’d missed a prior/posterior distinction somewhere and my estimates were supposed to be subtlely shifted. But no, when I went and checked the original data source, I found that the values were reported to 4 sig figs. Repeating the calculation with the new values gives the expected results. Grr…

Here’s the data and R-code if anyone’s interested.

### Load SAT data from the Gelman et al book and the original Rubin paper
df <- data.frame(school=LETTERS[1:8],
### Rearrange data into a handy form
df <- melt(df,id="school")
vals <- colsplit(df$variable,"\\.",c("source","metric"))
df <- cbind(df,vals)
df <- df[,-2]
df <- cast(df,school+source ~ metric)
### Calculate the summary statistics

Created by Pretty R at

To leave a comment for the author, please follow the link and comment on their blog: James Keirstead » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)