Fresh off learning Bayesian techniques in one of my classes last quarter, I thought it would be fun to try to apply the method. I was able to find some examples of Hierarchical Bayes being used to analyze baseball data at Wharton.
Setting up the problem
On-base percentage (OBP) is probably the most important basic offensive statistic in baseball, so getting a reliable estimate of a player's true ability to get on base matters. The basic problem is that one season rarely provides enough observations for us to be certain of a player's ability. Even though there are 162 games in a season, the observed OBP may reflect luck as much as skill. Bayesian analysis "regresses" the observed OBP toward the mean: if a player has only a few plate appearances (PA), the observations get little weight and the estimate lands close to the overall (MLB) average. If a player has many PAs, the results are less likely to be luck, so the observations get a lot of weight.
We are trying to estimate the "true" OBP of each batter. Bayesian analysis treats the true OBP as a random quantity, and Empirical Bayes is a method of estimating the distribution of "true" OBP from the data. OBP is times on base divided by PA. Times on base (X) for each batter is distributed binomial with n = PA and p = true OBP. We further assume that p is distributed Beta with parameters a and b. It follows that the marginal distribution of X is beta-binomial, with probability mass function:
gamma(a+b)*gamma(a+x)*gamma(n-x+b)*(n choose x)/(gamma(a)*gamma(b)*gamma(a+b+n))
where gamma is the gamma function.
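As a sanity check on the formula above, here is a minimal Python sketch of the marginal, computed on the log scale so the gamma functions do not overflow for realistic PA counts. The function names and the values of n, a and b are my own illustrative choices, not anything from the original analysis.

```python
from math import exp, lgamma

def log_choose(n, x):
    """Log of the binomial coefficient (n choose x), via log-gamma."""
    return lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)

def betabinom_logpmf(x, n, a, b):
    """Log of the marginal probability of x times on base in n PAs."""
    return (
        log_choose(n, x)
        + lgamma(a + b) + lgamma(a + x) + lgamma(n - x + b)
        - lgamma(a) - lgamma(b) - lgamma(a + b + n)
    )

# Illustrative values only: n, a and b are made up for this check.
n, a, b = 20, 2.0, 4.0
pmf = [exp(betabinom_logpmf(x, n, a, b)) for x in range(n + 1)]
total = sum(pmf)                                # should be very close to 1
mean = sum(x * p for x, p in enumerate(pmf))    # should be close to n*a/(a+b)
print(total, mean)
```

If the formula is right, the probabilities over x = 0, ..., n add up to 1 and the mean equals n*a/(a+b), which is what the two printed numbers check.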
We will estimate the parameters a and b from the data (X), using its marginal distribution (the "empirical" part of Empirical Bayes). To do this I wrote down the likelihood of the marginal distribution across all the batters, then maximized it by adjusting a and b. This approach is known as ML-II (type II maximum likelihood).
I used data for all non-pitchers in 2010. I assume that the players are independent, so the likelihood is just the product of the marginals for each player. Maximizing it with respect to a and b gives estimates of a = 83.48291 and b = 174.9038. I think this can be interpreted as a prior mean (what we would assume a batter's OBP to be before seeing him bat) of a/(a+b) = 0.323. This is pretty close to the overall league OBP (0.330). I think it makes sense that the prior mean is lower than the league average: batters who do well get more opportunities and batters who do poorly get fewer, so the league average, which counts every PA equally, is biased high.
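To make the ML-II step concrete, here is a rough sketch of the idea in Python using only the standard library. It is not the code behind the numbers above. The data are synthetic batters drawn from a made-up Beta(80, 170) prior, the optimizer is a crude compass search standing in for whatever numerical optimizer you prefer, and all function names are hypothetical.

```python
import random
from math import exp, lgamma

def log_marginal(x, n, a, b):
    """Log beta-binomial probability of x times on base in n PAs."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + lgamma(a + b) + lgamma(a + x) + lgamma(n - x + b)
            - lgamma(a) - lgamma(b) - lgamma(a + b + n))

def log_likelihood(data, a, b):
    """Independent batters: the product of marginals is a sum of logs."""
    return sum(log_marginal(x, n, a, b) for x, n in data)

def to_ab(u, v):
    """Map (logit of prior mean, log of a+b) to (a, b); keeps both positive."""
    mean, size = 1.0 / (1.0 + exp(-u)), exp(v)
    return mean * size, (1.0 - mean) * size

def fit_ml2(data, step=1.0, tol=1e-4, max_iter=10_000):
    """Crude compass search; a stand-in for a proper numerical optimizer."""
    u, v = 0.0, 0.0  # start at a = b = 0.5
    best = log_likelihood(data, *to_ab(u, v))
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = log_likelihood(data, *to_ab(u + du, v + dv))
            if cand > best:
                u, v, best, improved = u + du, v + dv, cand, True
        if not improved:
            step /= 2.0  # no direction helped, so search more finely
    a, b = to_ab(u, v)
    return a, b, best

# Synthetic batters (made up): true OBPs drawn from Beta(80, 170).
rng = random.Random(0)
data = []
for _ in range(200):
    n = rng.randint(20, 650)                      # plate appearances
    p = rng.betavariate(80, 170)                  # this batter's "true" OBP
    x = sum(rng.random() < p for _ in range(n))   # times on base
    data.append((x, n))

a_hat, b_hat, ll_hat = fit_ml2(data)
print(a_hat, b_hat, a_hat / (a_hat + b_hat))
```

Searching over the prior mean and a+b rather than a and b directly is just a convenience: the data pin down the mean much more tightly than the spread, so the two directions behave more independently. With real data you would replace the synthetic list with one (times on base, PA) pair per batter.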
Below is a graph of the prior distribution and the updated posteriors of every batter. You can (sort of) see that the posteriors have tighter distributions than the prior does. (The posterior distribution of each batter in this case is the distribution of OBP after we have observed PA and the actual OBP.)
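For reference, the posterior mentioned above has a closed form, because the Beta prior is conjugate to the binomial likelihood. This is the standard textbook result, written with the same symbols used earlier (x times on base in n PAs, prior parameters a and b):

```latex
\pi(p \mid x) \;\propto\; \underbrace{p^{x}(1-p)^{n-x}}_{\text{binomial likelihood}}
\;\times\; \underbrace{p^{a-1}(1-p)^{b-1}}_{\text{Beta}(a,b)\text{ prior}}
\;=\; p^{a+x-1}(1-p)^{b+n-x-1}
\quad\Longrightarrow\quad
p \mid x \sim \mathrm{Beta}(a + x,\; b + n - x),
\qquad
\mathbb{E}[p \mid x] = \frac{a + x}{a + b + n}.
```

The more PAs a batter has, the larger both posterior parameters get, which is why the posteriors are tighter than the prior.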
One way to see why this Bayesian analysis is useful is to compare the posterior means with the observed OBP. If someone has only a few PAs, their OBP could be very high or very low, and this may mislead you into thinking that the batter is very good or very bad. The posterior mean, however, takes the number of PAs into account. Below is a graph comparing the two. You can see that the range of values for the posterior mean is pretty small, especially compared to actual OBP.
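Here is a small sketch of how the number of PAs enters the posterior mean, assuming the usual Beta-binomial update with the a and b estimated earlier. The posterior mean can be written as a weighted average of the observed OBP and the prior mean, with weight n/(a+b+n) on the observed OBP. The two batters below are hypothetical.

```python
# Estimates reported in the post.
A, B = 83.48291, 174.9038

def shrink_weight(n, a=A, b=B):
    """Weight on the observed OBP; the prior mean gets 1 - weight."""
    return n / (a + b + n)

def posterior_mean(x, n, a=A, b=B):
    """Mean of the Beta(a + x, b + n - x) posterior."""
    return (a + x) / (a + b + n)

# Hypothetical batters, both with an observed OBP of .400.
for x, n in [(4, 10), (240, 600)]:
    w = shrink_weight(n)
    blended = w * (x / n) + (1 - w) * A / (A + B)  # same number as posterior_mean
    print(n, round(w, 3), round(posterior_mean(x, n), 3), round(blended, 3))
```

The batter with 10 PAs ends up at about 0.326, barely above the prior mean, while the batter with 600 PAs ends up at about 0.377. Since a+b is roughly 258, a batter needs around 258 PAs before the observed OBP counts as much as the prior.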
Here is a list of the highest posterior mean OBP:
|Batter||Posterior Mean||Actual OBP|
And here is a list of the lowest posterior mean OBP:
|Batter||Posterior Mean||Actual OBP|
You can see that all of the posterior means are pulled closer to the overall mean (the good players look worse and the bad players look better). The order changes a little bit but not too much.
You can see the effect of sample size (PAs) by comparing Justin Morneau with Joey Votto. Morneau had a higher OBP, but Votto ended up with a higher posterior mean because he had more PAs (Votto had 648 while Morneau had 348). Here are their posterior distributions:
Because of the additional PAs, you can see that Votto's distribution is a little tighter than Morneau's. We are more sure that Votto is excellent than we are that Morneau is excellent.
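The comparison can be sketched numerically too. The PA counts below are the ones quoted above, but the times-on-base counts (275 and 152) are placeholders I picked so that Morneau has the higher observed OBP. They are not taken from the data set, so treat the output as an illustration of the mechanics only.

```python
from math import sqrt

A, B = 83.48291, 174.9038  # estimates reported in the post

def posterior_summary(x, n, a=A, b=B):
    """Mean and standard deviation of the Beta(a + x, b + n - x) posterior."""
    alpha, beta = a + x, b + n - x
    total = alpha + beta
    mean = alpha / total
    sd = sqrt(alpha * beta / (total ** 2 * (total + 1)))
    return mean, sd

# PA counts are from the post; times-on-base counts are assumed placeholders.
votto = posterior_summary(x=275, n=648)
morneau = posterior_summary(x=152, n=348)
print("Votto   mean=%.3f sd=%.4f" % votto)
print("Morneau mean=%.3f sd=%.4f" % morneau)
```

Under these assumed counts, Votto's posterior has both the higher mean and the smaller standard deviation, which matches the picture: the extra 300 PAs pull him less toward the prior and narrow his distribution.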