Interesting Posts at Rational Pastime Related to My Previous Strike Zone Map Post

December 8, 2010

(This article was first published on The Prince of Slides, and kindly contributed to R-bloggers)

J-Doug at Rational Pastime has some cool posts looking at umpire strike zones at his site (and cross-posted at Beyond the Boxscore). I was curious about this issue as well with some work I’ve been doing here in the office (which I’ll refrain from talking about at this point).

Anyway, J-Doug looks at the strike zone size for RHB and LHB, concluding that lefties get the shaft (a larger strike zone). Now, I only have Bruce Froemming’s data in R right now, but I was curious whether we would see anything different when (A) using just Froemming for now and (B) using the ‘gam’ package rather than a standard loess. Below are the ‘gam’-generated heat maps from my last post for LHB and RHB (LHB got deleted somehow when I posted, and I’m getting extremely frustrated with Blogger’s posting options).

And here is a standard loess with different smoothing parameters than I used in my last post:

Now a few caveats. First, I have not normalized the strike zone height; the box is simply the average zone for everyone in the dataset. So the fact that we see calls spread a little further above the strike zone for righties than for lefties may just mean lefties are shorter overall (or it could just be random noise). Second, this is only one umpire, while J-Doug has more than that. Lastly, I’m still in experimental mode with the gam models, so I could be totally off here.
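For readers wondering what normalizing the zone height would involve, here is a minimal sketch. It is in Python rather than R, and the names `pz`, `sz_bot`, and `sz_top` are assumptions (modeled on PITCHf/x-style fields for pitch height and the bottom and top of a batter's zone), not columns confirmed to be in the dataset used for this post. The idea is simply to rescale each pitch's height relative to that batter's own zone, so that 0 is the bottom of the zone and 1 is the top.

```python
def normalize_height(pz, sz_bot, sz_top):
    """Rescale pitch height so 0 is the bottom of this batter's zone and 1 is the top.

    pz, sz_bot and sz_top are assumed to share the same unit (e.g. feet).
    """
    if sz_top <= sz_bot:
        raise ValueError("sz_top must be greater than sz_bot")
    return (pz - sz_bot) / (sz_top - sz_bot)


# Hypothetical numbers: the same pitch height against two batters of different size.
tall = normalize_height(3.4, 1.6, 3.6)   # inside the taller batter's zone (< 1)
short = normalize_height(3.4, 1.4, 3.2)  # above the shorter batter's zone (> 1)
```

With heights rescaled this way, a pitch that is "high" for a short batter and "in the zone" for a tall one no longer lands on the same spot of the map, which is the concern raised in the first caveat.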

Now to the interesting parts. The GAM model heat maps (the ones using the binomial assumption for the response) seem to show that the zone for right-handed batters is a little bigger than that for left-handed batters. In fact, this seems to be the case for both the standard loess and the ‘gam’ package.

The main difference seems to be that the zone stretches further outside for lefties than it does for right-handers. Right-handers apparently have to deal with more calls up and in and down and in than lefties do (for Froemming, that is).

I dunno. Just some observations. I haven’t calculated confidence intervals or systematically chosen the span, but I made sure that for each of the pairs, the parameters for smoothing were the same to make them comparable. For the ‘gam’ model maps, I have a span of 0.5 and a first degree polynomial, while for the ‘loess’ model maps, I have a 0.7 span and a second degree polynomial. But the main issue is comparing RHB and LHB of each type.
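To make the span and degree parameters a bit more concrete, here is a rough sketch of what a local regression does at a single point. It is a simplified stand-in, not R's `loess` or the ‘gam’ package: it is written in Python, works in one dimension only (the maps above smooth over two location coordinates), fits a first-degree polynomial only, and skips any robustness iterations. The function names `tricube` and `local_linear` are made up for this illustration.

```python
import math


def tricube(u):
    """Tricube kernel: weight is 1 at u = 0 and falls to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(xs, ys, x0, span=0.5):
    """Estimate y at x0 with a weighted straight-line fit to the nearest points.

    span is the fraction of the data used for each local fit; a larger
    span means more neighbours and therefore a smoother curve.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    k = min(n, max(2, math.ceil(span * n)))          # number of neighbours used
    h = sorted(abs(x - x0) for x in xs)[k - 1]       # distance to the k-th neighbour
    if h == 0.0:
        raise ValueError("all neighbours coincide with x0")
    w = [tricube((x - x0) / h) for x in xs]          # points beyond h get weight 0
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("span too small for this data")
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted means
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0.0:
        return ybar                                  # no spread in x: fall back to the mean
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)          # weighted least-squares line at x0


# Toy check on exactly linear data, where a local line should recover the truth.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
fit = local_linear(xs, ys, 4.5, span=0.5)
```

Holding the span and degree fixed across the RHB and LHB maps, as described above, means both maps are smoothed with the same neighbourhood size and the same local model, so differences between them are less likely to come from the smoothing itself.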

So what does this mean? Well not too much. It could mean that Froemming doesn’t follow the standard. It could mean that maybe using the ‘gam’ package is helpful in visualizing the true zone. Or it could mean that I didn’t use the right parameters for my model(s). It certainly does not mean that J-Doug’s conclusions are incorrect, but I’m curious how the results may look otherwise.

My Own Evidence Against Me:

Here is some evidence that the above plots (both the ‘loess’ and the ‘gam’) are incorrect from a visualization standpoint: I’ve also run a regression that indicates umpires as a whole are more likely to call strikes against left-handers, even after controlling for pitcher handedness, pitch location, pitch type, and a number of other factors. Another regression with respect to whether the call is ‘correct’ or not also tells me that umpires are more likely to make an incorrect call for left-handed batters at a rate of about 1.8%.

So in general, it sounds like J-Doug is right: left-handed batters are getting the shaft.

Finally, if the above plots are visualizing the data correctly, Bruce Froemming goes against the grain when it comes to giving an advantage to right-handed batters (I didn’t run a separate regression for him).
