The words “statistics” and “baseball” are often found near each other, but there's a lot more to statistics than dividing the number of hits by the number of at-bats to get a batting average. And there's a lot more to sabermetrics, the statistical analysis of baseball, than averages, too. Many baseball fans are also stats geeks (and vice versa) and have done deep statistical analysis of baseball data, often with R.
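For a small taste of what that looks like, here is a minimal sketch in R of the simplest case, a batting average computed as hits divided by at-bats. The player names and season totals are invented purely for illustration, and the column names are my own choice rather than anything taken from a real dataset.

```r
# Hypothetical season totals, invented purely for illustration
batting <- data.frame(
  player  = c("Player A", "Player B", "Player C"),
  hits    = c(150, 98, 181),
  at_bats = c(500, 400, 600),
  stringsAsFactors = FALSE
)

# Batting average is hits divided by at-bats, conventionally shown to 3 decimals
batting$avg <- round(batting$hits / batting$at_bats, 3)

# Sort from highest to lowest average
batting <- batting[order(-batting$avg), ]
print(batting)
```

Because R works on whole columns at once, the division needs no loop, which is a big part of why it suits this kind of analysis.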
Dave Allen of the Baseball Analysts website regularly uses R to visualize PitchFX data, as does his stablemate Jeremy Greenhouse (for example, in this analysis of optimal swing rates). ESPN's Sports Guy inspired a detailed analysis in R by Ryan Elmore. Ricky Zanker of the Hardball Times has published a guide on reading baseball data into R. Mike Driscoll created an interactive R application to visualize data from PitchFX. Even the widely read New York Times election analyst Nate Silver got his statistical start with sabermetrics.
If you're a baseball fan and you'd like to do some sabermetrics of your own, but haven't made the leap to learn R yet, Millsy is here to help. He's a graduate student in Sport Management at the University of Michigan, and has created a series of tutorials to help you learn R by analyzing baseball data. The tutorials take the new R user and baseball fan step by step through their first R commands, reading in data, manipulating objects, and even creating charts and doing simple analysis. Good practical advice is scattered throughout, such as the suggestion “to use color to help portray the information you are trying to communicate, rather than just to make things bright”, with this awesome example of what not to do:
There are five parts to the series so far, and I'm looking forward to seeing more.
The Prince of Slides: sab-R-metrics