(This article was first published on **The Prince of Slides**, and kindly contributed to R-bloggers)

Here is a quick post responding to a request by Bob Carpenter at one of my favorite nerd blogs: Statistical Modeling, Causal Inference and Social Science. While a lot of the Bayesian theory is out of my league, Dr. Gelman really makes you think about some applied statistical problems in social science.

Anyway, the request was for a quick scatter plot (I’m not going to go nuts and pull out BUGS code for some Bayesian hierarchical model or anything like that here!) of batter performance against the ability to foul balls off in given counts (I could also do base-out states, but I’ll keep it simple for now).

Luckily, I had R up and running with my Pitch F/X database already in. Of course, a full analysis would require understanding where the pitches are thrown that are being fouled off (along with velocity and pitch type), but then it gets a bit complicated. Anyway, here we go. I’ll start with a quick table of averages for percentage of pitches fouled off in each count (please excuse the awful table formatting here).

| Balls | 0 strikes | 1 strike | 2 strikes |
| --- | --- | --- | --- |
| 0 | 10.37% | 17.61% | 19.20% |
| 1 | 15.55% | 20.46% | 22.44% |
| 2 | 15.39% | 23.30% | 26.00% |
| 3 | 2.41% | 21.48% | 29.91% |
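For readers who want to build a table like this from their own pitch-level data, here is a minimal sketch. The post's analysis was run in R and that code is not shown, so this uses Python purely for illustration. The field names (`balls`, `strikes`, `result`) and the toy records are hypothetical, and the denominator is assumed to be all pitches seen in the count, not just swings.

```python
from collections import defaultdict


def foul_rate_by_count(pitches):
    """Return {(balls, strikes): share of pitches fouled off} for each count.

    Assumes each pitch is a dict with hypothetical keys 'balls', 'strikes'
    (the count before the pitch) and 'result' (e.g. 'foul', 'ball').
    """
    totals = defaultdict(int)
    fouls = defaultdict(int)
    for pitch in pitches:
        count = (pitch["balls"], pitch["strikes"])
        totals[count] += 1
        if pitch["result"] == "foul":
            fouls[count] += 1
    return {count: fouls[count] / totals[count] for count in totals}


# Made-up toy data, only to show the shape of the input.
toy_pitches = [
    {"balls": 0, "strikes": 0, "result": "ball"},
    {"balls": 0, "strikes": 0, "result": "foul"},
    {"balls": 0, "strikes": 0, "result": "called_strike"},
    {"balls": 0, "strikes": 0, "result": "in_play"},
    {"balls": 0, "strikes": 2, "result": "foul"},
    {"balls": 0, "strikes": 2, "result": "swinging_strike"},
    {"balls": 3, "strikes": 0, "result": "ball"},
    {"balls": 3, "strikes": 0, "result": "ball"},
]

rates = foul_rate_by_count(toy_pitches)
for (balls, strikes), rate in sorted(rates.items()):
    print(f"{balls}-{strikes}: {rate:.2%}")
```

Restricting `pitches` to swings only before calling the function would give the "pitches swung at" variant discussed later in the post.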

If anything, there’s a slight downward trend here (as found before at Baseball Analysts, which is linked from the blog post above).

And finally, foul percentage plotted against wOBA for each count. Here, I removed outliers (well, outliers defined as more than 2 standard deviations above the average foul rate), as they should mostly be players who did not get nearly enough at bats for the foul rates to matter. This didn’t work perfectly and there are some obvious anomalies, likely due to low plate appearances, but I think we get a decent look at things. Also, the lower censoring (at 0) makes it more difficult to pick up a pattern in the plots. In addition, the plot includes player-seasons, not just players. So someone like Pujols will be in here 4 times (2007 through 2010):
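The outlier rule described above (drop anything more than 2 standard deviations above the average foul rate) can be sketched roughly as follows. This is not the author's code; the record layout and the `foul_rate` key are hypothetical, the numbers are invented, and the sample standard deviation is an assumption, since the post does not say which one was used.

```python
from statistics import mean, stdev


def drop_high_foul_outliers(player_seasons, n_sd=2.0):
    """Keep player-seasons whose foul rate is at most mean + n_sd * SD.

    Only the upper tail is trimmed, matching the rule described in the post.
    Uses the sample standard deviation (an assumption).
    """
    rates = [ps["foul_rate"] for ps in player_seasons]
    cutoff = mean(rates) + n_sd * stdev(rates)
    return [ps for ps in player_seasons if ps["foul_rate"] <= cutoff]


# Invented player-seasons: nine ordinary rates and one tiny-sample oddity.
toy_seasons = [
    {"player": "A", "season": 2008, "foul_rate": 0.15},
    {"player": "B", "season": 2008, "foul_rate": 0.16},
    {"player": "C", "season": 2008, "foul_rate": 0.17},
    {"player": "D", "season": 2008, "foul_rate": 0.18},
    {"player": "E", "season": 2008, "foul_rate": 0.19},
    {"player": "A", "season": 2009, "foul_rate": 0.16},
    {"player": "B", "season": 2009, "foul_rate": 0.17},
    {"player": "C", "season": 2009, "foul_rate": 0.18},
    {"player": "D", "season": 2009, "foul_rate": 0.17},
    {"player": "tiny sample", "season": 2009, "foul_rate": 1.00},
]

kept = drop_high_foul_outliers(toy_seasons)
print(len(toy_seasons), "player-seasons before,", len(kept), "after")
```

A minimum plate-appearance threshold would target the low-sample problem more directly, but that is a different rule from the one used in the post.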

It might be instructive to look at these same plots only for pitches swung at (so players aren’t penalized for being selective at the plate) and/or only on pitches near the edges of the strike zone (so we’re just looking at pitches that the players are fighting off). The analysis here doesn’t show too much going on, but that doesn’t mean there’s nothing there.

Below, I’ve done the latter, with the same plots from above. I define the edge as 8 inches from the center of the plate and/or below 1.8 feet or above 3.3 feet vertically. Of course, you can define the edge in a number of ways. This is rough, quick code and I didn’t have time to get into too much detail today:
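The author's code is not reproduced in the post. As a rough illustration of the edge definition above, the check might look like the sketch below. It assumes pitch locations are given in feet, with the horizontal coordinate measured from the center of the plate, and it reads "8 inches from the center" as "at least 8 inches". The function and parameter names are hypothetical.

```python
def is_edge_pitch(px_ft, pz_ft, half_width_in=8.0, low_ft=1.8, high_ft=3.3):
    """Return True if a pitch counts as 'on the edge' under the post's rule.

    px_ft: horizontal location in feet from the center of the plate (assumed).
    pz_ft: height in feet as the pitch crosses the plate (assumed).
    """
    horizontally_off = abs(px_ft) * 12.0 >= half_width_in  # feet to inches
    vertically_off = pz_ft < low_ft or pz_ft > high_ft
    # "and/or" in the post is treated as a logical OR.
    return horizontally_off or vertically_off


# A few made-up locations to show the behavior.
examples = [(0.0, 2.5), (0.75, 2.5), (0.0, 1.5), (0.0, 3.5), (0.5, 2.5)]
for px, pz in examples:
    print(px, pz, is_edge_pitch(px, pz))
```

Note that this rule has no outer bound, so pitches far outside the zone also count as "edge" pitches; adding an upper limit on distance would be one way to tighten the definition.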

Keep in mind this is only for Pitch F/X data. That means some of 2007, and all of the 2008 through 2010 regular seasons. I try to wait until the end of the season to update my database each year. I imagine this would be more interesting with even more years of data (like from Retrosheet, as mentioned in the linked blog post). I think Dan Turkenkopf is going to try this out, as he says in the comments. Perhaps I’ll extend this later on to the swinging only as well.

Finally, one other thing to look at is whether pitchers really do get frustrated after a long string of foul balls and get burned throwing a pitch down the middle. There is probably a skill somewhere between fouling pitches off and flat out missing those pitches, just because a better batter likely makes contact more often. But in terms of purposefully trying to foul a pitch off (at least from my own experience playing baseball), I have doubts that guys go up there looking to ‘spoil’ pitches. To foul a pitch off, you have to make sure it doesn’t hit the bat directly, otherwise it would go into play. Hard to believe that in and of itself would be a repeatable skill. To just edge the bat to the ball, you’ve got a good chance of missing it, too.

This is by no means a deep analysis, and I didn’t do any sort of fantastic job at cleaning it up beforehand. Just some fun crosstabs and scatter plots.

Any thoughts from those of you reading this????
