Space Time Swing Probability Plot for Ichiro

May 30, 2012

(This article was first published on Statistically Significant, and kindly contributed to R-bloggers)

I was having some fun with PITCHf/x data and generalized additive models. PITCHf/x tracks the trajectory and location of every pitch thrown in MLB. It is quite accurate and opens baseball up to more analysis than ever before. Generalized additive models (GAMs) are statistical models that place minimal assumptions on the form of the relationship being fitted. Many traditional statistical models are linear, in that they assume the response variable you are modeling is a linear function of the explanatory variables. GAMs instead model the response (on a link scale, such as the log-odds for a yes/no outcome) as a sum of functions of the explanatory variables, and assume only that those functions are “smooth.” Here is a good example of a relationship that might traditionally have been modeled as linear, but for which assuming smoothness is much more reasonable.
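
As a minimal sketch of that difference, the snippet below uses simulated data (not PITCHf/x) and the mgcv package: `lm()` is forced to fit a straight line, while the `s(x)` term in `gam()` only asks for some smooth function of `x`. The sine-shaped signal and noise level are arbitrary choices for illustration.

```r
library(mgcv)  # bundled with R as a recommended package

set.seed(1)
# Simulated data (not PITCHf/x): a smooth but clearly non-linear signal
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.3)

lin_fit <- lm(y ~ x)       # assumes the mean of y is a straight line in x
gam_fit <- gam(y ~ s(x))   # assumes only that the mean of y is a smooth function of x

# In-sample root-mean-square error of each fit
lin_rmse <- sqrt(mean(residuals(lin_fit)^2))
gam_rmse <- sqrt(mean(residuals(gam_fit)^2))
print(c(linear = lin_rmse, gam = gam_rmse))
```

The smoothing penalty behind `s(x)` is estimated from the data, so the GAM can follow the curve without being told its shape, while the straight line cannot.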

I fit a GAM to PITCHf/x data. The response is whether or not Ichiro swung at the pitch. The explanatory variables are the horizontal (x) and vertical (z) location of the pitch, and the day of the year. Naturally, we expect the probability of a swing to change as the pitch gets closer to or further from the center of the strike zone. I was also interested in seeing whether his propensity to swing changed as the year went on.
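
One plausible way to set up such a model in mgcv is sketched below on simulated pitches, since the original data and code are not reproduced here. It is a binomial (logistic) GAM with a two-dimensional smooth over location and a one-dimensional smooth over day of year. The column names `px`, `pz`, `day` and `swing`, and the rule used to simulate swings, are assumptions for illustration; the post does not say whether location and time entered additively, as here, or through an interaction term such as `te(px, pz, day)`.

```r
library(mgcv)

set.seed(42)
n <- 3000
# Hypothetical simulated pitches: px/pz location in feet, day of year
pitches <- data.frame(
  px  = runif(n, -2, 2),
  pz  = runif(n, 0.5, 4.5),
  day = sample(91:273, n, replace = TRUE)
)
# Made-up truth: swings fall off with distance from a zone centre at (0, 2.5)
# and drift slightly upward over the season
eta <- 1.5 - 1.2 * (pitches$px^2 + (pitches$pz - 2.5)^2) +
  0.004 * (pitches$day - 182)
pitches$swing <- rbinom(n, 1, plogis(eta))

# Logistic GAM: smooth surface over location plus a smooth seasonal trend
fit <- gam(swing ~ s(px, pz) + s(day), family = binomial,
           data = pitches, method = "REML")

# Predicted swing probability for a pitch down the middle vs. far outside, mid-season
new_pitches <- data.frame(px = c(0, 1.9), pz = c(2.5, 4.4), day = c(182, 182))
p <- predict(fit, newdata = new_pitches, type = "response")
print(p)
```

With this additive form the whole location surface moves up or down over the season on the log-odds scale; a `te()` interaction would also let its shape change from day to day.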

You can see that the probability of swinging is smooth in both location and time. You can also see (ever so slightly) that it increased as the year went on. His splits point the same way: his walk percentage was 28/395 (7.1%) in the first half and 17/337 (5.0%) in the second half, which is consistent with his swing probability increasing.
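
The split percentages can be reproduced directly from the counts above. As an optional extra check, not something from the original analysis, base R's `prop.test()` gives a rough sense of whether a gap of this size stands out from sampling noise at these sample sizes.

```r
# Walk counts and denominators for each half, as quoted in the post
walks <- c(first_half = 28, second_half = 17)
denom <- c(first_half = 395, second_half = 337)

# Walk percentage per half, rounded to one decimal place (7.1 and 5.0)
walk_pct <- round(100 * walks / denom, 1)
print(walk_pct)

# Two-sample test of equal proportions (Yates continuity correction by default)
split_test <- prop.test(walks, denom)
print(split_test$p.value)
```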

I used the mgcv package in R to fit the GAM. I then created an image for every day and stitched the images together into a movie with ffmpeg. The R code is here.
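
Below is a sketch of the frame-per-day step, again on simulated data with hypothetical names rather than the author's code. It refits a small logistic GAM, predicts the swing probability on a location grid for each day, writes each grid to a numbered PNG with `image()`, and then hands the frames to ffmpeg. The weekly frame spacing, frame rate and file names are arbitrary choices, and the ffmpeg call only runs if the binary is on the PATH.

```r
library(mgcv)

set.seed(7)
n <- 2000
# Hypothetical simulated pitches and swing outcomes
pitches <- data.frame(
  px  = runif(n, -2, 2),
  pz  = runif(n, 0.5, 4.5),
  day = sample(91:273, n, replace = TRUE)
)
eta <- 1.5 - 1.2 * (pitches$px^2 + (pitches$pz - 2.5)^2) +
  0.004 * (pitches$day - 182)
pitches$swing <- rbinom(n, 1, plogis(eta))
fit <- gam(swing ~ s(px, pz) + s(day), family = binomial,
           data = pitches, method = "REML")

# Location grid on which each day's surface is drawn
px_grid <- seq(-2, 2, length.out = 50)
pz_grid <- seq(0.5, 4.5, length.out = 50)
grid <- expand.grid(px = px_grid, pz = pz_grid)
days <- seq(91, 273, by = 7)  # one frame per week keeps the example quick

frame_dir <- file.path(tempdir(), "swing_frames")
dir.create(frame_dir, showWarnings = FALSE)

for (i in seq_along(days)) {
  grid$day <- days[i]
  prob <- predict(fit, newdata = grid, type = "response")
  # expand.grid varies px fastest, so rows index px and columns index pz
  z <- matrix(prob, nrow = length(px_grid))

  png(file.path(frame_dir, sprintf("frame_%03d.png", i)),
      width = 480, height = 480)
  image(px_grid, pz_grid, z, zlim = c(0, 1), col = hcl.colors(50),
        xlab = "px (ft)", ylab = "pz (ft)",
        main = paste("Day of year", days[i]))
  dev.off()
}

# Stitch the numbered frames into a video, only if ffmpeg is installed
cmd <- sprintf(
  "ffmpeg -y -framerate 5 -start_number 1 -i %s -pix_fmt yuv420p %s",
  shQuote(file.path(frame_dir, "frame_%03d.png")),
  shQuote(file.path(frame_dir, "swing.mp4"))
)
if (nzchar(Sys.which("ffmpeg"))) system(cmd)
```

Writing one numbered file per frame keeps the R side simple; ffmpeg's image-sequence input reads `frame_001.png`, `frame_002.png`, and so on, in order.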
