Trends in AL run scoring (using R)
I have started to explore the functionality of R, the statistical and graphics programming language. And what better data to play with than that of Major League Baseball?
There have already been some good examples of using R to analyze baseball data. The most comprehensive is the ongoing series at The Prince of Slides (Brian Mills, aka Millsy), cross-posted at the R-bloggers site. I am nowhere near that level, but explaining what I've done is a valuable exercise for me; as Joseph Joubert said (no doubt in French), "To teach is to learn twice over."
So after some reading (I have found Paul Teetor's R Cookbook particularly helpful) and working through some examples I found on the web, I decided to plot some time series data, calculate a trend line, and then plot the points and trend line. I started with the American League data, from its origins in 1901 through to the All-Star break of 2012. For this, I relied on this handy table at Baseball Reference.
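The plan above (points plus a trend line) can be sketched in a few lines of base R. This is only a rough illustration: the data are simulated placeholders rather than real American League figures, the column names `Year` and `RPG` are my own assumptions, and LOESS is just one possible way to compute a trend line.

```r
# Placeholder data only: simulated values, NOT real American League figures.
set.seed(1)
al <- data.frame(Year = 1901:2012)
al$RPG <- 4.5 + rnorm(nrow(al), sd = 0.4)  # hypothetical runs-per-game column

# Fit a smooth trend with LOESS (an assumed choice; a linear fit via lm()
# or a moving average would also count as a "trend line").
trend <- loess(RPG ~ Year, data = al, span = 0.5)
al$Trend <- predict(trend)  # fitted values at each observed year

# Plot the points, then overlay the fitted trend line.
plot(al$Year, al$RPG, pch = 16,
     xlab = "Year", ylab = "Runs per game",
     main = "Simulated run scoring with LOESS trend")
lines(al$Year, al$Trend, lwd = 2, col = "red")
```

A smaller `span` makes the curve follow the points more closely, while a larger one gives a smoother line.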
Step 1: load the data into the R workspace. This required a bit of finessing in software outside R; any text editor, such as Notepad or TextPad, would do the trick. I pasted the table into the text editor, tidied up the things listed below, and then saved the file with a .csv extension.
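Once the file is saved, reading it into R might look something like the sketch below. To keep the example runnable on its own, it first writes a tiny stand-in file to a temporary location; the rows are dummy values and the column names (`Year`, `R`, `G`) are assumptions for illustration, not the actual layout of the Baseball Reference table.

```r
# Stand-in for the hand-tidied file: two dummy rows, not real statistics.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,R,G",
             "1901,100,20",
             "1902,90,20"), csv_path)

# Read the file into a data frame; header = TRUE takes the column
# names from the first row of the file.
al <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(al)   # check that each column was read with the expected type
head(al)  # inspect the first few rows
```

With a real file, `csv_path` would simply be the path to the saved .csv. Running `str()` straight after loading is a quick way to catch columns that were read as text because of stray characters left over from the web page.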
To leave a comment for the author, please follow the link and comment on his blog: Bayes Ball.