Trends in AL run scoring (using R)

July 14, 2012

(This article was first published on Bayes Ball, and kindly contributed to R-bloggers)

I have started to explore the functionality of R, the statistical and graphics programming language. And what better data to play with than that of Major League Baseball?

There have already been some good examples of using R to analyze baseball data. The most comprehensive is the ongoing series at The Prince of Slides (Brian Mills, aka Millsy), cross-posted at the R-bloggers site. I am nowhere near that level, but explaining what I've done is a valuable exercise for me. As Joseph Joubert said (no doubt in French), "To teach is to learn twice over."

So after some reading (I have found Paul Teetor's R Cookbook particularly helpful) and working through some examples I found on the web, I decided to take some time series data, calculate a trend line, and then plot the points and the trend line together. I started with the American League data, from its origins in 1901 through to the All-Star break of 2012. For this, I relied on this handy table at Baseball Reference.
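To make that plan concrete, here is a minimal sketch of the fit-then-plot idea in base R. Everything in it is an assumption for illustration: the data are synthetic placeholders (not real American League figures), the names `al`, `year` and `runs_per_game` are made up, and a straight-line fit with `lm()` stands in for whatever trend method is eventually used.

```r
# Synthetic placeholder data: NOT real American League figures.
set.seed(42)
year <- 1901:2012
runs_per_game <- 4.5 + 0.3 * sin((year - 1901) / 15) +
  rnorm(length(year), sd = 0.25)
al <- data.frame(year = year, runs_per_game = runs_per_game)

# Fit a straight-line trend (an assumption; other smoothers would also work).
trend <- lm(runs_per_game ~ year, data = al)

# Plot the points, then overlay the fitted trend line.
plot(al$year, al$runs_per_game,
     xlab = "Season", ylab = "Runs per game (synthetic)",
     main = "Illustrative trend line")
abline(trend, col = "red", lwd = 2)
```

A straight line is the simplest possible trend; for a series spanning more than a century, a local smoother such as `lowess()` or `loess()` may follow the ups and downs more faithfully.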

Step 1: load the data into the R workspace. This required a bit of finessing in software outside R; any text editor such as Notepad or TextPad would do the trick. I pasted the table into the text editor, tidied up the things listed below, and then saved the file with a .csv extension.
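Once a tidied .csv file exists, reading it into the workspace might look something like the sketch below. The column names (`Year`, `G`, `R`) and the three rows of values are hypothetical placeholders, not the actual Baseball Reference table; the snippet writes them to a temporary file only so that it can be run on its own.

```r
# Hypothetical file contents: column names and values are placeholders,
# not the actual Baseball Reference table.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,G,R",
             "1901,100,500",
             "1902,110,520",
             "1903,120,480"),
           csv_path)

# Read the file into a data frame; the first line supplies the column names.
al_raw <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Check that the columns came through as numbers rather than text.
str(al_raw)

# Derive a per-game rate from the (placeholder) totals.
al_raw$RPG <- al_raw$R / al_raw$G
```

With a real file, `csv_path` would simply be the path to the saved .csv, and `str()` is a quick way to spot columns that were read as text because of stray characters left over from the web page.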
