I have started to explore R, the statistical and graphics programming language. And what better data to play with than Major League Baseball's?
There have already been some good examples of using R to analyze baseball data. The most comprehensive is the ongoing series at The Prince of Slides (Brian Mills, aka Millsy), cross-posted at the R-bloggers site. I am nowhere near that level, but explaining what I've done is a valuable exercise for me. As Joseph Joubert said (no doubt in French), "To teach is to learn twice over."
So after some reading (I have found Paul Teetor's R Cookbook particularly helpful) and working through some examples I found on the web, I decided to take some time series data, calculate a trend line, and then plot the points and the trend line together. I started with the American League data, from its origins in 1901 through to the All-Star break of 2012. For this, I relied on this handy table at Baseball Reference.
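To give a sense of where this is headed, here is a minimal sketch of the points-plus-trend-line idea. It uses simulated numbers rather than the Baseball Reference table, the column names `year` and `runs_per_game` are made up for illustration, and it assumes a straight-line least-squares fit with `lm()`, which is only one of several ways to draw a trend line.

```r
# Simulated placeholder data: these are NOT real American League numbers.
# The column names `year` and `runs_per_game` are hypothetical.
set.seed(42)
al <- data.frame(year = 1901:2012)
al$runs_per_game <- 4.4 + 0.002 * (al$year - 1901) +
  rnorm(nrow(al), sd = 0.4)

# Fit a straight-line trend by ordinary least squares (an assumption;
# a smoother such as loess would be another option).
trend <- lm(runs_per_game ~ year, data = al)

# Plot the points, then overlay the fitted line on the same axes.
plot(al$year, al$runs_per_game,
     xlab = "Season", ylab = "Runs per game (simulated)",
     main = "Points and trend line (sketch)")
abline(trend, col = "red", lwd = 2)
```

With the real table loaded into a data frame, the same three calls (`lm()`, `plot()`, `abline()`) should carry over with only the column names changed.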
Step 1: load the data into the R workspace. This required a bit of finessing in software outside R. Any text editor such as Notepad or TextPad would do the trick. I pasted the table into the text editor, tidied up the things listed below, and then saved the file with a .csv extension.