In a previous post, I used R to process data from the Lahman database to calculate index values that compare a team's run production to the league average for that year. For the purpose of that exercise, I started the sequence at 1947, but for what follows I re-ran the code with the time period 1901-2012.
The R code I used can be found at this GitHub gist. Instead of boring you here with the ins and outs of what the code is doing, I've embedded that as documentation in the gist. The R code assumes that you've got a data frame called “Teams.merge” already in your workspace. You can get it by running the code from the previous post, or, if you've run that code before, by reading the csv file it created, “Teams.merge.csv”, back in as a data frame named “Teams.merge”.
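If you are starting from the saved file, reading it back in could look something like the sketch below. It is not taken from the gist: the helper name `load_teams_merge` is made up for illustration, and it assumes “Teams.merge.csv” sits in your current working directory.

```r
# Hypothetical helper (not part of the gist): read the saved csv back into a
# data frame, stopping with a clear message if the file is not there.
load_teams_merge <- function(path = "Teams.merge.csv") {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "'; run the code from the previous post first.")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Only read the file if the data frame is not already in the workspace.
# Assumption: the csv is in the current working directory.
if (!exists("Teams.merge") && file.exists("Teams.merge.csv")) {
  Teams.merge <- load_teams_merge("Teams.merge.csv")
}
```

Setting `stringsAsFactors = FALSE` keeps team and league codes as plain character strings, which makes the later subsetting by team a little more predictable across R versions.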
The first step is to choose one of the current teams, and create a data frame that contains just that club's history. Once this has been done, the code then creates trend lines (using the LOESS method, as I did with the leagues in previous posts), and then plots them.
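As a rough illustration of those two steps (the gist remains the real thing), here is a minimal sketch. The column names `franchID`, `yearID` and `R_index`, the helper names, the franchise code "BOS" and the `span` value are all assumptions on my part; swap in whatever your “Teams.merge” actually contains.

```r
# Hypothetical helper: subset one club's history and add a LOESS trend column.
# Assumed columns: franchID (club code), yearID (season), and an index column
# (named "R_index" here purely for illustration).
team_trend <- function(teams, franchise, index_col = "R_index", span = 0.5) {
  team <- teams[teams$franchID == franchise, ]
  team <- team[order(team$yearID), ]            # seasons in chronological order
  d <- data.frame(year = team$yearID, value = team[[index_col]])
  fit <- loess(value ~ year, data = d, span = span)
  team$trend <- predict(fit, newdata = data.frame(year = d$year))
  team
}

# Hypothetical helper: plot the yearly index values with the trend line on top.
plot_team_trend <- function(team, index_col = "R_index") {
  plot(team$yearID, team[[index_col]], pch = 16, col = "grey50",
       xlab = "Season", ylab = "Index (league average = 100)")
  lines(team$yearID, team$trend, lwd = 2)
  abline(h = 100, lty = 2)                      # league-average reference line
}

# Usage, assuming Teams.merge is in the workspace with the columns named above.
if (exists("Teams.merge")) {
  one_team <- team_trend(Teams.merge, "BOS")
  plot_team_trend(one_team)
}
```

The `span` argument controls how much of the series each local fit uses: smaller values follow short-term swings more closely, larger values give a smoother line.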