Getting old in baseball

[This article was first published on Decision Science News » R, and kindly contributed to R-bloggers.]



With baseball’s World Series drawing to a close, we thought we’d get in one last 2014 post on the US national pastime.

Keeping up with our aging theme, we’ll look at what happens to players’ batting averages as they age. We use the Lahman package in R, which has data from 1871 to 2013. We take the set of players who played in the majors for at least two years and look at the mean batting average at every age.
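For readers who want to follow the calculation, here is a minimal sketch in base R. It is written against data frames shaped like the Lahman `Batting` and `People` tables (older versions of the package call the latter `Master`). The function name `batting_by_age`, the unweighted mean over player-seasons, the dropping of seasons with zero at-bats, and age taken as season year minus birth year are all assumptions made for illustration, not necessarily the choices behind the plots.

```r
# Sketch: mean batting average at each age.
# `batting` needs columns playerID, yearID, AB, H (as in Lahman::Batting);
# `people` needs columns playerID, birthYear (as in Lahman::People).
batting_by_age <- function(batting, people, min_years = 2) {
  # Sum hits and at-bats per player-season (a player can have several stints).
  seasons <- aggregate(cbind(H, AB) ~ playerID + yearID, data = batting, FUN = sum)

  # Assumption: seasons with no at-bats carry no batting average, so drop them.
  seasons <- seasons[seasons$AB > 0, ]
  seasons$avg <- seasons$H / seasons$AB

  # Keep players with at least `min_years` seasons.
  n_seasons <- table(seasons$playerID)
  keep <- names(n_seasons)[as.vector(n_seasons) >= min_years]
  seasons <- seasons[seasons$playerID %in% keep, ]

  # Assumption: age is approximated as season year minus birth year.
  seasons <- merge(seasons, people[, c("playerID", "birthYear")], by = "playerID")
  seasons$age <- seasons$yearID - seasons$birthYear

  # Mean, number of observations and standard error at each age.
  by_age <- split(seasons$avg, seasons$age)
  data.frame(
    age      = as.numeric(names(by_age)),
    mean_avg = vapply(by_age, mean, numeric(1)),
    n        = vapply(by_age, length, integer(1)),
    se       = vapply(by_age, function(x) sd(x) / sqrt(length(x)), numeric(1)),
    row.names = NULL
  )
}

# Possible usage, assuming the Lahman package is installed:
# library(Lahman)
# by_age <- batting_by_age(Batting, People)
```

The `n` and `se` columns are the kind of quantities one would need for circle sizes and standard error bars in a plot of the result.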

The green line shows this basic result (smoothed in the plot above; raw means with standard error bars in the plot below). Pro baseball players have their highest averages just over age 30. The area of each circle is proportional to the number of observations at that point.

When you look at results like those in the green line, however, you must stop to consider that the players who show up in the graph tell only part of the story. At any given age, there are other players who are not plotted because they were cut from their teams years before (often due to poor batting performance).

To illustrate this, the blue line plots, at each age, the batting average of players who are in their last year of major league play. As one would expect, batting averages are low in the final year before players disappear from the major leagues. The red line shows the performance at each age of players who are not in their last year. For this subset of the data, peak batting average occurs at age 36 and the maximum is a bit flatter.
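The split into final and non-final seasons could be sketched as follows. The function name `split_by_final_season` and the input layout (one row per player-season, with columns `playerID`, `yearID`, `age` and `avg`) are hypothetical, chosen for illustration only.

```r
# Sketch: mean batting average by age, separately for a player's final
# season and for all earlier seasons.
# `seasons` is assumed to have one row per player-season with columns
# playerID, yearID, age and avg (hypothetical layout).
split_by_final_season <- function(seasons) {
  # Last season in which each player appears in the data.
  last_year <- ave(seasons$yearID, seasons$playerID, FUN = max)

  # TRUE for a player's final observed season, FALSE otherwise.
  # Caveat: players still active in the last season of the data set are
  # flagged as "final" too; a fuller analysis might exclude them.
  seasons$final <- seasons$yearID == last_year

  # Mean batting average for every (age, final) combination.
  aggregate(avg ~ age + final, data = seasons, FUN = mean)
}
```

Rows with `final == TRUE` would correspond to the blue line and rows with `final == FALSE` to the red line.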

What is up with the increase in the blue line? The increasing trend is present even if you exclude the first two unusually low points. We are no experts on baseball (or sports of any kind) and are open to suggestions.

One thing to keep in mind is that players whose last year was at age 20 probably played only 2 years (only players with at least 2 years in the majors were considered), while players whose last year was at age 40 probably played about 20 years.
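One way to check this intuition is to tabulate average career length against age in the final season. The sketch below assumes a data frame with one row per player-season and columns `playerID`, `yearID` and `age`; the function name `career_length_by_final_age` and that layout are hypothetical.

```r
# Sketch: average number of seasons played, grouped by age in final season.
# `seasons` is assumed to have one row per player-season with columns
# playerID, yearID and age (hypothetical layout).
career_length_by_final_age <- function(seasons) {
  # Both tapply calls group by playerID, so their results line up.
  n_seasons <- tapply(seasons$yearID, seasons$playerID,
                      function(y) length(unique(y)))
  final_age <- tapply(seasons$age, seasons$playerID, max)

  per_player <- data.frame(
    n_seasons = as.vector(n_seasons),
    final_age = as.vector(final_age)
  )

  # Mean career length for each final age.
  out <- aggregate(n_seasons ~ final_age, data = per_player, FUN = mean)
  names(out)[2] <- "mean_seasons"
  out
}
```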


As usual, those who want to reproduce this in R are welcome to do so.

The post Getting old in baseball appeared first on Decision Science News.
