BATTING AND AGING
With baseball’s World Series drawing to a close, we thought we’d get in one last 2014 post on the US national pastime.
Keeping up with our aging theme, we’ll look at what happens to players’ batting averages as they age. We use the Lahman package in R, which has data from 1871 to 2013. We take the set of players who played in the majors for at least two years and look at the mean batting average at every age.
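For readers who want to see the shape of that calculation, here is a minimal sketch in Python rather than R. It is not the code behind the plots. The rows are made-up numbers, not Lahman data, and the function name mean_avg_by_age is our own invention. We assume one row per player per age, treat "at least two years" as "at least two distinct ages on record", and average each player's hits divided by at-bats with equal weight.

```python
from collections import defaultdict

# Toy rows: (player_id, age, hits, at_bats). Made-up numbers, not Lahman data.
rows = [
    ("a", 24, 50, 200),
    ("a", 25, 60, 200),
    ("b", 24, 30, 100),
    ("b", 25, 27, 100),
    ("c", 24, 10, 50),  # only one season on record, so dropped below
]

def mean_avg_by_age(rows, min_seasons=2):
    """Mean of per-player batting averages at each age (hypothetical helper)."""
    # Count distinct ages per player as a stand-in for seasons played.
    seasons = defaultdict(set)
    for pid, age, hits, ab in rows:
        seasons[pid].add(age)
    keep = {pid for pid, ages in seasons.items() if len(ages) >= min_seasons}

    # Collect each kept player's batting average (hits / at-bats) by age.
    by_age = defaultdict(list)
    for pid, age, hits, ab in rows:
        if pid in keep and ab > 0:
            by_age[age].append(hits / ab)

    return {age: sum(v) / len(v) for age, v in sorted(by_age.items())}

result = mean_avg_by_age(rows)
```

Weighting every player equally is a choice. Pooling hits and at-bats across players at each age would give more weight to players with more at-bats and could produce a somewhat different curve.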
The green line (smoothed in the plot above, raw with standard error bars in the plot below) shows this basic result. Pro baseball players have their highest averages just over age 30. The area of each circle is proportional to the number of observations at that point.
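As a rough illustration of what goes into one plotted point, the sketch below computes the mean, the standard error of the mean, and the observation count for a single age. It is Python with made-up values, and the helper name mean_and_se is hypothetical. We assume the error bars are the usual sample standard deviation divided by the square root of n.

```python
import math
import statistics

def mean_and_se(values):
    """Return (mean, standard error of the mean, n) for one age group."""
    n = len(values)
    m = statistics.mean(values)
    # The sample standard deviation needs at least two observations.
    se = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return m, se, n

# Made-up batting averages for one age, for illustration only.
m, se, n = mean_and_se([0.25, 0.30, 0.35])
```

The count n is what would drive the circle area, and it also explains why the error bars widen at the youngest and oldest ages, where few players contribute.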
When you look at results like those in the green line, however, you must stop to consider that the players who show up in the graph tell only part of the story. At any given age, there are other players who are not plotted because they were cut from their teams years before (often due to poor batting performance).
To illustrate this, at each age, we plot in the blue line the batting average of players who are in their last year of major league play. As one would expect, batting averages are low in the final season before players disappear from the major leagues. In the red line, we see the performance at each age of players who are not in their last year. For this subset of the data, peak batting average occurs at age 36 and the maximum is a bit flatter.
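The split into blue and red lines can be sketched as follows, again in Python on made-up numbers rather than the Lahman data. The function name split_by_final_season is hypothetical, and we assume a player's last year is simply the highest age at which he appears in the data.

```python
from collections import defaultdict

# Toy rows: (player_id, age, batting_average). Made-up numbers.
rows = [
    ("a", 24, 0.250), ("a", 25, 0.300), ("a", 26, 0.240),
    ("b", 24, 0.300), ("b", 25, 0.220),
]

def split_by_final_season(rows):
    """Mean batting average by age, for final seasons and for all other seasons."""
    # Assumption: a player's last year is his maximum observed age.
    last_age = {}
    for pid, age, avg in rows:
        last_age[pid] = max(age, last_age.get(pid, age))

    final, earlier = defaultdict(list), defaultdict(list)
    for pid, age, avg in rows:
        target = final if age == last_age[pid] else earlier
        target[age].append(avg)

    def means(groups):
        return {age: sum(v) / len(v) for age, v in sorted(groups.items())}

    return means(final), means(earlier)

final_means, earlier_means = split_by_final_season(rows)
```

One caveat with this definition: players who were still active in the most recent season of the data would be counted as being in their last year, so a fuller analysis might want to set that season aside.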
What is up with the increase in the blue line? The increasing trend is present even if you exclude the first two unusually low points. We are no experts on baseball (or sports of any kind) and are open to suggestions.
One thing to keep in mind is that people whose last year was at age 20 probably played only 2 years (we only considered players who played at least 2 years), while people whose last year was at age 40 probably played about 20 years.
As usual, those who want to reproduce this in R are welcome to do so.