I recently got a hold of table standings for the English Premier League (EPL) for the past 10 years. In this post, I want to explore the question: How similar are the point distributions across seasons?
Code for the figures in this post can be found here.
The first plot I made was a histogram of points for each season:
The default setting of 30 bins seems too granular for this situation: with only 20 teams per season, most bars contain just 1 observation. It is also difficult to compare the shapes of the histograms across the panels.
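One way to address the granularity is to set the bin width explicitly instead of relying on the default. The sketch below is only an illustration, not the code behind the figures in this post: the data frame name `standings`, the column names `Season` and `Points`, and the season labels are assumptions, and the points are random placeholders rather than the real EPL standings.

```r
library(ggplot2)

# Placeholder data, NOT the real EPL standings: 10 seasons x 20 teams,
# with points drawn at random purely so that the example runs.
set.seed(1)
standings <- data.frame(
  Season = factor(rep(sprintf("S%02d", 1:10), each = 20)),
  Points = round(runif(200, min = 20, max = 95))
)

# One histogram panel per season. geom_histogram() defaults to bins = 30;
# a 10-point bin width gives far fewer bars for 20 teams.
p_hist <- ggplot(standings, aes(x = Points)) +
  geom_histogram(binwidth = 10, boundary = 0, colour = "white") +
  facet_wrap(~ Season, ncol = 5) +
  labs(x = "Points", y = "Number of teams")

print(p_hist)
```

A 10-point bin width is an arbitrary choice here; wider bins smooth the picture at the cost of detail, and the panels remain hard to compare with one another.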
The next plot seeks to remedy these issues by (i) plotting smoothed densities instead of histograms, and (ii) plotting the densities for all the seasons on the same figure:
Since the density curves for the different seasons largely overlap, the point distribution looks fairly stable across years. The distribution looks almost bimodal, with a clear peak around 40 points, and a much smaller peak in the region of 70-75 points. These features become more obvious when we plot the smoothed density estimate for all the years combined:
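Both density plots (one curve per season, and one pooled curve) can be sketched with `geom_density()`. As before, this is a hedged illustration: `standings`, `Season` and `Points` are assumed names, and the data are random placeholders, so the output will not show the bimodal shape described above.

```r
library(ggplot2)

# Placeholder data, NOT the real EPL standings (random points, assumed columns).
set.seed(1)
standings <- data.frame(
  Season = factor(rep(sprintf("S%02d", 1:10), each = 20)),
  Points = round(runif(200, min = 20, max = 95))
)

# One smoothed density per season, all drawn on the same panel.
p_by_season <- ggplot(standings, aes(x = Points, colour = Season)) +
  geom_density() +
  labs(x = "Points", y = "Density")

# A single smoothed density for all seasons pooled together.
p_pooled <- ggplot(standings, aes(x = Points)) +
  geom_density() +
  labs(x = "Points", y = "Density")

print(p_by_season)
print(p_pooled)
```

`geom_density()` picks its bandwidth automatically; its `adjust` argument scales that bandwidth, and how pronounced a secondary peak looks can depend on it.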
Another nice way to visualize the data is using a joy plot (also known as a ridgeline plot). This can be done with the ggridges package:
The joy plot allows us to compare the peaks of the distributions easily across seasons, as well as to detect any temporal trend, if one exists.
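A minimal sketch of such a plot with `geom_density_ridges()` from ggridges follows. The data frame and column names are assumptions, and the data are random placeholders rather than the real standings, so no real trend should be read into the output.

```r
library(ggplot2)
library(ggridges)

# Placeholder data, NOT the real EPL standings (random points, assumed columns).
set.seed(1)
standings <- data.frame(
  Season = factor(rep(sprintf("S%02d", 1:10), each = 20)),
  Points = round(runif(200, min = 20, max = 95))
)

# One density ridge per season, stacked along the y-axis.
# scale > 1 lets neighbouring ridges overlap slightly.
p_ridge <- ggplot(standings, aes(x = Points, y = Season)) +
  geom_density_ridges(scale = 2, alpha = 0.6) +
  labs(x = "Points", y = "Season")

print(p_ridge)
```

Because the seasons are stacked in order along the y-axis, a drift in the location of the peaks over time would show up as a slanted pattern across the ridges.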