
A Change in Age at the Olympics

[This article was first published on World Soccer Analytics, and kindly contributed to R-bloggers.]

The Olympics is an interesting competition for soccer fans: it has the characteristics of both a big tournament like the World Cup and a smaller showcase for young players. The quality of play is often top class and the excitement can be very high, but it is ultimately a chance for newcomers to show their talent for their national team.

Kaggle’s Olympics History dataset contains an array of information on the sporting event, going back to Athens 1896. For the purpose of this analysis, I looked specifically at soccer players who participated at the Olympics and decided to analyze the change in players’ average age over the last 120 years.
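As a rough sketch of the underlying calculation, the average age per Games can be computed with dplyr. The toy data frame below is invented purely for illustration; only the column names Year and Age follow the dataset as it is used in the code later in this post.

```r
library(dplyr)

# Toy data, invented for illustration; not taken from the Kaggle dataset
toy_players <- data.frame(
  Year = c(1900, 1900, 1904, 1904, 1904),
  Age  = c(22, 26, 21, NA, 25)
)

# Mean age per Olympic year, ignoring missing ages
toy_mean_age <- toy_players %>%
  group_by(Year) %>%
  summarise(mean_age = mean(Age, na.rm = TRUE))

print(toy_mean_age)
```

Setting na.rm = TRUE matters here because some athletes in historical records have no recorded age, and a single NA would otherwise make the whole group's mean NA.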

 

The data shows an increase in age until the mid-20th century, a slight decrease over the next 40 years, and a more significant decrease starting in the 1990s. The later drop has a clear cause: an under-23 age limit was introduced for the men's tournament in 1992, and since 1996 teams have been allowed only three players over the age of 23. This explains the larger drop at the end of the 20th century and the start of the 21st.

To better visualize the change over time, I have used a LOESS (locally estimated scatterplot smoothing) curve, which fits a weighted least-squares regression in the neighborhood of each point and joins those local fits into a single smooth, possibly nonlinear, curve.
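For readers who want to see the smoother outside of ggplot2, here is a minimal sketch using base R's loess() function, which is also what geom_smooth() falls back to for small data sets. The data is synthetic (a made-up trend plus noise), and the span and degree values shown are simply loess()'s defaults written out explicitly.

```r
# Synthetic data: a smooth trend plus noise (values are made up)
set.seed(42)
x <- seq(1900, 2016, by = 4)
y <- 24 + 2 * sin((x - 1900) / 40) + rnorm(length(x), sd = 0.5)
toy <- data.frame(x = x, y = y)

# span: share of the points used in each local fit
# degree: order of the local polynomial (2 = quadratic)
fit <- loess(y ~ x, data = toy, span = 0.75, degree = 2)

# Smoothed values at the observed x positions
toy$smoothed <- predict(fit)

head(toy)
```

A smaller span makes the curve follow the points more closely, while a larger one gives a flatter, more heavily smoothed trend.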

A few interesting insights:

And as I enjoy making animated visualizations, here is a more energetic version of the above graph:

As you can see, the LOESS curve changes shape based on the new data available.
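The effect can be reproduced in a small sketch: because each LOESS fit only uses the points it is given, refitting on a longer time window changes the smoothed value even at a year that was already covered. The data and the helper smoothed_at_1948() below are hypothetical and exist only for this illustration.

```r
# Synthetic data (made up): one observation every four years
set.seed(1)
x <- seq(1900, 2016, by = 4)
y <- 24 + 2 * sin((x - 1900) / 40) + rnorm(length(x), sd = 0.5)
toy <- data.frame(x = x, y = y)

# Hypothetical helper: fit LOESS on the data up to a cutoff year
# and return the smoothed value at 1948
smoothed_at_1948 <- function(cutoff) {
  fit <- loess(y ~ x, data = toy[toy$x <= cutoff, ])
  unname(predict(fit, newdata = data.frame(x = 1948)))
}

early <- smoothed_at_1948(1960)  # only data through 1960
full  <- smoothed_at_1948(2016)  # all data

c(early = early, full = full)
```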

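library(dplyr)
library(ggplot2)
library(ggthemes)
library(lubridate)
library(readr)
library(animation)

# Load the Kaggle data. The file name is an assumption; the original post does not show this step.
df <- read_csv("athlete_events.csv")

# Keep only the men's soccer event
soccer_olympics <- df %>% filter(Event == "Football Men's Football")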



# Keep medal-winning players and order the medal levels for plotting
medal_winners <- soccer_olympics %>%
  filter(Medal %in% c("Gold", "Silver", "Bronze")) %>%
  mutate(Medal = factor(Medal, levels = c("Gold", "Silver", "Bronze")))

# Mean age of each medal-winning team at each Games
age_medal_winners <- medal_winners %>%
  group_by(Team, Games, Year, Medal) %>%
  summarise(mean_age = mean(Age, na.rm = TRUE), .groups = "drop")

# Points plus a smoothed trend for each medal
age_medal_winners %>%
  ggplot(aes(x = Year,
             y = mean_age,
             color = Medal)) +
  geom_point() +
  geom_smooth()

# Smoothed trend only
age_medal_winners %>%
  ggplot(aes(x = Year,
             y = mean_age,
             color = Medal)) +
  geom_smooth()

 

And for the animated graph:

saveGIF({
  # One frame per year, each using all data up to and including that year
  for (i in 1900:2016) {

    year_games <- as.character(i)

    # The name of the source data frame was lost from the original post.
    # age_medal_winners is assumed here because it is the only table
    # defined above that has Year and mean_age columns.
    year_data <- age_medal_winners %>% filter(Year <= i)

    gg <- year_data %>%
      ggplot(aes(x = Year,
                 y = mean_age)) +
      geom_point() +
      geom_smooth(method = "auto", se = FALSE) +
      theme_tufte() +
      ggtitle(paste0("Mean Age of Soccer Teams at the Olympics (1900 - ", year_games, ")")) +
      theme(legend.position = "none") +
      theme(plot.title = element_text(size = 22,
                                      face = "italic",
                                      hjust = 0.5)) +
      scale_x_continuous(breaks = seq(1900, 2020, 10), limits = c(1900, 2020)) +
      scale_y_continuous(breaks = seq(18, 30, 2), limits = c(18, 30)) +
      labs(y = "Mean Age", caption = "Data from Kaggle.com")

    print(gg)
  }
}, movie.name = "olympic_team_ages.gif", interval = 0.2, ani.width = 1000, ani.height = 700)

 

 

