Winners at the World Cup

[This article was first published on World Soccer Analytics, and kindly contributed to R-bloggers].

The 2018 FIFA World Cup has been over for a month now, but the memories are far from fading. It was a fabulous tournament, combining underdog stories, upsets, and some high-quality football.

The World Cup has been around for 88 years now, featuring 79 national teams and over 900 matches played. Kaggle’s International Football Results dataset provides a list of all international matches, including every World Cup contest. I decided to count the number of wins for each team and visualize it in an animated bar graph. To keep it readable, I kept only countries with a cumulative win count of 20 or more. That left us with nine nations and the following plot:


And here is the R code:

Pre-Processing: Load packages, Load data, Filter to only WC matches, Create “winner” variable, Create “year” variable


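# load packages: tidyverse for read_csv() and the dplyr verbs, lubridate for year()
library(tidyverse)
library(lubridate)

df <- read_csv("results.csv")

# keep only World Cup matches
world_cup <- df %>% filter(tournament == "FIFA World Cup")

# name the winner of each match; the "Draw" label for ties is an assumption,
# because that part of the original line is missing
world_cup <- world_cup %>%
  mutate(winner = ifelse(home_score > away_score, home_team,
                  ifelse(away_score > home_score, away_team, "Draw")))

# tournament year, taken from the match date
world_cup$year <- world_cup$date %>% year()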
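The winner rule above can be checked on a few made-up rows before running it on the full dataset. This is a minimal sketch, not the author's code: the teams and scores are hypothetical placeholders, the column names assume the Kaggle results.csv layout (home_team, away_team, home_score, away_score), and case_when() is swapped in for the nested ifelse(), with draws left as NA rather than given a label.

```r
library(dplyr)

# Hypothetical toy matches; teams and scores are placeholders, not real results
toy <- data.frame(
  home_team  = c("A", "B", "C"),
  away_team  = c("B", "C", "A"),
  home_score = c(2, 0, 1),
  away_score = c(1, 3, 1),
  stringsAsFactors = FALSE
)

# Same rule as the nested ifelse(): higher score wins, equal scores have no winner
toy_winner <- toy %>%
  mutate(winner = case_when(
    home_score > away_score ~ home_team,
    away_score > home_score ~ away_team,
    TRUE ~ NA_character_
  ))

print(toy_winner)
```

Leaving draws as NA means they drop out automatically of any later filter on winner, at the cost of no longer being countable under a label of their own.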


Filter to teams with 20 or more wins, complete the rows so that every winner has a row for each year, and create a cumulative-wins variable

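# keep only the nine teams with 20+ World Cup wins
top_teams <- world_cup %>% filter(winner %in% c("Brazil", "Germany", "Italy",
                                                "Argentina", "France", "Spain",
                                                "England", "Netherlands",
                                                "Uruguay"))
#complete rows
top_teams <- top_teams %>%
  group_by(year, winner) %>%
  count() %>%
  ungroup() %>%
  complete(year, winner, fill = list(n = 0))

#create cumulative sum variable, grouped by winner
top_teams <- top_teams %>%
  arrange(year) %>%
  group_by(winner) %>%
  mutate(cs = cumsum(n)) %>%
  ungroup()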

Create GIF from ggplot

library(animation)  # provides saveGIF(); needs ImageMagick installed

saveGIF({
  for (i in c(1930,1934,1938,1950,1954,1958,1962,1966,1970,1974,1978,
              1982,1986,1990,1994,1998,2002,2006,2010,2014,2018)) {

    year_games <- as.character(i)

    year_data <- top_teams %>% filter(year == i)

    gg <- year_data %>% ggplot(aes(x = winner,
                                   y = cs,
                                   frame = year,
                                   group = winner,
                                   fill = winner)) +
      # fixed bar order; the start of this call is missing in the original,
      # so scale_x_discrete(limits = ...) is a reconstruction
      scale_x_discrete(limits = c("Brazil", "Germany", "Italy", "Argentina",
                                  "France", "Spain", "England",
                                  "Netherlands",
                                  "Uruguay")) +
      ylim(0, 75) +
      geom_bar(stat = "identity") +
      ggtitle(paste0("Number of Victories at the FIFA World Cup (1930 - ",
                     year_games, ")")) +
      scale_colour_brewer(palette = "Set1") +
      labs(x = "", y = "Cumulative Wins") +
      guides(fill = "none") +
      theme(panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
            axis.text.x = element_text(face = "bold", size = 14),
            axis.title.y = element_text(face = "bold", size = 14),
            axis.text.y = element_text(face = "bold", size = 14)) +
      scale_fill_manual(values = c("Brazil" = "yellow1",
                                   "Germany" = "gray8",
                                   "Italy" = "#007FFF",
                                   "Argentina" = "lightblue",
                                   "France" = "darkblue",
                                   "Spain" = "darkred",
                                   "England" = "white",
                                   "Netherlands" = "darkorange",
                                   "Uruguay" = "dodgerblue2"))

    print(gg)  # draw one frame per tournament
  }
}, movie.name = 'world_cup_histogram.gif', interval = 0.8,
   ani.width = 1500, ani.height = 900)
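The tournament years in the loop above are typed out by hand. As a small sketch, they could instead be generated from the four-year cycle, skipping 1942 and 1946 when no World Cup was held. The names wc_years and frame_titles are introduced here for illustration and do not appear in the original code.

```r
# Every fourth year from 1930 to 2018, minus the two cancelled tournaments
wc_years <- setdiff(seq(1930, 2018, by = 4), c(1942, 1946))

# One plot title per frame, built the same way as in the loop
frame_titles <- paste0("Number of Victories at the FIFA World Cup (1930 - ",
                       wc_years, ")")

print(wc_years)
print(frame_titles[length(frame_titles)])
```

With the cumulative data already built, sort(unique(top_teams$year)) would give the same vector, provided at least one of the selected teams won a match at every tournament.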


Code can be found on GitHub. A big thanks to David Smith at Revolution Analytics, whose blog post helped me a lot in creating the visual.


