Updated forecasts for the UEFA Euro 2020 knockout stage

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers].

Now that all group stage matches of the UEFA Euro 2020 have been played, we have updated the knockout stage forecasts by re-training our hybrid random forest model on the extended data. The results show that England profits most from the realized tournament draw.


After the 36 matches of the group stage were completed earlier this week, we decided to update our probabilistic forecast for the UEFA Euro 2020. As the evaluation of the group stage showed that, by and large, the forecasts worked reasonably well up to this point, we kept our general strategy and just made a few updates:

  • The historic match abilities for all teams were updated to incorporate the results from the 36 additional matches from the group stage. Given that the estimates are weighted such that the most recent results have a higher influence, this changed the estimates of the team abilities somewhat.
  • The average plus-minus player ratings for all teams were also updated but these changed to a lesser degree given that each team only played three additional matches.
  • All other covariates (bookmaker consensus, market value, etc.) were left unchanged.
  • The learning data set for the hybrid random forest that combines all the predictors was extended: In addition to all the matches from the UEFA Euro 2004-2016 it now includes the group stage results from this year’s Euro.
  • The resulting predicted number of goals for each team can then be used to simulate the entire knockout stage 100,000 times.
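The recency weighting mentioned in the first item can be illustrated with a small sketch. This is not the weighting scheme actually used for the ability estimates: the exponential decay, the `half_life_days` parameter, and the dates are all assumptions chosen only to show how more recent matches end up with more influence.

```python
from datetime import date


def recency_weight(match_date, reference_date, half_life_days=365.0):
    """Exponential decay: a match `half_life_days` old counts half as much.

    The half-life form is an illustrative assumption, not the scheme
    used in the article.
    """
    age_days = (reference_date - match_date).days
    return 0.5 ** (age_days / half_life_days)


# Hypothetical reference date and match dates, for illustration only.
reference = date(2021, 6, 24)
for played in (date(2021, 6, 20), date(2020, 6, 24), date(2017, 6, 24)):
    print(played, round(recency_weight(played, reference), 3))
```

Under such a scheme, the 36 group stage matches enter the estimation with nearly full weight, which is consistent with the ability estimates shifting noticeably after only three additional matches per team.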

While all the changes above have a certain influence, the biggest effect arguably comes from the last item: Because the match-ups for the round of 16 are fixed now, there is a lot less variation in the potential courses of the tournament. Specifically, it is now clear that there are more top favorites in the upper half of the tournament tableau (namely France, Spain, Italy, Belgium, Portugal) than in the lower half of the tableau (England, Germany, Netherlands). In the following it is shown in more detail what the consequences of this are.

Winning probabilities

The updated results show that England has now become the top favorite for the title with a winning probability of 17.4%, because they are more likely to face weaker opponents provided they beat Germany in the round of 16. Our top favorite from the pre-tournament forecast was France, and they now rank second with an almost unchanged winning probability of about 15.0%. The winning probabilities for all teams are shown in the barchart below with more information linked in the interactive full-width version.

Interactive full-width graphic

Barchart: Winning probabilities

Somewhat surprisingly, Italy still has a rather low winning probability of only 7.3% whereas they are now among the top three teams according to most bookmaker odds. This is most likely due to the tournament draw: If they beat Austria in the round of 16, they meet either the FIFA top-ranked team Belgium or defending champion Portugal in the quarter final. In a potential semi-final they would have a high chance of facing either France or Spain.

Match probabilities

Using the hybrid random forest an expected number of goals is obtained for both teams in each possible match. Using these, we can compute the probability that a certain match ends in a win, a draw, or a loss in normal time. The same can be repeated in overtime, if necessary, and a coin flip is used to decide penalties, if needed.
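The step from expected goals to match probabilities can be sketched as follows. The sketch assumes that the two teams' goals follow independent Poisson distributions, and that expected goals in overtime are the normal-time values scaled by one third (30 of 90 minutes). Neither assumption is stated in the text above, and the function names and example values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected number lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lambda_a, lambda_b, max_goals=15):
    """Win/draw/loss probabilities for team A, assuming independent Poisson goals."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, lambda_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, lambda_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    total = win + draw + loss  # renormalize the truncated score grid
    return win / total, draw / total, loss / total


def knockout_win_probability(lambda_a, lambda_b, overtime_scale=1 / 3):
    """Probability that team A advances: normal time, then overtime, then a coin flip."""
    win, draw, _ = outcome_probabilities(lambda_a, lambda_b)
    ot_win, ot_draw, _ = outcome_probabilities(
        lambda_a * overtime_scale, lambda_b * overtime_scale
    )
    return win + draw * (ot_win + 0.5 * ot_draw)


# Hypothetical expected goals, not values from the forecast.
print(outcome_probabilities(1.4, 1.1))
print(knockout_win_probability(1.4, 1.1))
```

Because penalties are treated as a coin flip, two teams with identical expected goals each advance with probability 0.5 in this sketch.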

The resulting probability that one team beats the other in a knockout match is depicted in the heatmap below. The color scheme uses green vs. brown to signal probabilities above vs. below 50%, respectively. The tooltips for each match in the interactive version of the graphic also print the probabilities for the match results after normal time.

Interactive full-width graphic

Heatmap: Match probabilities

Performance throughout the tournament

As every single match can be simulated with the pairwise probabilities above, we are able to simulate the entire knockout stage 100,000 times to provide “survival” probabilities for each team across the remaining stages. Teams in the upper half of the tournament tableau are shown in orange while the lower half teams are shown in blue.
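A minimal sketch of such a simulation is given below. It uses a hypothetical four-team bracket and made-up team strengths, with pairwise probabilities derived from a simple strength ratio. In the actual forecast the pairwise probabilities come from the hybrid random forest, and the bracket has 16 teams.

```python
import random


def simulate_knockout(teams, win_prob, n_sims=100_000, seed=1):
    """Estimate per-round survival probabilities for a single-elimination bracket.

    `teams` is listed in bracket order (its length must be a power of two) and
    `win_prob(a, b)` returns the probability that a beats b.
    """
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # e.g. 4 teams -> 2 rounds
    survived = {team: [0] * n_rounds for team in teams}
    for _ in range(n_sims):
        remaining = list(teams)
        for r in range(n_rounds):
            winners = []
            for i in range(0, len(remaining), 2):
                a, b = remaining[i], remaining[i + 1]
                winner = a if rng.random() < win_prob(a, b) else b
                survived[winner][r] += 1
                winners.append(winner)
            remaining = winners
    return {team: [c / n_sims for c in counts] for team, counts in survived.items()}


# Hypothetical strengths and bracket, for illustration only.
strength = {"A": 1.6, "B": 1.5, "C": 1.0, "D": 0.8}


def ratio_win_prob(a, b):
    """Made-up pairwise probability based on the ratio of strengths."""
    return strength[a] / (strength[a] + strength[b])


survival = simulate_knockout(["A", "B", "C", "D"], ratio_win_prob)
for team, probs in survival.items():
    print(team, [round(p, 3) for p in probs])
```

In this toy bracket the two strongest teams meet in the first round, so each has a harder path than the weaker teams on the other side. The same mechanism is at work when the draw places several favorites in one half of the tableau.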

Interactive full-width graphic

Line plot: Survival probabilities

This shows that England has a relatively low chance of surviving the round of 16, at least compared to other top teams like France, Italy, or the Netherlands, who play against weaker opponents. However, provided England proceeds to the quarter final, they have a really high probability of advancing all the way to the final.

In summary, the forecast has changed compared to the pre-tournament version, but maybe not as much as expected. The most important change in information is that the remaining course of the tournament is rather clear now, while the knowledge from the 36 group stage matches themselves has only moderate effects. Thus, the most exciting part of the UEFA Euro 2020 is only starting now and we can all be curious what is going to happen. Everything is still possible! (Recall that in the 2016 tournament Portugal eventually took the championship despite not winning a single group stage match and ranking third in their group.)
