Probabilistic forecasting for the FIFA Women’s World Cup 2023

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers.]

Winning probabilities for all teams in the FIFA Women’s World Cup are obtained using a consensus model based on quoted bookmakers’ odds. The favorite is defending World Champion United States, followed by European Champion England, and Spain.

Football fans around the world are anticipating the FIFA Women’s World Cup 2023, which will take place in Australia and New Zealand from 20 July to 20 August 2023. Thirty-two of the world’s best teams compete to determine the new World Champion. Here, a predictive model is established to forecast the most likely outcome of the tournament. The forecast is based on the expert knowledge of 24 bookmakers and betting exchanges using a model averaging approach.

Winning probabilities

The model is the so-called bookmaker consensus model, which was proposed by Leitner, Hornik, and Zeileis (2010, International Journal of Forecasting, doi:10.1016/j.ijforecast.2009.10.001) and has been applied successfully in previous football tournaments, either by itself or in combination with even more refined machine learning techniques.

As in the FIFA Women’s World Cup 2019, the forecast shows that the United States are the clear favorite with a forecasted winning probability of 21.5%, followed by England with a winning probability of 15.7% and Spain with 13.1%. Three other teams are still a bit ahead of the rest: Germany with 9.7%, France with 7.5%, and co-host Australia with 7.4%. More details are displayed in the following barchart.

Barchart: Winning probabilities

These probabilistic forecasts have been obtained by model-based averaging of the quoted winning odds for all teams across bookmakers. More precisely, the odds are first adjusted for the bookmakers’ profit margins (“overrounds”, on average 8.6%), averaged on the log-odds scale to a consensus rating, and then transformed back to winning probabilities. The raw bookmakers’ odds as well as the forecasts for all teams are also available in machine-readable form in wwc2023.csv.
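
The averaging steps can be sketched in a few lines of Python. This is a simplified illustration, not the code behind the forecast: it removes the overround by plain proportional normalization (which may differ from the adjustment used in the cited paper), it rescales the final probabilities to sum to one (an added assumption), and the function name `consensus` as well as all odds are invented for the example.

```python
import math

def consensus(quoted_odds):
    """Return (winning probabilities, mean overround) from decimal odds.

    quoted_odds maps bookmaker -> {team: decimal odds}.
    """
    teams = sorted(next(iter(quoted_odds.values())))
    logits = {team: [] for team in teams}
    overrounds = []
    for odds in quoted_odds.values():
        raw = {team: 1.0 / odds[team] for team in teams}
        total = sum(raw.values())        # exceeds 1 by the overround
        overrounds.append(total - 1.0)
        for team in teams:
            p = raw[team] / total        # assumption: proportional adjustment
            logits[team].append(math.log(p / (1.0 - p)))
    # Average on the log-odds scale, then transform back to probabilities.
    probs = {}
    for team in teams:
        mean_logit = sum(logits[team]) / len(logits[team])
        probs[team] = 1.0 / (1.0 + math.exp(-mean_logit))
    norm = sum(probs.values())           # assumption: rescale to sum to 1
    probs = {team: p / norm for team, p in probs.items()}
    return probs, sum(overrounds) / len(overrounds)

# Invented odds for two hypothetical bookmakers, for illustration only.
example_odds = {
    "bookmaker_1": {"Team A": 2.0, "Team B": 3.0, "Team C": 5.0},
    "bookmaker_2": {"Team A": 2.1, "Team B": 2.9, "Team C": 4.8},
}
probs, mean_overround = consensus(example_odds)
```

Averaging on the log-odds scale rather than on the probability scale keeps a single extreme quote for a long shot from dominating the consensus.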

Although forecasting the winning probabilities for the FIFA Women’s World Cup 2023 is probably of most interest, the bookmaker consensus forecasts can also be employed to infer team-specific abilities using an “inverse” tournament simulation:

  1. If team abilities are available, pairwise winning probabilities can be derived for each possible match (see below).
  2. Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
  3. Such a tournament simulation can then be run sufficiently often (here 100,000 times) to obtain relative frequencies for each team winning the tournament.

Using this idea, abilities in step 1 can be chosen such that the simulated winning probabilities in step 3 closely match those from the bookmaker consensus shown above.
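
A minimal sketch of this “inverse” simulation, under strong simplifying assumptions: a toy knockout with four teams instead of the real 32-team format, far fewer simulation runs than the 100,000 used here, and a simple damped multiplicative update as the fitting rule. The update rule and all names (`fit_abilities`, `win_frequencies`, the target values) are illustrative choices, not necessarily what was used for the forecast.

```python
import random

def beats(a, b, abilities, rng):
    """Bradley-Terry draw: True if team a beats team b."""
    return rng.random() < abilities[a] / (abilities[a] + abilities[b])

def simulate_winner(abilities, rng):
    # Toy bracket: semifinals 0 vs 1 and 2 vs 3, followed by a final.
    f1 = 0 if beats(0, 1, abilities, rng) else 1
    f2 = 2 if beats(2, 3, abilities, rng) else 3
    return f1 if beats(f1, f2, abilities, rng) else f2

def win_frequencies(abilities, n_sims, rng):
    """Steps 2 and 3: simulate the tournament and tabulate the winners."""
    counts = [0] * len(abilities)
    for _ in range(n_sims):
        counts[simulate_winner(abilities, rng)] += 1
    return [c / n_sims for c in counts]

def fit_abilities(target, n_iter=12, n_sims=20000, step=0.5, seed=1):
    """Adjust abilities until simulated win frequencies match the target."""
    rng = random.Random(seed)
    abilities = [1.0] * len(target)
    for _ in range(n_iter):
        sim = win_frequencies(abilities, n_sims, rng)
        # Raise abilities of teams that win too rarely, lower the others.
        abilities = [a * (t / max(s, 1e-6)) ** step
                     for a, t, s in zip(abilities, target, sim)]
    return abilities

target = [0.4, 0.3, 0.2, 0.1]   # invented consensus probabilities
abilities = fit_abilities(target)
check = win_frequencies(abilities, 20000, random.Random(2))
```

Because the Bradley-Terry probabilities depend only on ability ratios, the fitted abilities are identified only up to a common scaling factor.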

Pairwise comparisons

A classical approach to obtain winning probabilities in pairwise comparisons (i.e., matches between teams/players) is the Bradley-Terry model, which is similar to the Elo rating, popular in sports. The Bradley-Terry approach models the probability that a Team A beats a Team B by their associated abilities (or strengths):

$$\Pr(A \text{ beats } B) = \frac{\mathit{ability}_A}{\mathit{ability}_A + \mathit{ability}_B}.$$
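
As a small worked example of this formula (with invented abilities): a team with ability 3 beats a team with ability 1 with probability 3/(3+1) = 0.75. The sketch below also shows the equivalent form on the log-ability scale, a logistic function of the ability difference, which is where the resemblance to Elo ratings comes from. The function names are illustrative.

```python
import math

def bt_probability(ability_a, ability_b):
    """Bradley-Terry probability that A beats B."""
    return ability_a / (ability_a + ability_b)

def bt_probability_log(log_a, log_b):
    """Same probability, written as a logistic function of log-abilities."""
    return 1.0 / (1.0 + math.exp(-(log_a - log_b)))

p_ab = bt_probability(3.0, 1.0)   # 3 / (3 + 1)
p_ba = bt_probability(1.0, 3.0)   # complementary probability
p_log = bt_probability_log(math.log(3.0), math.log(1.0))
```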

Coupled with the “inverse” simulation of the tournament, as described in steps 1-3 above, this yields pairwise probabilities for each possible match. The following heatmap shows the probabilistic forecasts for each match with light gray signalling approximately equal chances and green vs. purple signalling advantages for Team A or B, respectively.

Heatmap: Match probabilities

Performance throughout the tournament

As every single match can be simulated with the pairwise probabilities above, it is also straightforward to simulate the entire tournament (here: 100,000 times), providing “survival” probabilities for each team across the different stages.

Line plot: Survival probabilities

For example, this shows that the probability for the United States to reach any stage of the tournament is higher than for any other team to reach the same stage. In fact, their survival probabilities are decreasing rather slowly because they can most likely avoid the other favorites for the title until the semifinal. Conversely, Germany’s chances to reach the round of 16 are almost as high (87.6%) as those of the United States but their chances to reach the quarterfinal are much lower (55.7%) because they are most likely to play the strongest expected runner-up, Brazil, in the round of 16.

In addition to the curves shown in the plot above, further probabilities of interest can be obtained from the simulation. For example, the probability for the “dream final” between the top favorites, World Champion United States and European Champion England, is 9.1%. The most likely first semi-final is between the United States and Spain with a probability of 13.5%. For the second semi-final it is less clear who is the most likely opponent of England because there are three possible pairings with almost the same probability (around 7%): against Australia, France, or Germany. This shows that this half of the tournament tree is somewhat more contested with a less certain outcome.
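
How such stage-wise and pairing probabilities fall out of a simulation can be sketched as follows. This is a toy setup and not the simulation behind the numbers above: four hypothetical teams with invented abilities, a fixed two-round bracket, and 20,000 runs. Survival probabilities are the relative frequencies of reaching each stage, and the probability of a specific final is the relative frequency of that pairing.

```python
import random
from collections import Counter

def play(a, b, abilities, rng):
    """Return the winner of a single Bradley-Terry match."""
    p_a = abilities[a] / (abilities[a] + abilities[b])
    return a if rng.random() < p_a else b

def frequencies(counter, n_sims):
    return {key: count / n_sims for key, count in counter.items()}

def simulate(abilities, bracket, n_sims=20000, seed=1):
    rng = random.Random(seed)
    reached_final, champion, finals = Counter(), Counter(), Counter()
    for _ in range(n_sims):
        (a, b), (c, d) = bracket
        f1 = play(a, b, abilities, rng)       # first semifinal
        f2 = play(c, d, abilities, rng)       # second semifinal
        reached_final.update([f1, f2])
        finals[frozenset((f1, f2))] += 1      # unordered pairing in the final
        champion[play(f1, f2, abilities, rng)] += 1
    return (frequencies(reached_final, n_sims),
            frequencies(champion, n_sims),
            frequencies(finals, n_sims))

# Invented abilities and bracket, for illustration only.
abilities = {"Team A": 4.0, "Team B": 2.0, "Team C": 2.0, "Team D": 1.0}
bracket = [("Team A", "Team D"), ("Team B", "Team C")]
final_prob, title_prob, final_pairings = simulate(abilities, bracket)
```

In this toy bracket, Team A reaches the final with probability 4/(4+1) = 0.8, and a final between Team A and Team B has probability 0.8 × 0.5 = 0.4; the simulated frequencies approximate these values up to Monte Carlo error.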

Odds and ends

The bookmaker consensus model has performed well in previous tournaments, often predicting winners or finalists correctly. However, all forecasts are probabilistic, clearly below 100%, and thus by no means certain. It would also be possible to post-process the bookmaker consensus along with data from historic matches, player ratings, and other information about the teams using machine learning techniques. However, due to lack of time for more refined forecasts at the end of a busy academic year, at least the bookmaker consensus is provided as a solid basic forecast.

As a final remark: Betting on the outcome based on the results presented here is not recommended. This is not only because the winning probabilities are clearly far below 100% but, more importantly, because the bookmakers have a profit margin of 8.6% on average, which ensures that the best chances of making money from sports betting lie with them.

Enjoy the FIFA Women’s World Cup 2023!
