# Probabilistic Forecasting for the 2018 FIFA World Cup

By **Achim Zeileis**


Using a consensus model based on quoted bookmakers’ odds, winning probabilities for all competing teams in the 2018 FIFA World Cup are obtained: the favorite is Brazil, closely followed by the defending World Champion Germany.

## Winning probabilities

The model is the so-called bookmaker consensus model, proposed by Leitner, Hornik, and Zeileis (2010, *International Journal of Forecasting*, https://doi.org/10.1016/j.ijforecast.2009.10.001) and successfully applied in previous football tournaments, e.g., correctly predicting the winner of the 2010 FIFA World Cup and three out of four semifinalists at the 2014 FIFA World Cup. This time the forecast shows that Brazil is the favorite, with a forecasted winning probability of 16.6%, closely followed by the defending World Champion and 2017 FIFA Confederations Cup winner Germany, with a winning probability of 15.8%. Two other teams also have double-digit winning probabilities: Spain and France with 12.5% and 12.1%, respectively. More details are displayed in the following bar chart.

These probabilistic forecasts have been obtained by model-based averaging the quoted winning odds for all teams across bookmakers. More precisely, the odds are first adjusted for the bookmakers’ profit margins (“overrounds”, on average 15.2%), averaged on the log-odds scale to a consensus rating, and then transformed back to winning probabilities.
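The three steps above can be sketched in R for a single team. The quoted odds below are hypothetical; the 15.2% overround is the average margin reported above.

```r
# Hypothetical decimal odds quoted by three bookmakers for one team,
# and the average overround (profit margin) reported in the post.
odds <- c(6.0, 5.5, 6.5)
overround <- 0.152

# Step 1: adjust for the margin. The raw implied probabilities 1/odds
# sum to more than 1 across all outcomes, so scale them down.
p_adj <- (1 / odds) / (1 + overround)

# Step 2: average on the log-odds scale to a consensus rating.
consensus_logodds <- mean(qlogis(p_adj))

# Step 3: transform back to a winning probability.
p_win <- plogis(consensus_logodds)
round(p_win, 3)
```

In the full model this is done for all 32 teams jointly, so the resulting probabilities sum to one across teams.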

A more detailed description of the model as well as its results for the 2018 FIFA World Cup are available in a new working paper. The raw bookmakers’ odds as well as the forecasts for all teams are also available in machine-readable form in fifa2018.csv.

Although forecasting the winning probabilities for the 2018 FIFA World Cup is probably of most interest, the bookmaker consensus forecasts can also be employed to infer team-specific abilities using an “inverse” tournament simulation:

- If team abilities are available, pairwise winning probabilities can be derived for each possible match (see below).
- Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
- Such a tournament simulation can then be run sufficiently often (here 1,000,000 times) to obtain relative frequencies for each team winning the tournament.

Using this idea, abilities in step 1 can be chosen such that the simulated winning probabilities in step 3 closely match those from the bookmaker consensus shown above.
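Steps 1-3 can be sketched for a toy 4-team knockout bracket. The abilities below are made up for illustration; the real model calibrates them so that the simulated winning probabilities match the bookmaker consensus.

```r
# Minimal sketch of the tournament simulation for a hypothetical
# 4-team knockout bracket with made-up abilities.
set.seed(1)
ability <- c(A = 4, B = 3, C = 2, D = 1)

# Bradley-Terry probability that team i beats team j (see below).
p_beat <- function(i, j) ability[i] / (ability[i] + ability[j])

# Play one match: return the (randomly drawn) winner.
play <- function(i, j) if (runif(1) < p_beat(i, j)) i else j

simulate_once <- function() {
  f1 <- play("A", "D")   # semifinal 1
  f2 <- play("B", "C")   # semifinal 2
  play(f1, f2)           # final
}

# Repeat the simulation to obtain relative winning frequencies.
winners <- replicate(10000, simulate_once())
prop.table(table(winners))
```

The post uses 1,000,000 replications and the full 32-team tournament structure, including the group stage.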

## Pairwise comparisons

A classical approach to obtaining winning probabilities in pairwise comparisons (i.e., matches between teams/players) is the Bradley-Terry model, which is similar to the Elo rating popular in sports. The Bradley-Terry approach models the probability that a Team A beats a Team B via their associated abilities (or strengths):

$$\mathrm{Pr}(A \text{ beats } B) = \frac{\mathrm{ability}_{A}}{\mathrm{ability}_{A} + \mathrm{ability}_{B}}.$$

Coupled with the “inverse” simulation of the tournament, as described in steps 1-3 above, this yields pairwise probabilities for each possible match. The following heatmap shows the probabilistic forecasts for each match, with light gray signalling approximately equal chances and green vs. pink signalling an advantage for Team A or Team B, respectively.
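The matrix underlying such a heatmap can be sketched with `outer()`. The abilities below are placeholder values for four of the teams, not the calibrated abilities from the model.

```r
# Pairwise Bradley-Terry winning probabilities for hypothetical
# (not calibrated) abilities; rows are Team A, columns Team B.
ability <- c(BRA = 0.166, GER = 0.158, ESP = 0.125, FRA = 0.121)
p <- outer(ability, ability, function(a, b) a / (a + b))
diag(p) <- NA   # a team never plays itself
round(p, 3)
```

Note that the probabilities are complementary: `p[i, j] + p[j, i] = 1` for any pair of distinct teams.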
