(This article was first published on **Achim Zeileis**, and kindly contributed to R-bloggers)

Using a consensus model based on quoted bookmakers’ odds, winning probabilities for all competing teams in the 2018 FIFA World Cup are obtained: The favorite is Brazil, closely followed by the defending World Champion Germany.

## Winning probabilities

The model is the so-called bookmaker consensus model which has been proposed by Leitner, Hornik, and Zeileis (2010, *International Journal of Forecasting*, https://doi.org/10.1016/j.ijforecast.2009.10.001) and successfully applied in previous football tournaments, e.g., correctly predicting the winner of the 2010 FIFA World Cup and three out of four semifinalists at the 2014 FIFA World Cup. This time the forecast shows that Brazil is the favorite with a forecasted winning probability of 16.6%, closely followed by the defending World Champion and 2017 FIFA Confederations Cup winner Germany with a winning probability of 15.8%. Two other teams also have double-digit winning probabilities: Spain and France with 12.5% and 12.1%, respectively. More details are displayed in the following barchart.

These probabilistic forecasts have been obtained by model-based averaging of the quoted winning odds for all teams across bookmakers. More precisely, the odds are first adjusted for the bookmakers’ profit margins (“overrounds”, on average 15.2%), then averaged on the log-odds scale to a consensus rating, and finally transformed back to winning probabilities.
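The three steps just described (adjust, average on the log-odds scale, transform back) can be illustrated with a minimal Python sketch. It is not the code used for the forecasts: the overround is removed here by simply rescaling the inverse decimal odds so that they sum to 1, which is a simplifying assumption, and the function names and the example odds are made up for illustration.

```python
import math

def remove_overround(quoted_odds):
    """Turn one bookmaker's decimal odds into probabilities summing to 1.

    Assumption: the overround is removed by proportional rescaling.
    Returns the adjusted probabilities and the implied overround.
    """
    raw = [1.0 / q for q in quoted_odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin
    return [r / total for r in raw], total - 1.0

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def consensus(odds_by_bookmaker):
    """Average the adjusted probabilities on the log-odds scale, team by team."""
    adjusted = [remove_overround(odds)[0] for odds in odds_by_bookmaker]
    n_books = len(adjusted)
    n_teams = len(adjusted[0])
    result = []
    for t in range(n_teams):
        mean_logit = sum(logit(book[t]) for book in adjusted) / n_books
        result.append(inv_logit(mean_logit))
    return result

# Made-up decimal odds for three hypothetical teams at two bookmakers
example_odds = [[2.0, 3.0, 4.5], [2.1, 2.9, 4.4]]
consensus_probs = consensus(example_odds)
```

Because the averaging happens on the log-odds scale, the back-transformed values need not sum to exactly 1, although they are typically very close.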

A more detailed description of the model as well as its results for the 2018 FIFA World Cup are available in a new working paper. The raw bookmakers’ odds as well as the forecasts for all teams are also available in machine-readable form in fifa2018.csv.

Although forecasting the winning probabilities for the 2018 FIFA World Cup is probably of most interest, the bookmaker consensus forecasts can also be employed to infer team-specific abilities using an “inverse” tournament simulation:

1. If team abilities are available, pairwise winning probabilities can be derived for each possible match (see below).
2. Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
3. Such a tournament simulation can then be run sufficiently often (here 1,000,000 times) to obtain relative frequencies for each team winning the tournament.

Using this idea, abilities in step 1 can be chosen such that the simulated winning probabilities in step 3 closely match those from the bookmaker consensus shown above.
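To make the “inverse” idea concrete, here is a small Python sketch under strong simplifying assumptions: a hypothetical four-team knockout bracket instead of the real tournament format, made-up target probabilities, far fewer simulation runs, and a simple damped multiplicative update that is only one possible way of matching the targets (it is not necessarily the procedure used for the forecasts). All names are illustrative.

```python
import random

def play(a, b, abilities, rng):
    """Simulate one match between teams a and b from their abilities."""
    p = abilities[a] / (abilities[a] + abilities[b])
    return a if rng.random() < p else b

def simulate_win_freq(abilities, n_runs, rng):
    """Four-team knockout: (0 vs 1), (2 vs 3), then the final."""
    wins = [0] * 4
    for _ in range(n_runs):
        finalist_1 = play(0, 1, abilities, rng)
        finalist_2 = play(2, 3, abilities, rng)
        wins[play(finalist_1, finalist_2, abilities, rng)] += 1
    return [w / n_runs for w in wins]

def fit_abilities(target, n_iter=30, n_runs=10000, seed=1):
    """Adjust abilities until simulated winning frequencies match the target."""
    rng = random.Random(seed)
    abilities = [1.0] * 4
    for _ in range(n_iter):
        freq = simulate_win_freq(abilities, n_runs, rng)
        # Damped update: raise abilities of teams that win too rarely
        abilities = [a * (t / max(f, 1e-6)) ** 0.5
                     for a, t, f in zip(abilities, target, freq)]
        scale = sum(abilities)  # abilities are only identified up to scale
        abilities = [a / scale for a in abilities]
    return abilities

# Made-up target winning probabilities for the four hypothetical teams
target_probs = [0.4, 0.3, 0.2, 0.1]
fitted = fit_abilities(target_probs)
```

Simulating the bracket again with the fitted abilities should then reproduce the target probabilities up to Monte Carlo error.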

## Pairwise comparisons

A classical approach to obtain winning probabilities in pairwise comparisons (i.e., matches between teams/players) is the Bradley-Terry model, which is similar to the Elo rating, popular in sports. The Bradley-Terry approach models the probability that a Team A beats a Team B by their associated abilities (or strengths):

$$\mathrm{Pr}(A \text{ beats } B) = \frac{\mathrm{ability}_A}{\mathrm{ability}_A + \mathrm{ability}_B}.$$

Coupled with the “inverse” simulation of the tournament, as described in steps 1 to 3 above, this yields pairwise probabilities for each possible match. The following heatmap shows the probabilistic forecasts for each match with light gray signalling approximately equal chances and green vs. pink signalling advantages for Team A or B, respectively.
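As a brief illustration of the Bradley-Terry formula, the following Python sketch computes the pairwise probabilities for every ordered pair of teams. The team names and ability values are hypothetical and not taken from the forecasts.

```python
def bt_prob(ability_a, ability_b):
    """Bradley-Terry probability that the first team beats the second."""
    return ability_a / (ability_a + ability_b)

# Hypothetical abilities, only their ratios matter
abilities = {"Team A": 2.0, "Team B": 1.0, "Team C": 1.0}

# Probability that the first team of each ordered pair beats the second
pairwise = {
    (a, b): bt_prob(abilities[a], abilities[b])
    for a in abilities
    for b in abilities
    if a != b
}
```

A team that is twice as able as its opponent wins with probability 2/3, and the probabilities for a pair and its reverse always add up to 1.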

## Performance throughout the tournament

As every single match can be simulated with the pairwise probabilities above, it is also straightforward to simulate the entire tournament (here: 1,000,000 times), providing “survival” probabilities for each team across the different stages.
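How such “survival” probabilities arise from repeated simulation can be sketched in Python as follows. The sketch assumes a plain knockout bracket with a power-of-two number of teams and ignores the group stage, so it is a simplification of the real tournament format. The abilities, team names, and number of runs are made up.

```python
import random

def simulate_survival(abilities, n_runs=20000, seed=42):
    """Estimate, per team, the probability of surviving 0, 1, 2, ... rounds.

    Assumes a knockout bracket in dictionary order with 2**k teams.
    """
    rng = random.Random(seed)
    teams = list(abilities)
    n_rounds = len(teams).bit_length() - 1  # e.g. 4 teams -> 2 rounds
    reached = {t: [0] * (n_rounds + 1) for t in teams}
    for _ in range(n_runs):
        alive = teams
        for r in range(n_rounds + 1):
            for t in alive:
                reached[t][r] += 1  # team t is still in after r rounds
            if len(alive) == 1:
                break
            winners = []
            for a, b in zip(alive[0::2], alive[1::2]):
                p = abilities[a] / (abilities[a] + abilities[b])
                winners.append(a if rng.random() < p else b)
            alive = winners
    return {t: [c / n_runs for c in counts] for t, counts in reached.items()}

# Hypothetical abilities for a four-team bracket
example_abilities = {"Team A": 1.0, "Team B": 0.8, "Team C": 0.45, "Team D": 0.3}
survival = simulate_survival(example_abilities)
```

For each team the resulting list starts at 1 (every team is in the first round) and decreases from stage to stage, with the last entry being the estimated probability of winning the tournament.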

This also shows that the most likely final is indeed a match between the top favorites Brazil and Germany (with a probability of 5.5%), in which Brazil would have the chance to make up for the dramatic semifinal in Belo Horizonte four years ago. However, if it comes to this final, the chances are almost even (50.6% for Brazil vs. 49.4% for Germany). For the semifinals, it is most likely (with a probability of 9.4%) that Brazil and France meet in the first semifinal (with chances slightly in favor of Brazil in such a match, 53.5%), while Germany and Spain most likely (with 9.2%) play the second semifinal (with chances slightly in favor of Germany with 53.1%).
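The figures for the most likely final can be combined with some simple arithmetic, shown here as a short Python sketch. The two input numbers are taken from the text, while the variable names are illustrative. The ability ratio follows from the Bradley-Terry formula, under which the odds of winning a match equal the ratio of the two abilities.

```python
# Figures quoted in the text
p_final = 0.055               # probability of a Brazil vs. Germany final
p_brazil_given_final = 0.506  # probability that Brazil wins this final

# Joint probability that this final occurs and Brazil wins it
p_brazil_wins_this_final = p_final * p_brazil_given_final

# Under Bradley-Terry: ability_Brazil / ability_Germany = p / (1 - p)
ability_ratio = p_brazil_given_final / (1.0 - p_brazil_given_final)
```

So Brazil winning the title via this particular final has a probability of about 2.8%, and the implied abilities of the two teams differ by only about 2%.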

## Odds and ends

The bookmaker consensus model has performed well in previous tournaments, often predicting winners or finalists correctly. However, all forecasts are probabilistic, clearly below 100%, and thus by no means certain.

This showed prominently at the UEFA Euro 2016:

- The model correctly predicted that France would beat Germany in the semifinal.
- For the final, France had a predicted 68.8% probability to beat Portugal, i.e., being expected to win about 2 out of every 3 matches between these two teams.
- But in the actual final Gignac failed to seal the deal in added time and Portugal was able to take the victory in overtime.

This illustrates that small things can often make the decisive difference in football, which is why no outcome can be predicted with a high probability.

Moreover, it is in the very nature of predictions that they can be wrong, otherwise football tournaments would be very boring. The only forecast that can be made with certainty is that the World Cup will be an exciting tournament that football fans worldwide look forward to.

In addition to this forecast, other interesting approaches will surely also be published in the next days, e.g., using the ideas of Groll, Schauberger, Tutz (2016). Also, Claus Ekstrøm will evaluate and compare predictions for the 2018 FIFA World Cup, see his slides, video, code.

As a final remark: Betting on the outcome based on the results presented here is not recommended, not only because the winning probabilities are clearly far below 100% but, more importantly, because the bookmakers have a sizeable profit margin of about 15.2%, which ensures that the best chances of making money from sports betting lie with them!
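The effect of the profit margin can be illustrated with a small Python calculation. It rests on two assumptions that are not taken from the working paper: the overround is understood as the amount by which the summed inverse decimal odds exceed 1, and it is spread proportionally over all teams. The function name is illustrative.

```python
def expected_return(true_prob, overround):
    """Expected profit per unit stake on a team with the given true probability.

    Assumption: quoted decimal odds are the fair odds 1 / true_prob,
    shortened proportionally by the overround.
    """
    quoted_odds = 1.0 / (true_prob * (1.0 + overround))
    return true_prob * quoted_odds - 1.0

# The result does not depend on which team is backed
loss_favorite = expected_return(0.166, 0.152)
loss_outsider = expected_return(0.01, 0.152)
```

Under these assumptions, a bettor loses roughly 13% of the stake on average, regardless of which team is backed.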

## Working paper

Zeileis A, Leitner C, Hornik K (2018). *“Probabilistic Forecasts for the 2018 FIFA World Cup Based on the Bookmaker Consensus Model”*, Working Paper 2018-09, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck. http://EconPapers.RePEc.org/RePEc:inn:wpaper:2018-09
