(By Achim Zeileis) After 36 years, the FIFA World Cup returns to South America: the 2014 event is hosted in Brazil, the first on the continent since Argentina in 1978. As in all previous South American FIFA World Cups, a South American team is expected to win. Using a bookmaker consensus rating, obtained by aggregating winning odds from 22 online bookmakers, the clear favorite is the host Brazil with a forecasted winning probability of 22.5%, followed by three serious contenders. Neighboring Argentina is the expected runner-up with a winning probability of 15.8%, ahead of Germany with 13.4% and Spain with 11.8%. All other competitors have much lower winning probabilities, with the “best of the rest” being the “insider tip” Belgium at a predicted 4.8%.
Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, predicted pairwise probabilities for each possible game at the FIFA World Cup are obtained, along with “survival” probabilities for each team proceeding to the different stages of the tournament. For example, the most likely final is a match between neighbors Brazil and Argentina (6.5%), with the odds somewhat in favor of Brazil winning such a final (with a winning probability of 57.8%).
All of these forecasts are the result of a bookmaker consensus rating proposed in Leitner, Hornik, and Zeileis (International Journal of Forecasting, 26(3), 471-481, 2010). It was successfully applied to the EURO 2008, FIFA World Cup 2010, and EURO 2012. Of course, not all predictions were fully correct (after all, the predicted probabilities are always much lower than 100%), but in 2008 the correct final (Germany vs. Spain) was predicted, and in 2010 and 2012 Spain was correctly predicted as the winner of the respective tournament. A new working paper about the 2014 FIFA World Cup, upon which this blog post is based, applies the same technique and is introduced here.
The core idea is to use the expert knowledge of international bookmakers. These have to judge all possible outcomes in a sports tournament such as the FIFA World Cup and assign odds to them. Doing a poor job (i.e., assigning odds that are too high or too low) will cost them money. Hence, our forecasts rely solely on the expertise of 22 such bookmakers. Specifically, we (1) adjust the quoted odds by removing the bookmakers’ profit margins (or overround; 15.0% at the median), (2) aggregate and average these to a consensus rating, and (3) infer the corresponding tournament-draw-adjusted team abilities using the Bradley-Terry model for pairwise comparisons.
For step (1), it is assumed that the quoted odds are derived from the underlying “true” odds as: quoted odds = odds · α + 1, where + 1 is the stake (which is paid back to the bookmakers’ customers if they win) and α is the proportion of the bets that is actually paid out by the bookmakers. The so-called overround is the remaining proportion 1 − α and the main basis of the bookmakers’ profits (see also Wikipedia and the links therein). For the 22 bookmakers employed in this analysis, the median overround is a sizeable 15.0%.
Subsequently, in step (2), the overround-adjusted odds are transformed to log-odds (i.e., the logit scale), averaged for each team, and transformed back to winning probabilities (displayed in the barchart above). Finally, step (3) of the analysis uses the following idea:
- If team abilities are available, pairwise winning probabilities can be derived for each possible match using a Bradley-Terry approach.
- Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
- Such a tournament simulation can then be run sufficiently often (here 100,000 times) to obtain relative frequencies for each team winning the tournament.
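The three steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code behind the working paper: the function names, the four teams "A" to "D", their abilities, and the quoted odds are all hypothetical. It also assumes that each bookmaker's α is chosen so that the adjusted winning probabilities sum to 1 (the text above does not spell out how α is determined), and it replaces the real group-plus-knockout format with a plain knockout bracket. The iterative search for abilities that reproduce the consensus probabilities is not shown.

```python
import math
import random


def remove_overround(quoted_odds):
    """Step (1): adjust one bookmaker's quoted odds (all > 1) for the overround.

    From quoted = odds * alpha + 1 it follows that odds = (quoted - 1) / alpha
    and p = 1 / (1 + odds) = alpha / (alpha + quoted - 1).
    Assumption: alpha is the value for which the p's sum to 1; it is found by
    bisection, which requires that the raw 1/quoted values sum to at least 1.
    """
    def total(alpha):
        return sum(alpha / (alpha + q - 1.0) for q in quoted_odds)

    lo, hi = 1e-9, 1.0
    for _ in range(100):  # total(alpha) is increasing in alpha
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    alpha = (lo + hi) / 2.0
    probs = [alpha / (alpha + q - 1.0) for q in quoted_odds]
    return probs, 1.0 - alpha  # probabilities and overround


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def consensus(prob_rows):
    """Step (2): average the logits per team across bookmakers, then map back."""
    n_books = len(prob_rows)
    n_teams = len(prob_rows[0])
    return [
        inv_logit(sum(logit(row[i]) for row in prob_rows) / n_books)
        for i in range(n_teams)
    ]


def bt_win_prob(ability_i, ability_j):
    """Bradley-Terry probability that team i beats team j."""
    return ability_i / (ability_i + ability_j)


def simulate_knockout(bracket, abilities, rng):
    """Play one knockout bracket (length a power of 2) and return the winner."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [
            a if rng.random() < bt_win_prob(abilities[a], abilities[b]) else b
            for a, b in zip(teams[0::2], teams[1::2])
        ]
    return teams[0]


def winning_frequencies(bracket, abilities, n_runs=100_000, seed=1):
    """Step (3): relative frequency of each team winning over n_runs simulations."""
    rng = random.Random(seed)
    wins = dict.fromkeys(bracket, 0)
    for _ in range(n_runs):
        wins[simulate_knockout(bracket, abilities, rng)] += 1
    return {team: count / n_runs for team, count in wins.items()}


if __name__ == "__main__":
    # Hypothetical quoted decimal odds from two bookmakers for teams A-D.
    books = [[1.8, 3.5, 5.0, 9.0], [1.9, 3.2, 5.5, 8.0]]
    adjusted = [remove_overround(row)[0] for row in books]
    print("consensus probabilities:", consensus(adjusted))

    # Hypothetical abilities; the paper's abilities are tuned to the consensus.
    abilities = {"A": 4.0, "B": 2.0, "C": 1.5, "D": 1.0}
    print("simulated winners:", winning_frequencies(list(abilities), abilities))
```

Note that the back-transformed consensus probabilities need not sum exactly to 1, because the averaging happens on the logit scale. The sketch leaves them as they are rather than renormalizing.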