(By Achim Zeileis)

After 36 years the FIFA World Cup returns to South America, with the 2014 event being hosted in Brazil (after 1978 in Argentina). And as in all previous South American FIFA World Cups, a South American team is expected to take the victory: Using a bookmaker consensus rating (obtained by aggregating winning odds from 22 online bookmakers), the clear favorite is the host Brazil with a forecasted winning probability of 22.5%, followed by three serious contenders. Neighboring Argentina is the expected runner-up with a winning probability of 15.8%, ahead of Germany with 13.4% and Spain with 11.8%. All other competitors have much lower winning probabilities, with the “best of the rest” being the “insider tip” Belgium with a predicted 4.8%.

Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, predicted pairwise probabilities for each possible game at the FIFA World Cup are obtained, along with “survival” probabilities for each team proceeding to the different stages of the tournament. For example, it can be inferred that the most likely final is a match between neighbors Brazil and Argentina (6.5%), with the odds somewhat in favor of Brazil winning such a final (with a winning probability of 57.8%).

All of these forecasts are the result of a bookmaker consensus rating proposed in Leitner, Hornik, and Zeileis (International Journal of Forecasting, 26(3), 471-481, 2010). It was successfully applied to the EURO 2008, FIFA World Cup 2010, and EURO 2012. Of course, not all predictions were fully correct (after all, the predicted probabilities are always much lower than 100%), but in 2008 the correct final (Germany vs. Spain) was predicted, and in 2010 and 2012 Spain was correctly predicted as the winner of the respective tournament. A new working paper about the 2014 FIFA World Cup, upon which this blog post is based, applies the same technique and is introduced here.

The core idea is to use the expert knowledge of international bookmakers. These have to judge all possible outcomes in a sports tournament such as the FIFA World Cup and assign odds to them. Doing a poor job (i.e., assigning odds that are too high or too low) will cost them money. Hence, in our forecasts we rely solely on the expertise of 22 such bookmakers. Specifically, we (1) adjust the quoted odds by removing the bookmakers’ profit margins (or overround, on average 15.0%), (2) aggregate and average these to a consensus rating, and (3) infer the corresponding tournament-draw-adjusted team abilities using the Bradley-Terry model for pairwise comparisons.

For step (1), it is assumed that the quoted odds are derived from the underlying “true” odds as: quoted odds = odds · α + 1, where + 1 is the stake (which is to be paid back to the bookmakers’ customers in case they win) and α is the proportion of the bets that is actually paid out by the bookmakers. The so-called overround is the remaining proportion 1 – α and the main basis of the bookmakers’ profits (see also Wikipedia and the links therein). For the 22 bookmakers employed in this analysis, the median overround is a sizeable 15.0%.

Subsequently, in step (2), the overround-adjusted odds are transformed to the log-odds (or logit) scale, averaged for each team, and transformed back to winning probabilities (displayed in the barchart above).

Finally, step (3) of the analysis uses the following idea:
1. If team abilities are available, pairwise winning probabilities can be derived for each possible match using a Bradley-Terry approach.
2. Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
3. Such a tournament simulation can then be run sufficiently often (here 100,000 times) to obtain relative frequencies for each team winning the tournament.
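
The steps described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the working paper: all function names are hypothetical, the quoted odds and abilities in the demo are made up, α is assumed to be chosen per bookmaker so that the adjusted probabilities sum to one, odds are assumed to be odds against winning (so that probability = 1 / (1 + odds)), and the tournament is reduced to a plain single-elimination bracket without a group stage.

```python
import math
import random
from collections import Counter


def implied_probabilities(quoted_odds):
    """Step (1): remove one bookmaker's overround (each quoted odds must be > 1).

    Inverts quoted = odds * alpha + 1 and uses p = 1 / (1 + odds), which
    simplifies to p = alpha / (alpha + quoted - 1).
    ASSUMPTION: alpha is the value in (0, 1] at which the probabilities
    sum to 1; it is located by bisection (the sum increases with alpha).
    """
    def total(alpha):
        return sum(alpha / (alpha + q - 1.0) for q in quoted_odds.values())

    lo, hi = 1e-9, 1.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    alpha = (lo + hi) / 2.0
    probs = {t: alpha / (alpha + q - 1.0) for t, q in quoted_odds.items()}
    return probs, alpha


def consensus(prob_dicts):
    """Step (2): average on the logit scale, then transform back."""
    out = {}
    for team in prob_dicts[0]:
        logits = [math.log(p[team] / (1.0 - p[team])) for p in prob_dicts]
        out[team] = 1.0 / (1.0 + math.exp(-sum(logits) / len(logits)))
    return out


def bt_prob(ability_a, ability_b):
    """Bradley-Terry probability that A beats B."""
    return ability_a / (ability_a + ability_b)


def simulate_knockout(bracket, abilities, rng):
    """Play one single-elimination bracket (length must be a power of two)."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [a if rng.random() < bt_prob(abilities[a], abilities[b]) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]


def winning_frequencies(bracket, abilities, n_sims=100_000, seed=1):
    """Relative frequency of each team winning over repeated simulations."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, abilities, rng)
                   for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


if __name__ == "__main__":
    # Made-up quoted odds from two hypothetical bookmakers.
    book1 = {"A": 2.5, "B": 3.5, "C": 5.0, "D": 8.0}
    book2 = {"A": 2.4, "B": 3.6, "C": 5.5, "D": 7.5}
    adjusted = [implied_probabilities(b)[0] for b in (book1, book2)]
    print(consensus(adjusted))
    # Made-up abilities; only their ratios matter in the Bradley-Terry model.
    abilities = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
    print(winning_frequencies(["A", "D", "B", "C"], abilities, n_sims=10_000))
```

In this sketch the abilities are simply given. The relative frequencies returned by `winning_frequencies` are the quantities that would be compared against the consensus probabilities from `consensus`.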

Using an iterative approach, we calibrate the team abilities so that the implied winning probabilities (when simulating the tournament repeatedly) closely match the bookmaker consensus probabilities (reported above). Thus, we obtain abilities for each team that are adjusted for the tournament draw because the bookmakers’ odds have already factored this in (i.e., they account for the fact that some teams compete in relatively weak or strong groups, respectively).

Moreover, these abilities imply winning probabilities for each conceivable match between two teams, reported in the color-coded display below. Light gray signals that either team is almost equally likely to win a match between Teams A and B (probability between 40% and 60%). Light, medium, and dark blue/red correspond to small, moderate, and high probabilities of winning/losing a match between Team A and Team B. All probabilities are obtained from the Bradley-Terry model using the following equation for the winning probability:

Pr(A beats B) = abilityA / (abilityA + abilityB)

Additionally, the tournament simulation can be used not only to infer an estimated probability for the outcome of each individual match but also for the whole course of the tournament. The plot below shows the relative frequencies from the simulation for each team to “survive” over the tournament, i.e., proceed from the group phase to the round of 16, quarter- and semi-finals, and the final. Brazil and Argentina are the clear favorites within their respective groups A and F, with almost 100% probability of making it to the round of 16, whereas all remaining teams have much poorer chances of proceeding to the later stages of the FIFA World Cup. The next best teams, Germany and Spain, face much harder groups: Germany plays in group G against Portugal, while Spain has to prevail against two strong contenders, The Netherlands and Chile.
Group D is particularly well-balanced, with three former FIFA World Champions, all of which have about equal chances to proceed. The remaining groups C, E, and H are also somewhat balanced but not as tight as group D. However, note that even the weakest teams in the tournament have probabilities of about 20% to proceed to the round of 16, indicating that the curves just reflect average expected performance and that surprises are by no means unlikely.

In the pairwise comparisons, the bookmakers clearly perceive Brazil to be the strongest team in the tournament, with moderate (70-80%) to high (> 80%) probabilities of beating almost any other team in the tournament. The only teams that get close to having even chances are Argentina (with a probability of 42.2% of beating Brazil), Germany (with 41.3%), and Spain (with 41.2%). Behind these four strongest teams, two or three bigger clusters of teams can be seen, each of which is of approximately the same strength (i.e., yielding approximately even chances in a pairwise comparison).

Needless to say, all predictions are probabilities that are far from certain. While Brazil taking the home victory is the most likely event in the bookmakers’ expert opinions, it is still far more likely that one of the other teams wins. This is one of the two reasons why we would recommend refraining from placing bets based on our analyses. The more important second reason, though, is that the bookmakers have a sizeable profit margin of (on average) 15.0%, which ensures that the best chances of making money from sports betting lie with them. Hence, this should be kept in mind when placing bets. We ourselves will not place bets but focus on enjoying the exciting football tournament that the 2014 FIFA World Cup will be with 100% predicted probability!

Working paper: Zeileis A, Leitner C, Hornik K (2014). “Home Victory for Brazil in the 2014 FIFA World Cup”, Working Paper 2014-17, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck. URL http://EconPapers.RePEc.org/RePEc:inn:wpaper:2014-17.