Machine learning meets reality: Forecast evaluation for the 2026 FIFA World Cup
With all 72 group-stage matches of the 2026 FIFA World Cup played, we evaluate our probabilistic forecasts, showing what the machine learning algorithm predicted well and where it struggled.
A challenging new tournament format
A couple of days ago the group stage of the 2026 FIFA World Cup wrapped up after squeezing 72 matches into just over two weeks. Thus, all pairings for the Round of 32 are now fixed. Today we assess the quality of our own probabilistic forecast for the 2026 FIFA World Cup, which is based on an ensemble machine learning algorithm and which we published prior to the tournament.
Most of our predictions worked reasonably well and the corresponding results are within the limits of expected random variation. It turned out, though, that the switch from 32 to 48 teams in the tournament was challenging not only for the audience but also for the machine learning algorithm. There were many more matches between very unequal teams compared to earlier editions of the World Cup (i.e., the training data for the algorithm). Moreover, because 8 out of 12 third-ranked teams proceeded to the knockout stage, it was often more important for the teams not to lose a match than to actually win it, which favored draws. Finally, due to the many possibilities of assigning the third-ranked teams to the knockout matches, some teams profited more than others from the realized tournament draw in the Round of 32.
TL;DR
All tournament favorites proceeded to the Round of 32 and mostly the weaker teams dropped out of the tournament. Arguably the biggest surprises were the African teams (especially South Africa, Cape Verde, and DR Congo) who all “survived” the group stage.
While the predicted win/loss probabilities mostly conformed with the observed results, the predicted goal differences tended to be too low. Especially for matches between rather unequal teams the observed goal differences were often more extreme than expected by the algorithm. The likely reason for this is that there were many more weak teams in this tournament compared to earlier years due to the extension to 48 teams.
There were also somewhat more draws than expected (and fewer wins/losses with a margin of only one goal). Again, this is likely due to the new tournament format with 48 teams. One win and one draw were mostly sufficient to be among the best third-ranked teams who also proceed to the knockout stage. Also, those groups playing their matches last could behave more strategically and could try to settle for a draw, which was painfully obvious in the memorable match between Algeria and Austria.
Group stage results
First, we look at the results in terms of which teams successfully advanced from the group stage to the Round of 32. The barplots below show the predicted probability for all teams to proceed to the Round of 32, in the observed ranking order, with the color highlighting which teams advanced to the knockout stage.

Clearly, all group favorites made the cut and mostly teams with lower probabilities dropped out. The biggest surprises were some of the African teams, notably South Africa (in Group A), Cape Verde (in Group H), and DR Congo (in Group K), all of which successfully “survived” the group stage. Moreover, although some of the tournament favorites (such as Spain, England, Germany, or Portugal) were not fully convincing in their respective group stage matches, these performances have not had serious negative consequences yet. All of them proceeded to the knockout stage, typically still taking the top spot in their respective groups.
Match results
Next, we take a closer look at the 72 individual group-stage matches to check how well our forecasts conformed with the actual outcome. The stacked bar plot below groups all match results into five intervals (columns) based on their predicted goal difference for the stronger vs. the weaker team.

The first column summarizes 15 matches where both teams were predicted to be almost equally strong. More precisely, the stronger team was predicted to be only slightly better, with 0 to 0.35 more predicted goals on average. One third of these matches were won by the slightly better team, one third were lost, and another third ended in a draw. In short, the distribution of the outcomes conforms very well with the prediction that both teams would be essentially equally strong.
In the second and third columns the predicted advantages of the stronger team increased to 0.35-0.7 goals and 0.7-1.05 goals, respectively, and the empirical proportion of matches won increased accordingly.
However, in the last two columns with the most pronounced predicted advantages (goal difference of 1.05-1.4 and 1.4-1.75, respectively) the winning proportion remained high but did not increase further. Also, the proportion of draws remained relatively high, even in matches with a clear favorite.
This suggests that our probabilistic forecasts captured the actual outcomes better in matches with small to moderate differences between the teams. But it seems that the algorithm struggled a little bit in matches with very large predicted differences.
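The grouping behind the stacked bar plot can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the interval edges are taken from the text above, while the function names and the toy matches are hypothetical. Each match is a pair of predicted and observed goal difference, both from the perspective of the team predicted to be stronger.

```python
from bisect import bisect_right
from collections import Counter

# Inner interval edges from the text: 0-0.35, 0.35-0.7, ..., 1.4-1.75.
EDGES = [0.35, 0.70, 1.05, 1.40]
LABELS = ["0-0.35", "0.35-0.7", "0.7-1.05", "1.05-1.4", "1.4-1.75"]


def outcome(goal_diff):
    """Classify an observed goal difference (stronger minus weaker team)."""
    if goal_diff > 0:
        return "win"
    if goal_diff < 0:
        return "loss"
    return "draw"


def outcome_shares(matches):
    """Return win/draw/loss proportions per predicted-difference interval.

    matches: iterable of (predicted_diff, observed_diff) pairs.
    Intervals without any match are omitted from the result.
    """
    counts = {label: Counter() for label in LABELS}
    for predicted, observed in matches:
        label = LABELS[bisect_right(EDGES, predicted)]
        counts[label][outcome(observed)] += 1
    shares = {}
    for label, c in counts.items():
        n = sum(c.values())
        if n:
            shares[label] = {k: c[k] / n for k in ("win", "draw", "loss")}
    return shares


# Made-up toy data, NOT the actual World Cup results.
toy_matches = [(0.10, 1), (0.20, 0), (0.30, -1), (1.50, 3), (1.60, 0)]
for label, share in outcome_shares(toy_matches).items():
    print(label, share)
```

With real data, each interval's shares correspond to one stacked bar in the plot.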
To explore this in more detail, we compare the observed goal differences in these matches with the expected distributions based on the Poisson model employed. This is visualized by so-called hanging rootograms, separately for the low predicted goal differences (0-0.7, first two columns above) and the high ones (1.05-1.75, last two columns above).

In both panels, the red line shows the square root of the expected frequencies while the “hanging” gray bars represent the square root of the observed frequencies.
For the low difference subset in the panel on the left, the observed and expected distributions conform reasonably well. It is noticeable, though, that draws (goal differences of 0) are slightly overrepresented in the observations compared to the predictions.
However, for the high difference subset it is clear that there is a bias in goal difference predictions: Narrow margins (such as one goal) are underrepresented whereas high observed goal differences are overrepresented. The overrepresentation of draws is also more pronounced in this subset.
As explained above, it is likely that these deviations are due to the new tournament format with 48 teams. Many more matches between extremely different teams occurred in this tournament compared to earlier tournaments with only a few very weak teams. The machine learning algorithm apparently has not fully captured this. Similarly, the incentives for winning each match were not as strong as in previous tournaments because 8 out of 12 third-ranked teams also proceeded to the knockout stage.
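The quantities shown in a hanging rootogram can be sketched as follows. This is only an illustration under a simplifying assumption: the goals of both teams are treated as independent Poisson variables (so that the goal difference follows a Skellam distribution), which may differ from the exact model used for the forecast. The expected goals and observed differences below are hypothetical.

```python
from collections import Counter
from math import exp, factorial, sqrt


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return exp(-lam) * lam ** k / factorial(k)


def goal_diff_pmf(lam_strong, lam_weak, max_goals=15):
    """Distribution of the goal difference (stronger minus weaker team).

    Assumes independent Poisson goals; truncated at max_goals per team.
    """
    pmf = {}
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_strong) * poisson_pmf(j, lam_weak)
            pmf[i - j] = pmf.get(i - j, 0.0) + p
    return pmf


def hanging_rootogram(observed_diffs, lambdas, diffs=range(-3, 7)):
    """Square-root frequencies for a hanging rootogram.

    observed_diffs: observed goal differences, one per match.
    lambdas: (lam_strong, lam_weak) expected goals, one pair per match.
    Each bar hangs from sqrt(expected) down to sqrt(expected) - sqrt(observed),
    so a bar bottom away from zero signals a misfit.
    """
    observed = Counter(observed_diffs)
    expected = Counter()
    for lam_strong, lam_weak in lambdas:
        for d, p in goal_diff_pmf(lam_strong, lam_weak).items():
            expected[d] += p  # expected number of matches with difference d
    rows = {}
    for d in diffs:
        top = sqrt(expected[d])
        height = sqrt(observed[d])
        rows[d] = {"sqrt_expected": top,
                   "sqrt_observed": height,
                   "bar_bottom": top - height}
    return rows


# Hypothetical inputs, NOT the actual forecasts or results.
toy = hanging_rootogram([0, 0, 1, 3, 4, 2], [(2.0, 0.6)] * 6)
for d, row in toy.items():
    print(d, round(row["sqrt_expected"], 2), round(row["bar_bottom"], 2))
```

A negative `bar_bottom` means that the corresponding goal difference occurred more often than expected, and a positive one that it occurred less often.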
Updated knockout stage predictions
Finally, we want to look ahead and explore how the realized tournament draw based on the group stage results changes the predicted winning probabilities for the 2026 FIFA World Cup. We do so under the assumption that all results so far are within the range of random variation and that we do not need to adapt the predictions for all possible matches. In other words, the simulation is based on the expectation that especially the top favorites Spain and England can still reach their full potential in the upcoming matches.
As with our original prediction, we simulate the knockout stage 100,000 times and then compute by how many percentage points the winning probabilities change.

This shows that Argentina and England profited most from the realized tournament draw. They are both in the arm of the tournament with fewer strong teams and they can only face each other in the semi-final. Therefore, Argentina’s winning probability increased by 3.1 percentage points (from 8.2% to 11.3%). Similarly, England’s winning probability increased by 2.6 percentage points (from 12.4% to 15.0%). Recall that these numbers are derived under the assumption that all teams will play according to the expectations from before the start of the tournament. Thus, additionally, one might want to factor in that Argentina played even stronger than expected and England somewhat weaker.
The teams who suffer most from the realized tournament draw include top favorites Spain and France along with Portugal and Germany because these are very likely to meet already in the Round of 16 (Spain vs. Portugal and France vs. Germany, respectively). Thus, these are much more difficult obstacles on the way to the World Cup Final compared to those for Argentina and England in the other arm of the tournament.
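How a tournament draw can shift winning probabilities may be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the four teams and their strengths are hypothetical, and the pairwise winning probability uses a simple strength ratio rather than the ensemble model behind our forecast. The bracket is a list whose neighbors meet in each round, so its length must be a power of two.

```python
import random


def win_prob(a, b, strength):
    """Probability that team a beats team b (simple ratio, an assumption)."""
    return strength[a] / (strength[a] + strength[b])


def simulate_bracket(bracket, strength, rng):
    """Play one single-elimination bracket and return the winner."""
    teams = list(bracket)
    while len(teams) > 1:
        # Neighbors in the list play each other; winners advance in order.
        teams = [a if rng.random() < win_prob(a, b, strength) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]


def title_probabilities(bracket, strength, n_sims=100_000, seed=1):
    """Estimate each team's title probability by repeated simulation."""
    rng = random.Random(seed)
    wins = {team: 0 for team in bracket}
    for _ in range(n_sims):
        wins[simulate_bracket(bracket, strength, rng)] += 1
    return {team: w / n_sims for team, w in wins.items()}


# Hypothetical strengths: A is the strongest team, D the weakest.
strength = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
easy = title_probabilities(["A", "D", "B", "C"], strength, n_sims=20_000)
hard = title_probabilities(["A", "B", "C", "D"], strength, n_sims=20_000)

# Change in percentage points for team A between the two draws.
print(round(100 * (easy["A"] - hard["A"]), 1))
```

Team A has the same strength in both brackets but is more likely to win the title when it meets the weakest team first, which is the kind of effect the percentage-point changes above capture.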
In any case, the most exciting part of the 2026 FIFA World Cup is only starting now and we can all be curious about what is going to happen. There are still 32 teams in the race for the title! (Well, 31 after Canada defeated South Africa in the first knockout match yesterday.)