Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Predicting the outcome of the different teams in the FIFA World Cup has been of great interest to the general public, and predicting the outcome has also attracted quite some attention in the R community. The World Cup has ended and by now, everyone knows that France managed to take home the trophy that slipped through their fingers when they hosted the UEFA Euro 2016 championship. But who won the more important competition of predicting the outcome?

At the European R Users Meeting I gave a presentation about predicting the results of the 2018 FIFA World Cup. My R code was made available and I encouraged others to use it, make improvements and make their own predictions. The code was picked up and used by several other R users and resulted in several interesting posts such as this blog-post from Mango Solutions or this piece in the Australian The Conversation. It’s always interesting to see how different people address the same problem.

Only one other group took up the competition and provided the full prediction code for the World Cup. Thus, the competitors are the prediction by me (based on odds and the Skellam distribution) and a prediction by Groll, Ley, Schauberger, Van Eetvelde. I’ve also added a prediction based on ELO ratings instead of odds/Skellam for a total of 3 predictions.

Evaluating the prediction winner

When comparing different predictions it is not enough to evaluate only how well the winner of the World Cup was predicted. The predictions of the positions of the other teams should count as well. For that purpose we employ the log-loss scoring rule, which is a proper scoring rule, and it has the nice property that it puts extra penalty to a prediction if the prediction is wrong and you are certain about it.

“It’s better to be somewhat wrong than emphatically wrong. Of course it’s always better to be right”

-Andrew Collier

Log-loss is defined as follows for a country $$l$$ that achieves rank $$r$$:

$-\sum_{\text{rank}\; r} \sum_{\text{country}\; l} I(\text{country l achieved rank r})\log(\hat{p}_{rl}),$

where $$\hat{p}_{rl}$$ is the predicted probability that country $$l$$ achieves rank $$r$$, and $$I()$$ is an indicator function.

The log-loss scoring rule ensures that it is not only important to predict which country will achieve a given rank – it is also important that you guess the ranks with as high probability as possible. Small values of log-loss (close to zero) means that the prediction is good while larger values correspond to worse predictions.

However, the World Cup only produces partial rankings since there is only an official rank given to the first 4 teams (gold, silver, bronze, and 4. position), while the remaining 28 teams only get a partial ranking. We know which 4 teams that reached the quarter final but did not continue (positions 5-8), who came to the 8th final (position 9-16) et cetera. The solution to this is to collapse the probabilities into 7 categories: 1st place, 2nd place, 3rd, 4th, 5th-8th, 9th-16th, and 17th-32nd, and compute the log-loss based on the probabilities of ending in one of these 7 groups.

The log-loss therefore becomes

$-\sum_{\text{category}\; k} \sum_{\text{country}\; l} I(\text{country l ended up in category k}) \log(\hat{p}_{kl})$

Another problem with the default log-loss is that all predictions are weighed equally. That makes sense under the prior assumption that all categories are equally important but let’s face it: for a World Cup prediction it really is more important to predict who will come out on top. The log-loss function can be modified by adding a weight, $$w_k$$, so we put more emphasis to correct predictions at the top of the tournament.

$-\sum_{\text{category}\; k} \sum_{\text{country}\; l} w_k \; I(\text{country l ended up in category k}) \log(\hat{p}_{kl})$

One possibility for the weights would be to use $$1, 1, \frac12, \frac12, \frac14, \frac18$$ and $$\frac{1}{16}$$ for category 1, 2, 3, 4, 5-8, 9-16 and 17-32, respectively. This makes it more important to predict the top of the tournament correctly, while there still is some benefits to correctly predicting the lower ranking countries as well. The weights are chosen such that each category has the same weight (with 3rd and 4th place regarded as one common category).

Getting the predictions

Each prediction is a $$32\times32$$ matrix where each columns represents a country and the rows represent the ranks of the teams. The elements of the matrix are the probabilities that a team will obtain the given rank. The predictions are available online as an rda file that contains three matrices, ekstrom1, ekstrom2, and GLSE so it is not necessary to run all the simulations again to get started:

# Load the three predictions
head(GLSE)

      Egypt Russia Saudi Arabia Uruguay   Iran Morocco Portugal  Spain
[1,] 0.0009 0.0005       0.0000  0.0117 0.0002  0.0038   0.0244 0.1776
[2,] 0.0020 0.0030       0.0003  0.0312 0.0013  0.0070   0.0481 0.1069
[3,] 0.0036 0.0044       0.0002  0.0342 0.0021  0.0082   0.0507 0.1264
[4,] 0.0118 0.0154       0.0033  0.0590 0.0059  0.0233   0.0740 0.0622
[5,] 0.0309 0.0328       0.0069  0.1951 0.0286  0.0777   0.1535 0.0763
[6,] 0.0000 0.0000       0.0000  0.0000 0.0000  0.0000   0.0000 0.0000
Australia Denmark France   Peru Argentina Croatia Iceland Nigeria
[1,]    0.0006  0.0167 0.1130 0.0052    0.0715  0.0209  0.0056  0.0008
[2,]    0.0018  0.0289 0.0938 0.0160    0.0823  0.0392  0.0129  0.0026
[3,]    0.0024  0.0325 0.1040 0.0145    0.0827  0.0405  0.0127  0.0022
[4,]    0.0079  0.0431 0.0622 0.0279    0.0627  0.0583  0.0239  0.0071
[5,]    0.0094  0.0550 0.1257 0.0303    0.0488  0.0671  0.0435  0.0184
[6,]    0.0000  0.0000 0.0000 0.0000    0.0000  0.0000  0.0000  0.0000
Brazil Costa Rica Switzerland Serbia Germany South Korea Mexico
[1,] 0.1218     0.0005      0.0207 0.0046  0.1746      0.0003 0.0036
[2,] 0.0947     0.0029      0.0354 0.0115  0.0851      0.0015 0.0080
[3,] 0.0764     0.0033      0.0330 0.0099  0.0912      0.0019 0.0087
[4,] 0.0447     0.0091      0.0409 0.0186  0.0428      0.0041 0.0183
[5,] 0.0000     0.0000      0.0000 0.0000  0.0000      0.0000 0.0000
[6,] 0.1335     0.0162      0.0801 0.0379  0.0501      0.0226 0.0683
Sweden Belgium England Panama Tunisia Colombia  Japan Poland Senegal
[1,] 0.0100  0.1041  0.0730 0.0003  0.0031   0.0214 0.0002 0.0055  0.0029
[2,] 0.0230  0.1053  0.0865 0.0006  0.0060   0.0384 0.0017 0.0146  0.0075
[3,] 0.0187  0.0886  0.0786 0.0006  0.0057   0.0378 0.0014 0.0155  0.0074
[4,] 0.0287  0.0592  0.0695 0.0025  0.0140   0.0485 0.0050 0.0282  0.0179
[5,] 0.0000  0.0000  0.0000 0.0000  0.0000   0.0000 0.0000 0.0000  0.0000
[6,] 0.0849  0.1802  0.1247 0.0047  0.0149   0.0617 0.0188 0.0607  0.0407

We see that the GLSE prediction gave France an 11.3% chance of winning.
The predictions of winning the World Cup are easily compared by plotting the winning probabilites.

DF <- tibble(prob=100*c(GLSE[1,], ekstrom1[1,], ekstrom2[1,]),
country=rep(colnames(GLSE), times=3),
prediction=rep(c("GLSE", "Ekstrom1", "Ekstrom2"), each=32))

library("ggplot2")
library("magrittr")

DF %>%
ggplot(aes(x=reorder(country, -prob), y=prob, col=factor(prediction))) +
geom_point(size=3) +
scale_colour_manual(values = c("black", "#990000", "blue")) +
xlab("Country") + ylab("Prediction probability of winning the 2018 World Cup") + coord_flip()

To compute the log-loss scores we first need to combine the 32 partial ranks into the 7 categories by adding up the probabilities within each category for each country. Some matrix multiplication will do that for us. We set up a matrix that collapses the 32 ranks into the 7 categories (it will be added to a function later).

comb <- rbind(c(1, rep(0, 31)),
c(0, 1, rep(0, 30)),
c(0, 0, 1, rep(0, 29)),
c(0, 0, 0, 1, rep(0, 28)),
c(rep(0, 4), rep(1, 4), rep(0, 24)),
c(rep(0, 8), rep(1, 8), rep(0, 16)),
c(rep(0:1, times=c(16, 16)))
)
comb

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    1    0    0    0    0    0    0    0    0     0     0     0     0
[2,]    0    1    0    0    0    0    0    0    0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    0     0     0     0     0
[4,]    0    0    0    1    0    0    0    0    0     0     0     0     0
[5,]    0    0    0    0    1    1    1    1    0     0     0     0     0
[6,]    0    0    0    0    0    0    0    0    1     1     1     1     1
[7,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
[1,]     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0     0     0     0
[6,]     1     1     1     0     0     0     0     0     0     0     0
[7,]     0     0     0     1     1     1     1     1     1     1     1
[,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
[1,]     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0
[6,]     0     0     0     0     0     0     0     0
[7,]     1     1     1     1     1     1     1     1

comb %*% GLSE

      Egypt Russia Saudi Arabia Uruguay   Iran Morocco Portugal  Spain
[1,] 0.0009 0.0005       0.0000  0.0117 0.0002  0.0038   0.0244 0.1776
[2,] 0.0020 0.0030       0.0003  0.0312 0.0013  0.0070   0.0481 0.1069
[3,] 0.0036 0.0044       0.0002  0.0342 0.0021  0.0082   0.0507 0.1264
[4,] 0.0118 0.0154       0.0033  0.0590 0.0059  0.0233   0.0740 0.0622
[5,] 0.0808 0.0807       0.0221  0.2428 0.0391  0.1122   0.2619 0.2568
[6,] 0.3567 0.4044       0.1445  0.4865 0.0819  0.1544   0.2189 0.1527
[7,] 0.5442 0.4916       0.8296  0.1346 0.8695  0.6911   0.3220 0.1174
Australia Denmark France   Peru Argentina Croatia Iceland Nigeria
[1,]    0.0006  0.0167 0.1130 0.0052    0.0715  0.0209  0.0056  0.0008
[2,]    0.0018  0.0289 0.0938 0.0160    0.0823  0.0392  0.0129  0.0026
[3,]    0.0024  0.0325 0.1040 0.0145    0.0827  0.0405  0.0127  0.0022
[4,]    0.0079  0.0431 0.0622 0.0279    0.0627  0.0583  0.0239  0.0071
[5,]    0.0318  0.1355 0.1899 0.0862    0.2061  0.1469  0.0786  0.0286
[6,]    0.1226  0.3353 0.2895 0.2387    0.3118  0.3495  0.2268  0.1258
[7,]    0.8329  0.4080 0.1476 0.6115    0.1829  0.3447  0.6395  0.8329
Brazil Costa Rica Switzerland Serbia Germany South Korea Mexico
[1,] 0.1218     0.0005      0.0207 0.0046  0.1746      0.0003 0.0036
[2,] 0.0947     0.0029      0.0354 0.0115  0.0851      0.0015 0.0080
[3,] 0.0764     0.0033      0.0330 0.0099  0.0912      0.0019 0.0087
[4,] 0.0447     0.0091      0.0409 0.0186  0.0428      0.0041 0.0183
[5,] 0.1723     0.0483      0.1732 0.0932  0.1897      0.0331 0.1047
[6,] 0.3272     0.1476      0.2876 0.2226  0.2829      0.1428 0.2713
[7,] 0.1629     0.7883      0.4092 0.6396  0.1337      0.8163 0.5854
Sweden Belgium England Panama Tunisia Colombia  Japan Poland Senegal
[1,] 0.0100  0.1041  0.0730 0.0003  0.0031   0.0214 0.0002 0.0055  0.0029
[2,] 0.0230  0.1053  0.0865 0.0006  0.0060   0.0384 0.0017 0.0146  0.0075
[3,] 0.0187  0.0886  0.0786 0.0006  0.0057   0.0378 0.0014 0.0155  0.0074
[4,] 0.0287  0.0592  0.0695 0.0025  0.0140   0.0485 0.0050 0.0282  0.0179
[5,] 0.1370  0.2871  0.2669 0.0202  0.0630   0.1899 0.0321 0.1206  0.0687
[6,] 0.3180  0.2152  0.2227 0.0872  0.1401   0.4574 0.1603 0.4201  0.2970
[7,] 0.4646  0.1405  0.2028 0.8886  0.7681   0.2066 0.7993 0.3955  0.5986

According to the prediction, Germany had around 40% chance of being in the top 4, and France had a 14.76% chance of not making it out of the group matches.

And the winner is …

Next we need to define a function to compute the log-loss. As input we accept a vector, trueranking, which provides information on the true final results from the World Cup, and a $$32\times32$$ matrix of predictions. The ranks will be condensed as described above.

Also, the log-loss scoring function can go bonkers if a prediction is zero. This could happen if either a prediction method sets the probability of an outcome to zero and the event then turned out to happen, or if the probability is so small that none of the simulations gave rise to that particular outcome. To fix that we tweak the minimum predicted probability to be 1/64 – one half the probability that would be used if each team had equal probability of winning the full tournament. This, however, is not a problem with any of the available predictions.

logloss_score <- function(trueranking,
matrix_prediction,
weights=1/c(1, 1, 2, 2,
rep(4, 4),
rep(8, 8),
rep(16, 16)),
threshold=1/64) {

## Minimal sanity checks
stopifnot(length(trueranking)==32)
stopifnot(NROW(matrix_prediction)==32)
stopifnot(NCOL(matrix_prediction)==32)

# Define the matrix used to condense the rankings to
# 7 categories
comb <- rbind(c(1, rep(0, 31)),
c(0, 1, rep(0, 30)),
c(0, 0, 1, rep(0, 29)),
c(0, 0, 0, 1, rep(0, 28)),
c(rep(0, 4), rep(1, 4), rep(0, 24)),
c(rep(0, 8), rep(1, 8), rep(0, 16)),
c(rep(0:1, times=c(16, 16)))
)

# Compute the condensed ranking probabilities
condensed_prediction_matrix <- comb %*% matrix_prediction

# This vector defines the categories corresponding to the
# order of the rankings given in rank_group
rank_group <- c(1, 2, 3, 4, rep(5, 4), rep(6, 8), rep(7, 16))

# Compute the weighted log loss score
sum(log(pmax(
condensed_prediction_matrix[cbind(rank_group, trueranking)],
threshold))*weights)
}

If all countries have exactly the same skill level then their final positions will have a uniform distribution and the weighted log-loss becomes

# Create matrix with equal probability of all positions for all teams
equalskill <- matrix(rep(1/32, 32*32), 32)
logloss_score(1:32, equalskill)

[1] -14.55609

Hopefully, the contestants are doing better than just guessing that all teams are equally good. Finally, we enter the actual outcome of the 2018 World Cup.

trueresult <- c("France", "Croatia", "Belgium", "England",    # 4 best teams
"Russia", "Sweden", "Brazil", "Uruguay",      # 5-8
"Colombia", "Switzerland", "Japan", "Mexico", # 9-
"Denmark", "Spain", "Portugal", "Argentina",  # -16
"Panama", "Tunisia", "Senegal", "Poland",     # 17-
"Costa Rica", "Serbia", "South Korea", "Germany",
"Iceland", "Nigeria", "Australia", "Peru",
"Iran", "Saudi Arabia", "Morocco", "Egypt")   # -32

#  Information about the teams from the function to create the predictions
#  We only use that to get the column indices of the team names
team_data <- tibble(
number = 1:32,
name = c("Egypt","Russia","Saudi Arabia","Uruguay",
"Iran","Morocco","Portugal","Spain",
"Australia","Denmark","France","Peru",
"Argentina","Croatia","Iceland","Nigeria",
"Brazil","Costa Rica","Switzerland","Serbia",
"Germany","South Korea","Mexico","Sweden",
"Belgium","England","Panama","Tunisia",
"Colombia","Japan","Poland","Senegal")
)

# Get the column order of the results
trueranks <- match(trueresult, team_data\$name)

# Compute the logloss and see who did the best

logloss_score(trueranks, GLSE)

[1] -11.69494

logloss_score(trueranks, ekstrom1)

[1] -11.72221

logloss_score(trueranks, ekstrom2)

[1] -13.47818

Tabel 1: Result of FIFA 2018 World Cup Prediction. The winner is the model by Groll et al, and the ELO model pays a penalty by putting too high probability of both Brazil and Germany which did not do so well in the World Cup.

log.loss
Groll, Ley, Schauberger, VanEetvelde -11.69
Ekstrom (Skellam) -11.72
Ekstrom (ELO) -13.48
Random guessing -14.56

And the winner is the prediction by Groll, Ley, Schauberger, VanEetvelde (although not by much). Well done! Time to prepare the prediction algorithms for the next tournament – and hopefully we can get more people to participate.