New Bundesliga Forecasting Tool: Can Underdog Hertha Berlin Beat Bayern Munich?

[This article was first published on R-Bloggers – Learning Machines, and kindly contributed to R-bloggers.]


The Bundesliga is Germany’s primary football league. It is one of the most important football leagues in the world, broadcast on television in over 200 countries.

If you want to get your hands on a tool to forecast the result of any game (and perform some more statistical analyses), read on!

The basis of our forecasting tool was laid in this blog post: Euro 2020: Will Switzerland kick out Spain too?. There we also explained the methodology. For this post, we adapted the parameters to the Bundesliga (the sources are given in the code below) and, as an example, forecast the result of the upcoming game of Hertha BSC (Berlin) against the international top team Bayern Munich on August 28. The tool can also easily be adapted to other football leagues, e.g. the English Premier League.

On top of that, we made the model even more accurate by adding a home advantage. This effect is surprisingly stable across the main European football leagues at about 0.4 goals extra for the home team. By the way: during the COVID-19 pandemic, when no spectators were allowed in the stadiums, the home advantage disappeared!
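As a minimal sketch of how the home advantage enters the expected goals, the calculation can be wrapped in a small function. The helper name `expected_goals()`, its argument names, and the lower bound `min_goals` are assumptions made for illustration and are not part of the original tool; the numbers are the ones used in the code further down. As in that code, the 0.4 goals are added to the home team and subtracted from the away team, so the total stays at the league mean.

```r
# Hypothetical helper, not part of the original tool: splits the league-wide
# mean number of goals per game between two teams in proportion to their
# market values and then shifts the split by a home advantage.
expected_goals <- function(value_home, value_away,
                           mean_total_score = 3.03,
                           home_advantage = 0.4,
                           min_goals = 0.01) { # assumed lower bound
  ratio <- value_home / (value_home + value_away)
  home <- ratio * mean_total_score + home_advantage
  away <- (1 - ratio) * mean_total_score - home_advantage
  # A Poisson mean must not be negative, so clamp extreme mismatches
  c(home = max(home, min_goals), away = max(away, min_goals))
}

expected_goals(818.5, 176.75)                     # with home advantage
expected_goals(818.5, 176.75, home_advantage = 0) # e.g. games without spectators
```

The clamp only matters for very lopsided pairings, where subtracting the home advantage would otherwise push the away team's mean below zero and make `dpois()` return `NaN`.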

Another thing we added is a probability calculation for all possible outcomes. We do this by assuming that the numbers of goals scored by the two teams are independent of each other (it can be discussed whether this is a reasonable assumption), so that the probability of each scoreline is simply the product of the two marginal probabilities. This can easily be done in R with the outer() function, whose default operation is the product and which can also be written as the operator %o%. The most probable outcome can then easily be extracted:

mean_total_score <-  3.03 # https://de.statista.com/statistik/daten/studie/1622/umfrage/bundesliga-entwicklung-der-durchschnittlich-erzielten-tore-pro-spiel/

# https://www.transfermarkt.de/bundesliga/marktwerteverein/wettbewerb/L1
team1 <- "Bayern Munich" ; colour1 <- "red"    ; value1 <- 818.5  # rows
team2 <- "Hertha BSC"    ; colour2 <- "blue"   ; value2 <- 176.75 # columns

# https://www.saechsische.de/mehr-auswaerts-tore-bei-geisterspielen-5219318.html
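ratio <- value1 / (value1 + value2) # share of team 1 in the combined market value
# team 1 plays at home: 0.4 goals are added for the home team and subtracted
# for the away team, so the expected total stays at mean_total_score
mean_goals1 <- ratio * mean_total_score + 0.4 # 0.4 = home advantage
mean_goals2 <- (1 - ratio) * mean_total_score - 0.4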

goals <- 0:7
# Poisson probabilities for 0 to 7 goals for each team
prob_goals1 <- dpois(goals, mean_goals1)
prob_goals2 <- dpois(goals, mean_goals2)

# joint probabilities in percent, assuming independence
probs <- round((prob_goals1 %o% prob_goals2) * 100, 1) # outer product
colnames(probs) <- rownames(probs) <- goals

# two bar plots side by side on a common y-axis scale
parbkp <- par(mfrow = c(1, 2))
max_ylim <- max(prob_goals1, prob_goals2)
plot(goals, prob_goals1, type = "h", ylim = c(0, max_ylim), xlab = team1, ylab = "Probability", col = colour1, lwd = 10)
plot(goals, prob_goals2, type = "h", ylim = c(0, max_ylim), xlab = team2, ylab = "", col = colour2, lwd = 10)
# row and column index of the most probable scoreline, used in the title
best <- which(probs == max(probs), arr.ind = TRUE)
title(paste(team1, paste(goals[best], collapse = ":"), team2), line = -2, outer = TRUE)
par(parbkp) # restore the previous graphics settings

So, the most probable outcome is Bayern Munich 2:0 Hertha BSC. Let us look at the probabilities in more detail:

probs
##      0   1   2 3 4 5 6 7
## 0  4.8 0.7 0.0 0 0 0 0 0
## 1 14.0 1.9 0.1 0 0 0 0 0
## 2 20.2 2.8 0.2 0 0 0 0 0
## 3 19.5 2.7 0.2 0 0 0 0 0
## 4 14.1 1.9 0.1 0 0 0 0 0
## 5  8.1 1.1 0.1 0 0 0 0 0
## 6  3.9 0.5 0.0 0 0 0 0 0
## 7  1.6 0.2 0.0 0 0 0 0 0

The number of goals of Bayern Munich is in the rows, that of Hertha BSC in the columns; all values are percentages. The 2:0 result has a probability of over 20 percent, which is quite high. But even a result of 3:0 still has a probability of nearly 20 percent!
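To read off more than just the single most probable scoreline, the matrix can be turned into a ranked list. The following is only a sketch: it rebuilds the matrix from the same inputs as above, and the dimension names `team1` and `team2` as well as the column name `percent` are chosen here for illustration.

```r
# Rebuild the probability matrix from the same inputs as in the post
mean_total_score <- 3.03
value1 <- 818.5; value2 <- 176.75 # Bayern Munich, Hertha BSC
ratio <- value1 / (value1 + value2)
goals <- 0:7
probs <- round((dpois(goals, ratio * mean_total_score + 0.4) %o%
                dpois(goals, (1 - ratio) * mean_total_score - 0.4)) * 100, 1)
dimnames(probs) <- list(team1 = goals, team2 = goals)

# Long format: one row per scoreline, sorted by descending probability
ranking <- as.data.frame(as.table(probs), responseName = "percent",
                         stringsAsFactors = FALSE)
ranking <- ranking[order(-ranking$percent), ]
head(ranking, 5) # the five most probable scorelines
```

A ranked table like this also copes with ties, where two scorelines share the highest probability.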

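To calculate the overall probabilities for a win for each team and a draw we can conveniently use the lower.tri(), upper.tri(), and diag() functions. Because the goals of team 1 are in the rows, all entries below the diagonal are wins for team 1, the diagonal holds the draws, and the entries above the diagonal are wins for team 2: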

sum(probs[lower.tri(probs)]) # probability team 1 wins
## [1] 91

sum(diag(probs)) # probability for a draw
## [1] 6.9

sum(probs[upper.tri(probs)]) # probability team 2 wins
## [1] 0.8

So, to answer the original question, Hertha BSC’s chance to beat Bayern Munich is below one percent: they need nothing less than a miracle to win in Munich!
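The three figures printed above add up to 98.7 rather than 100 percent. The shortfall comes from cutting the grid off at seven goals per team, plus rounding each cell before summing. As a rough check, and only as a sketch, the same calculation can be repeated on unrounded probabilities with a wider grid; the upper limit of 20 goals and the names `probs_exact` and `outcome` are assumptions chosen for illustration.

```r
mean_total_score <- 3.03
value1 <- 818.5; value2 <- 176.75 # Bayern Munich, Hertha BSC
ratio <- value1 / (value1 + value2)
mean_goals1 <- ratio * mean_total_score + 0.4
mean_goals2 <- (1 - ratio) * mean_total_score - 0.4

goals <- 0:20 # assumption: wide enough that almost no probability is cut off
probs_exact <- dpois(goals, mean_goals1) %o% dpois(goals, mean_goals2) # not rounded

outcome <- c(
  team1_wins = sum(probs_exact[lower.tri(probs_exact)]),
  draw       = sum(diag(probs_exact)),
  team2_wins = sum(probs_exact[upper.tri(probs_exact)])
)
round(outcome * 100, 1) # in percent
sum(outcome)            # should be very close to 1
```

The conclusion does not change: the chance of a Hertha win stays below one percent, and the probability that was missing before goes almost entirely to Bayern wins with eight or more goals.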


DISCLAIMER
This post is written on an “as is” basis for educational purposes only and comes without any warranty. The findings and interpretations are exclusively those of the author and are not endorsed by or affiliated with any third party.

In particular, this post provides no sports betting advice! No responsibility is taken whatsoever if you lose money.

(If you gain money though, I would be happy if you would buy me a coffee… that is not too much to ask, is it?)
