ANALYSIS OF A LOW-SCORING SPORT
Recently, our friend and co-author Sid Suri predicted that a soccer match would end 2-1. This got us thinking. Suppose you didn’t say which team would score 2 and which would score 1. And suppose people considered you “close” if you got within one goal of 2-1 (so 2-0, 1-1, 2-1, 3-1 or 2-2 would be close). How often would you be close by guessing 2-1?
We’ll save the answer for the end of this post.
To figure this out, we went to http://www.football-data.co.uk and pulled every English soccer match from 1993 to 2016 (so far), for the following leagues: Premier, Championship, League 1, League 2, Conference, Division 1, Division 2, and Division 3. In all: 52,017 matches.
First, let’s establish that soccer is a low-scoring sport. We see below that a match with two goals in total is the most likely outcome in soccer. The average number of goals per match is 2.6 (median 2).
This makes certain low scores quite common (where a score is a high-low pair, ignoring which team scored which). The graph at the top of this post shows that 1-0 (or 0-1, which is the same thing by our definition) is the single most common outcome of a football match. Nearly 20% of matches end 1-0.
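To make the “high-low pair” definition concrete, here is a minimal Python sketch of the tally. It is not the code behind the graph: the `matches` list is toy data invented for illustration, and the helper name `as_high_low` is our own.

```python
from collections import Counter

# Toy data, invented for illustration: (home goals, away goals) per match.
matches = [(1, 0), (0, 1), (1, 0), (2, 1), (1, 2),
           (1, 1), (0, 0), (3, 1), (2, 0), (2, 2)]

def as_high_low(home, away):
    """Collapse a result into a (high, low) pair, so 0-1 and 1-0 count as the same score."""
    return (max(home, away), min(home, away))

# How often each high-low score occurs.
score_counts = Counter(as_high_low(h, a) for h, a in matches)

# The most frequent score and the share of matches that ended that way.
most_common_score, n = score_counts.most_common(1)[0]
share = n / len(matches)

print(most_common_score, share)
```

Run on the real match data instead of the toy list, the same tally would give the counts behind the graph.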
Now to our key question. Which score is within one goal of the most matches’ final scores? It’s close, but the winner is 2-1: 52.9% of matches end within one goal of that score. So if you can be vague about who you predict will win, just confidently proclaim that it will end 2-1 and you’ll be close more often than not. Same deal for saying 2-0 or 1-0. All three of these scores are within one goal of most soccer matches’ outcomes.
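The “within one goal” calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis itself: `matches` is invented toy data, and `is_close` and `share_close` are hypothetical helper names. Guesses are written in (high, low) order.

```python
# Toy data, invented for illustration: (home goals, away goals) per match.
matches = [(1, 0), (0, 1), (1, 0), (2, 1), (1, 2),
           (1, 1), (0, 0), (3, 1), (2, 0), (2, 2)]

def as_high_low(home, away):
    """Collapse a result into a (high, low) pair, ignoring which team scored which."""
    return (max(home, away), min(home, away))

def is_close(guess, actual):
    """True if two (high, low) scores differ by at most one goal in total."""
    return abs(guess[0] - actual[0]) + abs(guess[1] - actual[1]) <= 1

def share_close(guess, results):
    """Fraction of results whose high-low score is within one goal of `guess`."""
    scores = [as_high_low(h, a) for h, a in results]
    return sum(is_close(guess, s) for s in scores) / len(scores)

# Compare a few candidate guesses on the toy data.
for guess in [(2, 1), (2, 0), (1, 0), (1, 1)]:
    print(guess, share_close(guess, matches))
```

Comparing `share_close` across candidate guesses on the full data set is what picks out 2-1 as the winner.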
One cute math-y thing about this is that some scores are one goal away from more possible scores than others. See below. 2-1 is within one goal of five scores (including itself), but 2-0 is only within one goal of four possible scores. Same with 1-0. To see why, realize that scores can’t be negative. Despite being neighbors with fewer scores, 2-0 and 1-0 are within one goal of roughly the same number of matches as 2-1 is. Even 1-1 does quite well with just two neighbors beyond itself (1-0 and 2-1, since 0-1 and 1-2 are the same scores by our definition).
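The neighbor counts above can be checked by enumeration. The following is a small Python sketch, with `neighbors` as a hypothetical helper name; it adds or removes one goal from either side, drops negative scores, and re-sorts each result into a high-low pair.

```python
def neighbors(score):
    """Return the set of (high, low) scores within one goal of `score`, itself included."""
    high, low = score
    found = {(high, low)}
    # Add or remove a single goal from either side of the score.
    for dh, dl in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        a, b = high + dh, low + dl
        if a < 0 or b < 0:
            continue  # scores can't be negative
        # Re-sort into (high, low), so that 1-2 is recorded as 2-1.
        found.add((max(a, b), min(a, b)))
    return found

for score in [(2, 1), (2, 0), (1, 0), (1, 1)]:
    print(score, len(neighbors(score)), sorted(neighbors(score)))
```

Using a set means duplicates such as 0-1 and 1-0 collapse automatically, which is why 1-1 ends up with only two neighbors beyond itself.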
The figure above shows the counts of each of the neighbors of 2-1 (top), 2-0 (middle), and 1-0 (bottom).
Want to play around more? Here’s the R code. Thanks to Hadley Wickham (https://twitter.com/hadleywickham) for ggplot, dplyr, and httr.