(This article was first published on **R – Statistical Odds & Ends**, and kindly contributed to R-bloggers)

With the FIFA World Cup in my recent memory and the English Premier League (EPL) kicking off this Friday (see here for match schedules), I’ve been thinking a bit about the mathematics/statistics of the beautiful game. In this post, I want to answer the following question: *Is soccer more a game of chance, or a game of skill?*

I’m interested in this because I want to get a sense of how much randomness is inherent in the game of soccer. Being more precise: in a perfect game of chance (e.g. coin flipping), the better team will beat the weaker team exactly 50% of the time, since there is no element of skill. At the other extreme, in a perfect game of skill, the better team will beat the weaker team 100% of the time.

Where does soccer lie on this continuum? Anyone who’s watched a game of soccer knows that (i) the team which scores more goals wins, and (ii) **goals are rare!** With the outcome of the game hinging on just a few events, my initial guess was that soccer might be closer to the “game of chance” end than teams would like us to believe.

At this point you might raise the question: *what does it mean for a team to be better anyway?* For this post, I will take a team’s ranking at the end of the season as a measure of its quality: a team with a higher ranking is deemed better than a team with a lower ranking. I will also assume that each team’s quality is constant throughout the season.
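To make this definition concrete, here is a minimal Python sketch (not the R code used for the post) that classifies a single match from the better-ranked team's point of view. The function name, argument names and the toy standings are all hypothetical, chosen only for illustration.

```python
def better_team_outcome(home, away, home_goals, away_goals, final_rank):
    """Classify a match from the better-ranked team's point of view.

    final_rank maps team name -> end-of-season league position (1 = champion).
    Returns "win", "draw" or "loss".
    """
    # A lower position number means a higher (better) ranking.
    better_is_home = final_rank[home] < final_rank[away]
    if better_is_home:
        better_goals, weaker_goals = home_goals, away_goals
    else:
        better_goals, weaker_goals = away_goals, home_goals

    if better_goals > weaker_goals:
        return "win"
    if better_goals == weaker_goals:
        return "draw"
    return "loss"


# Made-up standings and scores, for illustration only (not real EPL data).
final_rank = {"Team A": 1, "Team B": 2, "Team C": 3}
print(better_team_outcome("Team B", "Team A", 0, 2, final_rank))  # win
print(better_team_outcome("Team C", "Team A", 1, 1, final_rank))  # draw
print(better_team_outcome("Team B", "Team C", 0, 1, final_rank))  # loss
```

Because the ranking is taken at the end of the season, the same `final_rank` lookup is used for every match in that season, which matches the constant-quality assumption.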

For this analysis, I looked at EPL data for 9 seasons, from 2008-2009 to 2016-2017. (One of my data sources did not have 2017-2018 data yet.) I used data from 2 sources:

- england.rda from jalapic’s engsoccerdata repository. This is a real treasure trove of English football results, containing statistics for matches in the top 4 tiers of English football all the way back to 1888!
- Standings from this Google sheet. Another treasure trove of data! Unfortunately, the owner has disabled the ability to download or copy the data, so I had to record the standings manually.

Now for the analysis. (R code for the analysis can be found here.) It turns out that in these 9 seasons, the better-ranked team won **53.8%** of the time, and won or drew **79.2%** of the time. These figures are fairly stable across seasons, as we can see in the figure below:
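For readers who want to reproduce this kind of per-season summary, the following is a rough Python sketch, not the author's R code. It assumes each match has already been labelled "win", "draw" or "loss" from the better-ranked team's point of view; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def rates_by_season(labelled_matches):
    """Return {season: (win_rate, win_or_draw_rate)} for the better-ranked team.

    labelled_matches is an iterable of (season, outcome) pairs, where outcome
    is "win", "draw" or "loss" from the better-ranked team's point of view.
    """
    counts = defaultdict(Counter)
    for season, outcome in labelled_matches:
        counts[season][outcome] += 1

    rates = {}
    for season, c in counts.items():
        total = c["win"] + c["draw"] + c["loss"]
        rates[season] = (c["win"] / total, (c["win"] + c["draw"]) / total)
    return rates


# Made-up labels for illustration only (not real EPL results).
toy = [
    ("2008-2009", "win"), ("2008-2009", "draw"),
    ("2008-2009", "loss"), ("2008-2009", "win"),
    ("2009-2010", "win"), ("2009-2010", "draw"),
]
for season, (win, win_or_draw) in sorted(rates_by_season(toy).items()):
    print(season, round(win, 2), round(win_or_draw, 2))
```

Pooling all seasons into one number works the same way: pass every match with a single shared season label.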

How do we compare this with the sliding scale of 50% win for games of chance vs. 100% for games of skill? Well… we can’t! At least not directly. There is the issue of how to deal with draws: our sliding scale assumes that the outcome of the game is either a W or an L.

There are 2 ways we can fix this issue. The first is to **compare the EPL results with a modified sliding scale**, where the probability of a draw is equal to the proportion of EPL games that end in a draw. As an example: if 50% of games end in a draw, then in a game of chance the better team will win 25% of the time, and win/draw 75% of the time. In a game of skill, the better team will win 50% of the time, and win/draw 100% of the time.
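Generalizing the example: if a proportion d of games end in a draw, the game-of-chance baseline is a win rate of (1 − d)/2 and a win/draw rate of (1 − d)/2 + d, while the game-of-skill baseline is a win rate of 1 − d and a win/draw rate of 1. The short Python sketch below (an illustration, not the author's R code; the function name `baselines` is hypothetical) computes both baselines and reproduces the 50%-draw example.

```python
def baselines(draw_rate):
    """Return chance and skill baselines for a given proportion of draws.

    Each baseline is a (win_rate, win_or_draw_rate) pair for the better team.
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be between 0 and 1")
    decisive = 1.0 - draw_rate  # share of games that end in a win or a loss
    chance = (decisive / 2, decisive / 2 + draw_rate)  # decisive games split 50/50
    skill = (decisive, 1.0)  # better team wins every decisive game
    return {"chance": chance, "skill": skill}


print(baselines(0.5))
# {'chance': (0.25, 0.75), 'skill': (0.5, 1.0)}
```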

With this, we can update the baselines in the figure above (the lower dashed line is the game-of-chance baseline, the upper dashed line is the game-of-skill baseline):

The second way to fix this issue is to **do the analysis conditional on the game outcome being a win or a loss**. (This is tantamount to throwing away games which end in a draw.) If we do this, then our original sliding scale (chance 50%, skill 100%) applies. The figure shows the results of the conditional analysis:
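As a rough check on the pooled figures quoted earlier, the conditional win rate is simply wins divided by wins plus losses. The Python sketch below is an illustration rather than the author's R code, and the function name is hypothetical. Applied to the rounded pooled figures (53.8% wins, 79.2% wins or draws), it gives a conditional win rate of about 72%; the per-season values will differ somewhat.

```python
def conditional_win_rate(win_rate, win_or_draw_rate):
    """Better team's win rate among games that did not end in a draw."""
    loss_rate = 1.0 - win_or_draw_rate
    decisive = win_rate + loss_rate  # share of games with a winner
    if decisive <= 0:
        raise ValueError("no decisive games to condition on")
    return win_rate / decisive


# Pooled figures quoted in the post: 53.8% wins, 79.2% wins or draws.
print(round(conditional_win_rate(0.538, 0.792), 3))
```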

So, is soccer closer to a game of chance or a game of skill? Draw your own conclusions!

[**Note 1:** This is a crude, first-pass model that does not capture more complex ideas. For example, we implicitly assume that all that matters in determining a better team’s win percentage is the fact that it is better. This is overly simplistic: a better team is going to win much more often if it is playing a vastly weaker opponent compared to playing an opponent that is just slightly weaker.]

[**Note 2:** This analysis was done with just EPL data. It would be interesting to see if we get similar results for leagues in other countries.]
