
Visualizing the Premier League So Far, Part 1: Overview with xPts Tables and xG Plots

[This article was first published on R by R(yo), and kindly contributed to R-bloggers].

It’s the November International Break and we’re around a third of the way through the Premier League season. To break the monotony I decided to write a blog post exploring how teams are doing so far according to a variety of metrics such as shots, goals, xG, and xPts, both for and against. In this post specifically I talk about some overall graphs and tables based on xG and xPts. Later parts (Parts 2, 3, etc.) will look much more closely at how teams are doing in every 15-minute interval of games, how they’re performing offensively and defensively across different game situations, and more. It may still be a little early to do something like this, but I wanted a record of how things stood before the hectic holiday schedule sets in so I can compare later (and, personally, to get more practice writing about this stuff). In any case, keep in mind that there have only been 12 games so far.

Learning from some of my previous soccer-related blog posts ( Exhibit A, Exhibit B ), many of which are lengthier than an A Song of Ice & Fire novel, I’ve decided against doing another blog post that is simultaneously an analysis and an R programming tutorial. I’ve elected to post the R code to get the data and create the viz in a separate file this time. As always, all the data is gathered with R, tidied with the {tidyverse}, and the viz are created with {ggplot2} and friends.

You can follow along with the code: here

Anyways, let’s get started!

xPts (Expected Points) Table

Expected points (xPts) are an extension of the expected goals (xG) models you might be familiar with if you’ve at least scratched the surface of footy analytics. If you’re not familiar with xG then take a quick look at my Copa America blog post or this article from FBref for a primer. Given the xG values each team created during a match, one can simulate that match thousands of times to estimate the probabilities that it ends in a win, draw, or loss. These probabilities (match odds) can then be converted into the number of points (anywhere between 0 and 3) that a team would be expected to win given the xG they and their opponents accumulated.
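The simulation idea described above can be sketched in a few lines of base R. This is only an illustration and not necessarily how the xPts in this post were produced: it assumes each team's goals are independent Poisson draws with a mean equal to that team's match xG (a common simplification; simulating every shot individually is another option). The function name `simulate_xpts` and the example xG values are hypothetical.

```r
# Minimal sketch: turn two match xG totals into match odds and expected points.
# ASSUMPTION: goals for each side are independent Poisson draws with mean = xG.
simulate_xpts <- function(xg_for, xg_against, n_sims = 10000, seed = NULL) {
  # Optional seed so a run can be reproduced (this sets the global RNG state)
  if (!is.null(seed)) set.seed(seed)

  goals_for     <- rpois(n_sims, lambda = xg_for)
  goals_against <- rpois(n_sims, lambda = xg_against)

  # Share of simulated matches ending in each result
  p_win  <- mean(goals_for >  goals_against)
  p_draw <- mean(goals_for == goals_against)
  p_loss <- mean(goals_for <  goals_against)

  list(
    p_win  = p_win,
    p_draw = p_draw,
    p_loss = p_loss,
    # 3 points for a win, 1 for a draw, 0 for a loss
    xpts_for     = 3 * p_win  + 1 * p_draw,
    xpts_against = 3 * p_loss + 1 * p_draw
  )
}

# Hypothetical match: one side created 1.8 xG, the other 0.9 xG
res <- simulate_xpts(xg_for = 1.8, xg_against = 0.9, seed = 42)
```

Summing `res$xpts_for` over every match a team has played would give a season total comparable to the xPts column in the table below.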

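League Table by xPts
Column groups: Points (xPts, Points, PD), Result (W, D, L), Goals (For, Against, GD), Expected Goals (xG, xGA, xGD)

| Team (Actual Rank) | xPts | Points | PD | W | D | L | For | Against | GD | xG | xGA | xGD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Manchester City (4) | 28.3 | 25 | 3.3 | 8 | 1 | 3 | 35 | 13 | 22 | 36.1 | 13.5 | 22.6 |
| Liverpool (1) | 23.9 | 34 | -10.1 | 11 | 1 | 0 | 28 | 10 | 18 | 22.1 | 11.7 | 10.4 |
| Chelsea (3) | 23.3 | 26 | -2.7 | 8 | 2 | 2 | 27 | 17 | 10 | 25.2 | 12.7 | 12.5 |
| Manchester United (7) | 23.2 | 16 | 7.2 | 4 | 4 | 4 | 16 | 12 | 4 | 17.2 | 11.1 | 6.1 |
| Everton (15) | 20.2 | 14 | 6.2 | 4 | 2 | 6 | 13 | 18 | -5 | 17.8 | 12.3 | 5.5 |
| Wolverhampton Wanderers (8) | 17.8 | 16 | 1.8 | 3 | 7 | 2 | 16 | 15 | 1 | 15.0 | 13.9 | 1.1 |
| Leicester (2) | 17.8 | 26 | -8.2 | 8 | 2 | 2 | 29 | 8 | 21 | 14.9 | 11.6 | 3.3 |
| Burnley (10) | 17.7 | 15 | 2.7 | 4 | 3 | 5 | 17 | 18 | -1 | 16.3 | 13.6 | 2.7 |
| Brighton (11) | 16.5 | 15 | 1.5 | 4 | 3 | 5 | 15 | 17 | -2 | 16.7 | 19.6 | -2.9 |
| Watford (18) | 16.1 | 8 | 8.1 | 1 | 5 | 6 | 8 | 23 | -15 | 13.8 | 18.4 | -4.6 |
| Arsenal (6) | 15.3 | 17 | -1.7 | 4 | 5 | 3 | 16 | 17 | -1 | 15.4 | 15.4 | 0.0 |
| Tottenham (14) | 15.3 | 14 | 1.3 | 3 | 5 | 4 | 18 | 17 | 1 | 13.2 | 16.3 | -3.1 |
| Bournemouth (9) | 14.8 | 16 | -1.2 | 4 | 4 | 4 | 15 | 15 | 0 | 15.7 | 18.1 | -2.4 |
| Sheffield United (5) | 14.8 | 17 | -2.2 | 4 | 5 | 3 | 13 | 9 | 4 | 14.6 | 16.4 | -1.8 |
| Southampton (19) | 13.7 | 8 | 5.7 | 2 | 2 | 8 | 11 | 29 | -18 | 15.0 | 21.0 | -6.0 |
| Aston Villa (17) | 12.7 | 11 | 1.7 | 3 | 2 | 7 | 17 | 20 | -3 | 15.4 | 20.9 | -5.5 |
| Crystal Palace (12) | 12.4 | 15 | -2.6 | 4 | 3 | 5 | 10 | 16 | -6 | 9.7 | 19.4 | -9.7 |
| West Ham (16) | 11.5 | 13 | -1.5 | 3 | 4 | 5 | 14 | 20 | -6 | 15.5 | 22.3 | -6.8 |
| Newcastle United (13) | 9.5 | 15 | -5.5 | 4 | 3 | 5 | 11 | 18 | -7 | 10.0 | 20.2 | -10.2 |
| Norwich (20) | 8.3 | 7 | 1.3 | 2 | 1 | 9 | 11 | 28 | -17 | 12.7 | 23.5 | -10.8 |

* All xG values are non-penalty xG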

At the top of the table, City are clearly in the lead and their expected numbers match up well with their actual numbers, scoring a whopping 35 goals (pretty close to their 36.1 xG). Liverpool are overperforming their xG by close to 6 goals, scoring 28 from 22.1 xG, and they post the biggest gap between xPts and actual points (10.1), having gained 34 points to lead the Premier League table when expected points had them at 23.9. Meanwhile Chelsea might rue their bad luck (or their opponents’ good finishing), having conceded a bit over 4 goals more than their xGA. In terms of xGD per game, City are at about 1.9, far ahead of their rivals Chelsea and Liverpool (at 1.0 and 0.9 respectively).
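The comparisons in this paragraph are simple differences between the actual and expected columns of the table. As a rough check, here is a base-R sketch using only the top three rows; the column names are illustrative and the 12 games played is taken from the text. Note that the per-game figures quoted above (1.9, 1 and 0.9) line up with the table's xGD column divided by 12.

```r
# Top three rows of the xPts table (values copied from the table above)
top3 <- data.frame(
  team          = c("Manchester City", "Liverpool", "Chelsea"),
  xpts          = c(28.3, 23.9, 23.3),
  points        = c(25, 34, 26),
  goals_for     = c(35, 28, 27),
  goals_against = c(13, 10, 17),
  xg            = c(36.1, 22.1, 25.2),
  xga           = c(13.5, 11.7, 12.7),
  xgd           = c(22.6, 10.4, 12.5)
)
games_played <- 12

# PD column: expected points minus actual points (negative = more points than expected)
top3$pd <- round(top3$xpts - top3$points, 1)

# Goals scored relative to xG (positive = scoring more than expected)
top3$goals_minus_xg <- round(top3$goals_for - top3$xg, 1)

# Goals conceded relative to xGA (positive = conceding more than expected)
top3$conceded_minus_xga <- round(top3$goals_against - top3$xga, 1)

# Expected goal difference per game
top3$xgd_per_game <- round(top3$xgd / games_played, 1)

print(top3[, c("team", "pd", "goals_minus_xg", "conceded_minus_xga", "xgd_per_game")])
```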

The biggest difference between a club’s real rank and their xPts rank belongs to Everton, who given their xG might be quite a lot higher up the table than their current lowly 15th place. Another struggling club, Watford, could also be quite comfortable in mid-table if results had gone the way their xG performance may have merited (especially in attack, where they’ve been awful at finishing their chances: only 8 goals from 13.8 xG).

The biggest movers downwards from the actual league table based on xPts are newly promoted Sheffield United, who tumble down from 5th to 14th. Most of this may stem from the fact that they have conceded far fewer goals than their xGA might suggest. This difference between xGA and goals conceded should catch up to them eventually, and although their 5th place position flatters them a bit, in my opinion they’ve been doing well enough that they’ll end up in mid-table very comfortably. They certainly are an interesting team tactically that you might want to read up on (Edit: As I was typing this blog post StatsBomb came out with an article regarding the Blades).
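The "biggest movers" in the last two paragraphs come from comparing each club's actual rank with its position in the xPts table. A small base-R sketch of that comparison, with values copied from the table; the column names are illustrative, and ties on xPts are broken by table order, which is an assumption about how the table was sorted:

```r
# Actual league rank and xPts for all 20 clubs, in the order of the xPts table
tbl <- data.frame(
  team = c("Manchester City", "Liverpool", "Chelsea", "Manchester United",
           "Everton", "Wolverhampton Wanderers", "Leicester", "Burnley",
           "Brighton", "Watford", "Arsenal", "Tottenham", "Bournemouth",
           "Sheffield United", "Southampton", "Aston Villa", "Crystal Palace",
           "West Ham", "Newcastle United", "Norwich"),
  actual_rank = c(4, 1, 3, 7, 15, 8, 2, 10, 11, 18,
                  6, 14, 9, 5, 19, 17, 12, 16, 13, 20),
  xpts = c(28.3, 23.9, 23.3, 23.2, 20.2, 17.8, 17.8, 17.7, 16.5, 16.1,
           15.3, 15.3, 14.8, 14.8, 13.7, 12.7, 12.4, 11.5, 9.5, 8.3)
)

# Rank by xPts (highest first); ties keep their order of appearance
tbl$xpts_rank <- rank(-tbl$xpts, ties.method = "first")

# Positive = the club would sit higher in an xPts table than in the real one
tbl$rank_change <- tbl$actual_rank - tbl$xpts_rank

biggest_riser  <- tbl[which.max(tbl$rank_change), c("team", "rank_change")]
biggest_faller <- tbl[which.min(tbl$rank_change), c("team", "rank_change")]

print(biggest_riser)
print(biggest_faller)
```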

Leicester’s actual numbers (29 goals scored, second most in the league), while very good, do have to come with the caveat that a lot of their goals have come against ten men (13/29 goals, or about 45%). They are around mid-table in terms of xG per game (see next few graphs below) and they have been overperforming their xG by a massive 14 goals, so we would expect the numbers to drop closer to their xG values sometime soon (especially Jamie Vardy, who’s scored 11 goals from just 4.67 xG according to infogol). However, their good defensive record should keep them in fighting shape for the Champions League spots (especially as the other traditionally “top” teams look mediocre this season). If their numbers do start declining it may be interesting to drill deeper to figure out why, as the numbers in the above table only give a very broad overview of performance.

non-Penalty xGD (Expected Goal Difference) & xG per Game

Goal difference as a stat is known to any fan and is also used as a tie-breaker when teams are level on points in a league table. However, as we know from the various “expected” stats, goal difference only looks at the end results and not the actual quality of chances a team created/conceded during games. We’ve seen many teams score lucky goals and/or concede under very unlucky circumstances, so looking at a team’s expected goal difference can provide a better understanding of which teams are performing well overall (offensively and defensively).

City are well ahead of the pack while Arsenal are about even so far, around 0.015 xGD to be more precise. As in the xPts league table, Everton and Manchester United are doing fairly well despite what their actual results might suggest.

Norwich, Newcastle, and Crystal Palace are the three worst teams, mainly due to their atrocious attacking numbers. On the defensive side Newcastle and Crystal Palace are slightly better (but not by much) while Norwich are leaking chances on their end of the pitch. Although the Canaries won praise for their adventurous style at the start of the season, a huge injury crisis has brought them down to earth: they have conceded 28 goals (second worst in the league, just 1 goal better than fellow strugglers Southampton) from 23.5 xGA (the worst in the league).

Spurs are in the bottom half on xGD, mainly due to their awful attacking numbers as they are fourth from the bottom for xG per game! Their attacking numbers have steadily declined compared to two seasons ago and last season’s Champions League final run slightly masked their poor domestic record since the start of 2019 (11 Wins, 7 Draws, 12 Losses).
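Both claims about Spurs can be checked with a little arithmetic: the xG ranking from the table's non-penalty xG column, and points per game from the record quoted above. A base-R sketch with values copied from the table and text; the variable names are illustrative:

```r
games_played <- 12

# Non-penalty xG for all 20 clubs, copied from the xPts table
xg <- c(
  "Manchester City" = 36.1, "Liverpool" = 22.1, "Chelsea" = 25.2,
  "Manchester United" = 17.2, "Everton" = 17.8,
  "Wolverhampton Wanderers" = 15.0, "Leicester" = 14.9, "Burnley" = 16.3,
  "Brighton" = 16.7, "Watford" = 13.8, "Arsenal" = 15.4, "Tottenham" = 13.2,
  "Bournemouth" = 15.7, "Sheffield United" = 14.6, "Southampton" = 15.0,
  "Aston Villa" = 15.4, "Crystal Palace" = 9.7, "West Ham" = 15.5,
  "Newcastle United" = 10.0, "Norwich" = 12.7
)

# xG per game, sorted from worst attack to best
xg_per_game <- sort(xg / games_played)
bottom_four <- round(head(xg_per_game, 4), 2)
print(bottom_four)

# Spurs' league record since the start of 2019, as quoted in the text
wins <- 11; draws <- 7; losses <- 12
spurs_points <- 3 * wins + draws
spurs_ppg    <- spurs_points / (wins + draws + losses)
print(round(spurs_ppg, 2))
```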

Match Reports

If you follow me on Twitter you might’ve seen that I do xG timelines and shot maps for certain games. I mainly do one for Liverpool as an ongoing thread that I update every match day, but I also do them for other games (mainly Bundesliga) that I was able to watch that weekend. You can find the Liverpool matchday thread here and you can find out how I create these by checking out my StatsBomb Messi Data Viz blog post from a few months back.

Here’s an example:

In recent games I also started taking notes during the match and then posting an edited version under the report viz. Basically, a rough bullet-point summary of what I saw, some random thoughts, and trying to find patterns among my match notes. Here’s an example from the Liverpool vs. City game:

It’s weird, I’ve been watching all my life but never bothered taking notes until I started doing all this data analysis stuff a year or so ago. The closest I’ve come before was doing some self-analysis on my own footage as well as a school project where I did play-by-play commentary on a local game! It’s important to watch and learn from the game as well as poring over the stats, so it’s a nice weekly exercise for me.

In terms of tactical stuff (rather than stats) you should check out spielverlagerung.com, zonalmarking.net, betweentheposts.net, and many more (the first two were sites I used to read occasionally when I was a kid, recommended by one of my coaches I think, and recently took up reading again). Of course, there are tons of people on Twitter you might want to follow as well (ask me or take a look at the people I follow).

Conclusion & Preview for Part 2

We are still only 12 games in, and with the hectic holiday schedule coming up (especially for Liverpool with the Club World Cup…) a lot can change even a month from now. This was just a short appetizer for what I have in store for this data, which I got from understat.com, and I’ll be drilling deeper to investigate some of the numbers that came up above in the next few blog posts.

As mentioned, the next part will take a look at team stats based on different game situations such as open play, free kicks, and corners. Here’s a sample (still very much a work-in-progress):
