Analysing the effectiveness of tennis tournament seeding by @ellis2013nz
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
So, an exploration of how tennis tournament seeding and bracketing affect the end result has percolated to the top of my to-do list, inspired by the Australian Open currently in play in Melbourne.
This is the first of what will probably be two posts on this topic. Today I wanted to look at the impact of seeding on the chance of the best players finishing first, or in the top 2, 4 or 8 players (i.e. reaching the final, semi-finals or quarter-finals). I compared the results of simulated single-elimination tournaments between 128 players under two different tournament types: completely random allocation of the draw, or seeding for the top 32 players as was standard in Grand Slam tournaments, apparently until 2019.
There’s a material impact from the approach to the draw, which can be seen in these results:
This means, for example, that with 32 top players given special “seed” allocations in the draw, there is a 22% chance that the semi finals will include the four best players; but only a 2% chance if the draw is allocated completely at random. The chance of seeing the eight best players in the quarter finals under random allocation is effectively zero, whereas with a seeded draw this will happen 3% of the time (still not very often – which should be no surprise for tennis fans). Similarly, the chance of seeing the top two players playing out the final is 42% in the seeded contest and only 20% when unseeded.
Note that these results will depend upon the distribution of players’ strengths in a given tournament. At an extreme, if the top player completely dominates all others, seeding will make no difference to the winner. We can calculate some other theoretical values as limits. For example, if the top two players were effectively invincible when playing anyone except each other, they would meet in the final 100% of the time in a seeded competition, and about 50% of the time if the draw were unseeded. So the shortfall of our results (42% and 20%) from those limits reflects the gap between the top two players being “really good” compared to the rest of the field and being “infinitely good”.
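The 50% limit for an unseeded draw can be checked directly. This is a minimal sketch, assuming a 128-player draw allocated uniformly at random and two players who never lose to anyone else, so that they meet in the final exactly when they land in opposite halves. The variable names are illustrative only.

```r
# Chance that two given players land in opposite halves of a random draw.
# Fix one player's slot; 64 of the remaining 127 slots are in the other half.
n_players <- 128
p_exact <- (n_players / 2) / (n_players - 1)

# Quick Monte Carlo check of the same quantity
set.seed(123)
n_sims <- 20000
opposite <- replicate(n_sims, {
  slots <- sample(n_players, 2)   # draw positions of the two players
  (slots[1] <= n_players / 2) != (slots[2] <= n_players / 2)
})
p_sim <- mean(opposite)

round(c(exact = p_exact, simulated = p_sim), 3)
```

Strictly, then, the unseeded limit is 64/127, or about 50.4%, rather than exactly one half.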
Here is a blog post on a closely related point: fellow Melburnian Stephanie Kovalchik on the impact of a suggested move from 32 to 16 seeds for Grand Slam tournaments. I think that this proposal didn’t go ahead. Kovalchik shows that it would have led to somewhat less fair results than the 32 seed system, where “fair” is defined as players reaching the round that would be expected based on their ranking.
Historical tennis results data
To perform my analysis I used data on the actual relative strength of players from Stephanie Kovalchik’s {deuce} R package. I wanted a realistic range of strengths that could go head-to-head in a real life tournament, so I chose a single point of time rather than a hypothetical cross-history match up that (for example) would pit Margaret Court against Serena Williams. To avoid confusion with contemporary reality, I chose 1990 as my year. Here are the top 10 women’s tennis players at the end of 1990, as judged by their Elo ratings:
# A tibble: 128 x 3
   player_id player_name                 elo
       <int> <chr>                     <dbl>
 1    200414 Steffi Graf               2608.
 2    200293 Martina Navratilova       2474.
 3    200652 Monica Seles              2388.
 4    200572 Gabriela Sabatini         2312.
 5    200597 Mary Joe Fernandez        2236.
 6    200017 Arantxa Sanchez Vicario   2195.
 7    200049 Conchita Martinez         2194.
 8    200077 Jennifer Capriati         2169.
 9    200401 Manuela Maleeva Fragniere 2151.
10    200404 Zina Garrison             2139.
These are the top 10 rows of a 128-row data frame, women128, created with the chunk of R code below:
Kovalchik provides Elo ratings for each player at a point in time, ultimately derived from analysis of data collected by Jeff Sackmann. I’ve written about Elo ratings earlier in this blog in the context of backgammon and Australian rules football. They are a powerful method of calculating ratings based on actual performance, with the great advantage of being convertible into probabilities for any hypothetical match up. Incremental adjustments are made to each rating based on how the actual result compares with the probabilities implied by the Elo ratings going into the match. This makes them a useful and self-correcting metric that is readily incorporated into a statistical model.
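To make the “incremental adjustment” idea concrete, here is a minimal base R sketch of a textbook Elo update. It assumes the conventional logistic curve (base 10, scale 400) and a fixed adjustment factor `k`. Both function names and `k = 32` are illustrative choices of mine, and the ratings in {deuce} may well have been derived with different settings.

```r
# Standard Elo expected score: probability that a player rated `a` beats one rated `b`
elo_expected <- function(a, b) {
  1 / (1 + 10 ^ ((b - a) / 400))
}

# One incremental update after a match.
# `k` controls how fast ratings move; 32 is a hypothetical illustrative value.
elo_update <- function(winner, loser, k = 32) {
  p_win <- elo_expected(winner, loser)
  shift <- k * (1 - p_win)   # small if the win was expected, large if an upset
  c(winner = winner + shift, loser = loser - shift)
}

elo_expected(2608, 2474)   # Graf against Navratilova, ratings from the table above
elo_update(2474, 2608)     # ratings after a hypothetical Navratilova win
```

The update is zero-sum: whatever the winner gains the loser gives up, and an upset moves the ratings further than an expected result.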
To be sure I am using the correct basis for converting Kovalchik’s Elo ratings into probabilities, I’m going to use her elo_prediction() function for estimating the chance of either player in a pairing winning. To illustrate, here is a chart showing the chance that each of a selection of the players ranked 2 to 128 by Elo rating would beat Steffi Graf, the highest rated player in our subset of the data (and in fact the highest rated player by this method in all the data available, which begins in 1968 with the start of the open era).
Here’s the code for that illustration:
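As a rough stand-in that does not need {deuce}, the sketch below applies the conventional Elo logistic curve to the ten ratings listed earlier. This is an assumption on my part: elo_prediction() may use a different conversion, so treat the numbers as indicative only. The helper `win_prob()` and the data frame `top10` are hypothetical names introduced for this example.

```r
# Elo ratings of the top 10 players at the end of 1990, copied from the table above
top10 <- data.frame(
  player_name = c("Steffi Graf", "Martina Navratilova", "Monica Seles",
                  "Gabriela Sabatini", "Mary Joe Fernandez",
                  "Arantxa Sanchez Vicario", "Conchita Martinez",
                  "Jennifer Capriati", "Manuela Maleeva Fragniere",
                  "Zina Garrison"),
  elo = c(2608, 2474, 2388, 2312, 2236, 2195, 2194, 2169, 2151, 2139)
)

# Assumed conversion: the standard Elo logistic curve (base 10, scale 400)
win_prob <- function(a, b) 1 / (1 + 10 ^ ((b - a) / 400))

graf <- top10$elo[1]
rivals <- top10[-1, ]
rivals$p_beat_graf <- round(win_prob(rivals$elo, graf), 3)
rivals
```

Under this assumed curve every rival is an underdog against Graf, and the chance of an upset falls steadily as the rating gap widens.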
Simulating tournament draws and results
For the grunt work of simulating tournaments between these 128 players, I first write a function simulate_tournament() which takes as its main argument a data frame of 128 rows in sequence to represent position in the draw. The input to this function is going to look like this:
> brackets
# A tibble: 896 x 5
   player_id round match player_name              elo
       <int> <dbl> <int> <chr>                  <dbl>
 1    200494   128     1 Dianne Van Rensburg    1855.
 2    200423   128     1 Carling Bassett Seguso 1822.
 3    200481   128     2 Elna Reinach           1777.
 4    200506   128     2 Wiltrud Probst         1768.
 5    200699   128     3 Meredith Mcgrath       1857.
 6    200086   128     3 Magdalena Maleeva      1825.
 7    200419   128     4 Kathy Rinaldi Stunkel  1804.
 8    200624   128     4 Tami Whitlinger Jones  1647.
 9    200395   128     5 Catherine Tanvier      1732.
10    200360   128     5 Pam Shriver            2047.
...
This indicates (for example) that in the “round of 128” – the first round played – Van Rensburg will play Seguso in match one. If we filter the object to match one of round two (the “round of 64”) we see:
> filter(brackets, round == 64 & match == 1)
# A tibble: 4 x 5
  player_id round match player_name              elo
      <int> <dbl> <int> <chr>                  <dbl>
1    200494    64     1 Dianne Van Rensburg    1855.
2    200423    64     1 Carling Bassett Seguso 1822.
3    200481    64     1 Elna Reinach           1777.
4    200506    64     1 Wiltrud Probst         1768.
There are now four players in match 1. However, one of Van Rensburg or Seguso will have lost in the round of 128; and so will one of Reinach or Probst. With a bit of care, this object brackets contains the full draw of the elimination tournament, and can be constructed so the top 32 seeds are allocated to the brackets required in a 32 seed draw. The remainder of the code in the chunk below does this for my two different methods of draws. It’s a bit clunky but it seems to work.
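To illustrate the two kinds of draw, here is a simplified base R sketch. It is not the code behind the results in this post. It assumes a deterministic seeded layout in which each of the 32 four-player sections receives one seed, following the conventional bracket order, whereas real Grand Slam draws also randomise seeds within groups. The functions `seed_order()`, `random_draw()` and `seeded_draw()` are hypothetical names for this example.

```r
# Conventional bracket order for n seeded slots (n a power of 2):
# seeds 1 and 2 can only meet in the last round, seeds 1 to 4 only in the last two, etc.
seed_order <- function(n) {
  ord <- 1
  while (length(ord) < n) {
    m <- 2 * length(ord)
    ord <- as.vector(rbind(ord, m + 1 - ord))
  }
  ord
}

# Random draw: every player is equally likely to land in any slot.
# `ranked_ids` is a vector of player ids sorted from strongest to weakest.
random_draw <- function(ranked_ids) {
  ranked_ids[sample(length(ranked_ids))]
}

# Seeded draw: the draw is split into n_seeds equal sections; each section gets
# one seed in its first slot and unseeded players fill the rest at random.
seeded_draw <- function(ranked_ids, n_seeds = 32) {
  n <- length(ranked_ids)
  section_size <- n / n_seeds
  draw <- rep(NA, n)
  seed_slots <- (seq_len(n_seeds) - 1) * section_size + 1
  draw[seed_slots] <- ranked_ids[seed_order(n_seeds)]
  unseeded <- ranked_ids[-seq_len(n_seeds)]
  draw[is.na(draw)] <- unseeded[sample(length(unseeded))]
  draw
}

set.seed(42)
draw <- seeded_draw(1:128)
which(draw %in% 1:4)   # slots of the top four players: one per quarter of the draw
```

Because adjacent slots play each other and winners advance to meet the neighbouring pair, a vector in draw order like this carries the same information as the round-by-round brackets object.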
Which gets us to our results.
The distribution of the winner of the whole tournament is summarised in the table below. Unsurprisingly, Ms Graf wins more of our simulated tournaments than any other player; 5,454/10,000 when the draw is at random and 5,913/10,000 when we use the 32-seed draw method. But even in this period of Graf’s dominance in women’s tennis, other players (and more than just the top few) have a non-zero chance of winning. Which is why people watch, of course. Interestingly, once we get to Monica Seles (ranked 3 by Elo at this point in time) and lower, players are more likely to win the overall tournament with an unseeded rather than a seeded draw.
winner_name | Actual ranking | No seeding | Seeded (32 seeds) |
---|---|---|---|
Steffi Graf | 1 | 5454 | 5913 |
Martina Navratilova | 2 | 2089 | 2219 |
Monica Seles | 3 | 1064 | 1018 |
Gabriela Sabatini | 4 | 510 | 418 |
Mary Joe Fernandez | 5 | 213 | 132 |
Conchita Martinez | 7 | 110 | 66 |
Arantxa Sanchez Vicario | 6 | 113 | 63 |
Jennifer Capriati | 8 | 89 | 42 |
Manuela Maleeva Fragniere | 9 | 66 | 34 |
Zina Garrison | 10 | 58 | 26 |
This is also the point at which we can interrogate our simulation results to get the image I started the blog with:
This analysis of the simulation results was made with this chunk of code:
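For readers who want to try this kind of tally themselves, here is a self-contained base R sketch. It makes several assumptions of its own: a made-up field of 128 ratings rather than the 1990 data, the conventional Elo logistic curve for win probabilities, a purely random draw, and only 500 simulations. The names `win_prob()` and `play_tournament()` are hypothetical.

```r
# Assumed win probability: standard Elo logistic curve (base 10, scale 400)
win_prob <- function(a, b) 1 / (1 + 10 ^ ((b - a) / 400))

# Play one single-elimination tournament.
# `elo` holds the ratings in draw order; adjacent slots meet in each round.
# Returns a list of the slot indices still alive after each round.
play_tournament <- function(elo) {
  alive <- seq_along(elo)
  survivors <- list()
  while (length(alive) > 1) {
    a <- alive[c(TRUE, FALSE)]    # odd-positioned survivors
    b <- alive[c(FALSE, TRUE)]    # their opponents
    a_wins <- runif(length(a)) < win_prob(elo[a], elo[b])
    alive <- ifelse(a_wins, a, b)
    survivors[[as.character(length(alive))]] <- alive
  }
  survivors
}

# Hypothetical field of 128 ratings, strongest first (not the 1990 data)
set.seed(2020)
field <- sort(rnorm(128, mean = 1900, sd = 150), decreasing = TRUE)

# Count random-draw tournaments in which the best k players fill the last k places
n_sims <- 500
hits <- c("1" = 0, "2" = 0, "4" = 0, "8" = 0)
for (i in seq_len(n_sims)) {
  slots <- sample(128)                  # slots[j] is the rank of the player in slot j
  res <- play_tournament(field[slots])
  for (k in names(hits)) {
    ranks_left <- slots[res[[k]]]
    hits[k] <- hits[k] + all(ranks_left <= as.integer(k))
  }
}
hits / n_sims
```

Any other draw order, such as a seeded one, can be passed to `play_tournament()` in the same way, which is all that is needed to compare the two methods.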
That’s all for today. Some time soon I hope to come back to this and compare both methods to the alternative proposed by Charles Dodgson in Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method. Dodgson, who as well as being a mathematician and logician found time to invent one of the most recognised characters in English literature, was writing at a time before seeding the draw was common. He proposed an alternative to the single elimination tournament that in his view was guaranteed to give the correct first three prizes to the three best players. However, he did depend on a non-probabilistic view of what “best” means, which is worth probing. But that’s for another day.