
So, an exploration of how tennis tournament seeding and bracketing affect the end result has percolated to the top of my to-do list, inspired by the Australian Open currently in play in Melbourne.

This is the first of what will probably be two posts on this topic. Today I wanted to look at the impact of seeding on the chance of the best players finishing first, or in the top 2, 4 or 8 players (i.e. reaching the final, semi finals or quarter finals). I compared the results of simulated single-elimination tournaments between these players in two different tournament types – completely random allocation of the draw, or seeding for the top 32 players as is standard in Grand Slam tournaments.

There’s a material impact from the approach to the draw, which can be seen in these results:

This means, for example, that with 32 top players given special “seed” allocations in the draw, there is a 22% chance that the semi finals will include the four best players; but only a 2% chance if the draw is allocated completely at random. The chance of seeing the eight best players in the quarter finals under random allocation is effectively zero, whereas with a seeded draw this will happen 3% of the time (still not very often – which should be no surprise for tennis fans). Similarly, the chance of seeing the top two players contest the final is 42% in the seeded contest and only 20% when the draw is unseeded.

Note that these results will depend upon the distribution of players’ strengths in a given tournament. At an extreme, if the top player completely dominates all others, seeding will make no difference to the winner. We can calculate some other theoretical values as limits. For example, if the top two players were effectively invincible when playing anyone except each other, they would meet in the final 100% of the time in a seeded competition, and about 50% of the time if the draw was unseeded. So anything below those levels (42% and 20% in our case) indicates the gap between the two players being “really good” compared to the rest of the field and “infinitely good”.
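
The 50% limit for the unseeded draw can be checked with a little arithmetic: two invincible players meet in the final only if they land in opposite halves of the draw. The sketch below is my own illustration in base R (the variable names are not from the original analysis). It computes the exact chance for a 128-player draw, which is 64/127 or just over a half, and confirms it with a quick simulation.

```r
# Chance that the two best players land in opposite halves of a random draw
n_players <- 128

# Exact value: fix player 1's slot; 64 of the remaining 127 slots are in the other half
p_exact <- (n_players / 2) / (n_players - 1)

# Monte Carlo check: draw two distinct slots at random and compare their halves
set.seed(123)
n_sims <- 20000
opposite <- replicate(n_sims, {
  pos <- sample(n_players, 2)   # draw slots of players 1 and 2
  (pos[1] <= n_players / 2) != (pos[2] <= n_players / 2)
})
p_sim <- mean(opposite)

round(c(exact = p_exact, simulated = p_sim), 3)
```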

Here is a blog post on a closely related point – fellow-Melburnian Stephanie Kovalchik on the impact of a suggested move from 32 to 16 seeds for grand slam tournaments. I think that this proposal didn’t go ahead. Kovalchik shows that it would have led to somewhat less fair results than the 32 seed system, where “fair” is defined as players reaching the round that would be expected based on their ranking.

## Historical tennis results data

To perform my analysis I used data on the actual relative strength of players from Stephanie Kovalchik’s {deuce} R package. I wanted a realistic range of strengths that could go head-to-head in a real life tournament, so I chose a single point of time rather than a hypothetical cross-history match up that (for example) would pit Margaret Court against Serena Williams. To avoid confusion with contemporary reality, I chose 1990 as my year. Here are the top 10 women’s tennis players at the end of 1990, as judged by their Elo ratings:

# A tibble: 128 x 3
   player_id player_name                 elo
       <int> <chr>                     <dbl>
 1    200414 Steffi Graf               2608.
 2    200293 Martina Navratilova       2474.
 3    200652 Monica Seles              2388.
 4    200572 Gabriela Sabatini         2312.
 5    200597 Mary Joe Fernandez        2236.
 6    200017 Arantxa Sanchez Vicario   2195.
 7    200049 Conchita Martinez         2194.
 8    200077 Jennifer Capriati         2169.
 9    200401 Manuela Maleeva Fragniere 2151.
10    200404 Zina Garrison             2139.


These are the top 10 rows of a 128-row data frame, women128, created with the chunk of R code below:

Kovalchik provides Elo ratings for each player at a given point in time, ultimately derived from analysis of data collected by Jeff Sackmann. I’ve written about Elo ratings earlier in this blog in the context of backgammon and Australian rules football. They are a powerful method of calculating ratings based on actual performance, with the great advantage of being convertible into probabilities for any hypothetical match up. After each match, a player’s rating is adjusted incrementally according to how the actual result compares with the probability implied by the ratings going in to the match. This makes them a useful and self-correcting metric that is readily incorporated into a statistical model.
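
To make the mechanics concrete, here is a minimal sketch of the textbook Elo formulas in base R. This is the standard 400-point logistic conversion and the usual incremental update; I am assuming, not asserting, that it matches what {deuce} does internally. The function names and the constant `k = 32` are my own illustrative choices.

```r
# Standard Elo win probability for player A against player B (400-point logistic scale)
elo_win_prob <- function(elo_a, elo_b) {
  1 / (1 + 10 ^ ((elo_b - elo_a) / 400))
}

# Standard incremental update: move the rating by k times (actual - expected).
# `k` is an illustrative constant, not a value taken from {deuce}.
elo_update <- function(elo, expected, actual, k = 32) {
  elo + k * (actual - expected)
}

# Graf (2608) against Navratilova (2474), using the end-of-1990 ratings
p_graf <- elo_win_prob(2608, 2474)
p_graf   # roughly 0.68

# If Navratilova wins (actual = 0 from Graf's perspective), Graf's rating drops
elo_update(2608, expected = p_graf, actual = 0)
```

A bigger upset produces a bigger adjustment, because the gap between the actual result and the expected probability is larger. That is what makes the ratings self-correcting.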

To be sure I am using the correct basis for converting Kovalchik’s Elo ratings into probabilities, I’m going to use her elo_prediction() function to estimate each player’s chance of winning a given pairing. To illustrate, here is a chart showing the chance that each of a selection of the players ranked 2 to 128 by Elo rating would beat Steffi Graf, the highest rated player in our subset of the data (and in fact the highest rated player by this method in all the available data, which begins in 1968 with the start of the open era).

Here’s the code for that illustration:

## Simulating tournament draws and results

For the grunt work of simulating tournaments between these 128 players, I first write a function simulate_tournament() which takes as its main argument a data frame of 128 rows in sequence to represent position in the draw. The input to this function is going to look like this:

> brackets
# A tibble: 896 x 5
   player_id round match player_name              elo
       <int> <dbl> <int> <chr>                  <dbl>
 1    200494   128     1 Dianne Van Rensburg    1855.
 2    200423   128     1 Carling Bassett Seguso 1822.
 3    200481   128     2 Elna Reinach           1777.
 4    200506   128     2 Wiltrud Probst         1768.
 5    200699   128     3 Meredith Mcgrath       1857.
 6    200086   128     3 Magdalena Maleeva      1825.
 7    200419   128     4 Kathy Rinaldi Stunkel  1804.
 8    200624   128     4 Tami Whitlinger Jones  1647.
 9    200395   128     5 Catherine Tanvier      1732.
10    200360   128     5 Pam Shriver            2047.
...


This indicates (for example) that in the “round of 128” – the first round played – Van Rensburg will play Seguso in match one. If we filter the object to match one of round two (the “round of 64”) we see:

> filter(brackets, round == 64 & match == 1)
# A tibble: 4 x 5
  player_id round match player_name              elo
      <int> <dbl> <int> <chr>                  <dbl>
1    200494    64     1 Dianne Van Rensburg    1855.
2    200423    64     1 Carling Bassett Seguso 1822.
3    200481    64     1 Elna Reinach           1777.
4    200506    64     1 Wiltrud Probst         1768.


There are now four players in match 1. However, one of Van Rensburg or Seguso will have lost in the round of 128; and so will one of Reinach and Probst. With a bit of care, this object brackets contains the full draw of the elimination tournament, and can be constructed so the top 32 seeds are allocated to the brackets required in a 32 seed draw. The remainder of the code in the chunk below does this for my two different methods of draws. It’s a bit clunky but it seems to work.
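
The bracket logic described here can be sketched in a few lines of base R. This is my own hypothetical illustration under stated assumptions, not the post's simulate_tournament() function. The helpers `make_brackets()` and `simulate_one()` are invented names, and win probabilities use the standard Elo formula rather than elo_prediction(). The first helper shows how a draw position maps to a match number in each round (giving 896 rows for 128 players). The second plays out one tournament by pairing adjacent survivors.

```r
# Standard Elo win probability (an assumption; the post itself uses elo_prediction())
elo_win_prob <- function(elo_a, elo_b) 1 / (1 + 10 ^ ((elo_b - elo_a) / 400))

# For draw position p (1..n), the match number in the "round of r" is
# ceiling(p / (2 * n / r)): pairs in the first round, groups of four in the next, etc.
make_brackets <- function(n = 128) {
  rounds <- n / 2 ^ (0:(log2(n) - 1))             # 128, 64, ..., 2
  out <- expand.grid(position = seq_len(n), round = rounds)
  out$match <- ceiling(out$position / (2 * n / out$round))
  out
}

# Play one single-elimination tournament; `elo` is in draw order and its
# length must be a power of two. Returns the draw position of the winner.
simulate_one <- function(elo) {
  alive <- seq_along(elo)                          # positions still in the draw
  while (length(alive) > 1) {
    a <- alive[c(TRUE, FALSE)]                     # first player of each pairing
    b <- alive[c(FALSE, TRUE)]                     # second player of each pairing
    a_wins <- runif(length(a)) < elo_win_prob(elo[a], elo[b])
    alive <- ifelse(a_wins, a, b)
  }
  alive
}

# Toy example: an 8-player random draw using the top eight 1990 ratings
set.seed(42)
top8 <- c(2608, 2474, 2388, 2312, 2236, 2195, 2194, 2169)
winners <- replicate(5000, {
  draw <- sample(length(top8))       # random allocation of players to slots
  draw[simulate_one(top8[draw])]     # map the winning slot back to player rank
})
prop.table(table(winners))
```

A seeded draw would differ only in how `draw` is built: the seeds are placed in prescribed slots and only the remaining players are shuffled.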

Which gets us to our results.

The distribution of the winner of the whole tournament is summarised in the table below. Unsurprisingly, Ms Graf wins more of our simulated tournaments than any other player; 5,454/10,000 when the draw is at random and 5,913/10,000 when we use the 32-seed draw method. But even in this period of Graf’s dominance in women’s tennis, other players (and more than just the top few) have a non-zero chance of winning. Which is why people watch, of course. Interestingly, once we get to Monica Seles (ranked 3 by Elo at this point in time) and lower, players are more likely to win the overall tournament with an unseeded rather than a seeded draw.

| Winner | Actual ranking | No seeding | Seeded (32 seeds) |
|---|---|---|---|
| Steffi Graf | 1 | 5454 | 5913 |
| Martina Navratilova | 2 | 2089 | 2219 |
| Monica Seles | 3 | 1064 | 1018 |
| Gabriela Sabatini | 4 | 510 | 418 |
| Mary Joe Fernandez | 5 | 213 | 132 |
| Conchita Martinez | 7 | 110 | 66 |
| Arantxa Sanchez Vicario | 6 | 113 | 63 |
| Jennifer Capriati | 8 | 89 | 42 |
| Manuela Maleeva Fragniere | 9 | 66 | 34 |
| Zina Garrison | 10 | 58 | 26 |
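
With 10,000 simulations per draw method, it is worth checking that differences like Graf’s 5,454 versus 5,913 wins are bigger than simulation noise. The sketch below is my own addition, treating each set of simulations as independent binomial trials (an assumption). It computes the standard error of each proportion and of the gap between them.

```r
n_sims <- 10000
wins <- c(random = 5454, seeded = 5913)   # Graf's simulated tournament wins

p  <- wins / n_sims
se <- sqrt(p * (1 - p) / n_sims)          # binomial standard error of each proportion

# Gap between the two draw methods and its standard error,
# treating the two sets of simulations as independent
gap    <- unname(p["seeded"] - p["random"])
se_gap <- sqrt(sum(se ^ 2))

round(c(p, se_random = unname(se["random"]), gap = gap, z = gap / se_gap), 4)
```

Each proportion has a standard error of roughly half a percentage point, so a gap of several percentage points is well outside what simulation noise alone would produce. The smaller differences further down the table deserve more caution.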

This is also the point at which we can interrogate our simulation results to get the image I started the blog with:

This analysis of the simulation results was made with this chunk of code

That’s all for today. Some time soon I hope to come back to this and compare both methods to the alternative proposed by Charles Dodgson in Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method. Dodgson, who as well as being a mathematician and logician found time to invent one of the most recognised characters in English literature, was writing at a time before seeding the draw was common. He proposed an alternative to the single elimination tournament that in his view was guaranteed to give the correct first three prizes to the three best players. However, he did depend on a non-probabilistic view of what “best” means, which is worth probing. But that’s for another day.