F1 Drivers Rated

[This article was first published on Sport Data Science, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Hello, and welcome to today's blog, in which I'm going to develop methods to evaluate F1 drivers. Currently there is no objective way to tell whether an F1 driver is any good; deciding whether a racing driver is good or not seems somewhat arbitrary. As a data fan, I think much more can be done to rate racing drivers, and F1 drivers in particular. One area I think can be quantified is where a driver finishes the race compared with where they started on the grid.

The idea is to use historical data to find where the average driver has finished relative to their grid slot. Suppose a driver qualifies 2nd, and drivers who qualified 2nd have historically finished 3.4th on average. If this driver finishes 1st, that result is worth +2.4 (3.4 - 1). This number can be averaged over the long term. In the short term it could serve as a plus/minus statistic in F1 broadcasts.
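A minimal sketch of that calculation follows. It assumes we already have a table of historical average finishing positions keyed by starting slot. Only the 3.4 for 2nd place comes from the example above. The other `expected_finish` values and the `results` list are made-up placeholders, and `position_value` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical historical averages: starting slot -> mean finishing position.
# Only the 3.4 for P2 comes from the worked example; the rest are placeholders.
expected_finish = {1: 2.9, 2: 3.4, 3: 4.6}

def position_value(start: int, finish: int) -> float:
    """Positions gained relative to the historical average for this start slot.

    Positive means the driver beat the average driver from the same slot.
    """
    return expected_finish[start] - finish

# Single-race plus/minus (e.g. a broadcast graphic): qualify 2nd, finish 1st.
race_value = position_value(2, 1)
print(round(race_value, 2))  # 2.4

# Long-term rating: average the per-race values over a run of results.
results = [(2, 1), (3, 5), (1, 1)]  # hypothetical (start, finish) pairs
career_rating = mean(position_value(s, f) for s, f in results)
```

The same function serves both uses. A single call gives the plus/minus for one race, and averaging many calls gives a career rating.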

The Data

I often come back to this, but Kaggle is one of the best sources of data for whatever you want to look at. For this there is a whole F1 data set covering all sorts of information. All I need for this analysis are the results and races tables. The results table has each Grand Prix result for each driver, as well as the qualifying position they started in.
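A standard-library sketch of joining the two tables and keeping races from 2000 onwards follows. It assumes the CSVs expose `raceId` in both tables, `year` in races, and `driverId`, `grid` and `positionOrder` in results. Check the headers of the copy you download. The inline strings are stand-ins for `races.csv` and `results.csv`, and `results_since` is a hypothetical helper.

```python
import csv
import io

# Stand-ins for races.csv and results.csv; column names are assumptions.
RACES_CSV = """raceId,year,name
1,1999,Australian Grand Prix
2,2000,Australian Grand Prix
"""
RESULTS_CSV = """raceId,driverId,grid,positionOrder
1,10,3,2
2,10,2,1
2,11,1,2
"""

def load(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def results_since(year_from, races_text=RACES_CSV, results_text=RESULTS_CSV):
    """Attach each result's season and keep only races from year_from onwards."""
    race_year = {r["raceId"]: int(r["year"]) for r in load(races_text)}
    rows = []
    for res in load(results_text):
        year = race_year.get(res["raceId"])
        if year is not None and year >= year_from:
            rows.append({**res, "year": year})
    return rows

modern = results_since(2000)
```

With the real files you would pass the file contents, for example `open("races.csv").read()`, instead of the inline strings.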

The first thing to look at is the average finishing position by qualifying position.
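The average finishing position per starting slot is a plain group-by-and-mean. Here is a small sketch on hypothetical (start, finish) pairs. `mean_finish_by_start` is an illustrative name.

```python
from collections import defaultdict

def mean_finish_by_start(rows):
    """Average finishing position for each starting position.

    rows: iterable of (start_position, finish_position) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # start -> [sum of finishes, count]
    for start, finish in rows:
        totals[start][0] += finish
        totals[start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}

sample = [(1, 1), (1, 3), (2, 2), (2, 5), (20, 15)]  # hypothetical data
print(mean_finish_by_start(sample))  # {1: 2.0, 2: 3.5, 20: 15.0}
```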

Overall, drivers starting in the top 8 go backwards during the race on average, while those starting 9th or lower finish higher than they started. However, I wonder how much the lower starters are affected by retirements. If you start last, any car that retires is ahead of you, so you can only move forward. Conversely, if you start from pole, every retirement is behind you, so you can only hold position or go backwards. That is why, on average, the pole sitter finishes lower than they started. So the first thing I need to do is control for retirements, to put everyone on a level playing field.

Now I can see the percentage of retirements by starting position on the grid. Clearly the worse cars towards the back of the grid have a higher retirement rate. I can use that to normalise the results when calculating the KPI.
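The post doesn't spell out the exact normalisation, so the following is one plausible approach, labelled as an assumption. First, compute the retirement rate per starting slot. Second, re-rank each finisher's starting position among only the cars that finished. That way retirements ahead of a driver no longer count as positions gained. The `finished` flag is a hypothetical field; in the Kaggle data it would have to be derived from the status information.

```python
from collections import defaultdict

def retirement_rate_by_start(rows):
    """Share of starters from each slot who did not finish.

    rows: (start_position, finished) pairs, finished being a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # start -> [retirements, starters]
    for start, finished in rows:
        counts[start][0] += 0 if finished else 1
        counts[start][1] += 1
    return {start: r / n for start, (r, n) in sorted(counts.items())}

def adjusted_starts(race):
    """Re-rank starting positions among the cars that finished one race.

    race: list of (driver, start_position, finished). A finisher who started
    20th with five retirements ahead of them gets an adjusted start of 15.
    """
    finishers = sorted((start, driver) for driver, start, finished in race if finished)
    return {driver: rank for rank, (start, driver) in enumerate(finishers, start=1)}

race = [("A", 1, True), ("B", 2, False), ("C", 3, True), ("D", 4, False), ("E", 5, True)]
print(adjusted_starts(race))  # {'A': 1, 'C': 2, 'E': 3}
```

Comparing finishing position with the adjusted start means a backmarker no longer gains credit just because cars ahead broke down.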

After running the model for the first time, this is the list of the best drivers by their average career position change, looking at drivers since 2000. I think there must be an error here. With all due respect, I don't think Alex Yoong and Enrique Bernoldi are the best drivers to have graced the F1 grid. FYI, the Verstappen you can see in 9th is not Max; it's his dad Jos, who was nowhere near as good as Max.

The error was that I was creating the adjusted position from the grid position rather than the qualifying position. Making that change and recreating the same graph gives this:
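A sketch of attaching qualifying positions to results follows, with the starting reference made switchable so the two versions can be compared. It assumes a separate qualifying table keyed by race and driver. The Kaggle set includes a `qualifying.csv`, but treat the field names here as assumptions. `attach_quali`, `position_change` and `start_key` are hypothetical names, and the change computed is the raw number of places gained, not the expectation-adjusted value.

```python
def attach_quali(results, quali):
    """Add each driver's qualifying position to their race result.

    results: dicts with raceId, driverId, grid, finish.
    quali: dicts with raceId, driverId, position (qualifying position).
    Results with no qualifying record are dropped rather than guessed.
    """
    q_pos = {(q["raceId"], q["driverId"]): q["position"] for q in quali}
    merged = []
    for r in results:
        key = (r["raceId"], r["driverId"])
        if key in q_pos:
            merged.append({**r, "quali": q_pos[key]})
    return merged

def position_change(row, start_key="quali"):
    """Raw places gained from the chosen starting reference ('quali' or 'grid')."""
    return row[start_key] - row["finish"]

# Hypothetical case: qualified 2nd, took a grid penalty to 20th, finished 5th.
results = [{"raceId": 1, "driverId": 44, "grid": 20, "finish": 5}]
quali = [{"raceId": 1, "driverId": 44, "position": 2}]
row = attach_quali(results, quali)[0]
print(position_change(row, "grid"), position_change(row, "quali"))  # 15 -3
```

The example shows how far apart the two references can be when a grid penalty separates qualifying position from starting slot.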

Now that's more like it. These are the top 29 drivers by average position change, and there are a lot of pretty big names on the list, including all the world champions of the last 20 years. There are also some interesting names that people maybe wouldn't instantly think of, such as Kobayashi and Friesacher.

When a driver's xP rating is compared with the number of races they competed in, you can clearly see that drivers with better ratings compete in more races. Some of the drivers with the highest ratings are world champions.

Let's focus on a couple of drivers: first, the current World Champion Lewis Hamilton, and second, Nico Hulkenberg.

Hamilton's performances over the years seem to fall into two distinct periods. In the early years, at McLaren, the field was a lot closer and he was rarely in a dominant car. In the hybrid era his total increases significantly. This is partly because he had a more dominant car, and perhaps because worse reliability sometimes meant he started lower on the grid. With a dominant car, starting lower lets you gain more positions in those races. This may be a limitation of the metric, and going forward I may have to control for how inherently fast the car is.
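Season-by-season totals like these come from summing a driver's per-race values by year. Here is a sketch on hypothetical (year, value) pairs. `season_totals` is an illustrative name.

```python
from collections import defaultdict

def season_totals(rows):
    """Sum of per-race position values for each season.

    rows: (year, value) pairs for one driver.
    """
    totals = defaultdict(float)
    for year, value in rows:
        totals[year] += value
    return dict(sorted(totals.items()))

history = [(2007, 1.5), (2007, -0.5), (2014, 3.0), (2014, 2.0)]  # hypothetical
print(season_totals(history))  # {2007: 1.0, 2014: 5.0}
```

Totals grow with the number of races in a season. Dividing each total by the season's race count would give a per-race figure that is easier to compare across eras with different calendar lengths.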

Hulkenberg's career up to 2017 was a bit of a mixed bag. Across these 8 seasons he has only 2 with strong positive position differences, while four are strong negatives. I chose Hulkenberg because he's the driver with the most race starts without a podium, and looking at this record you can possibly see why. This rating obviously isn't the be-all and end-all of a driver's career, but it's a way to try to understand who the good and bad drivers are.
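Labelling a season as a "strong" positive or negative needs a cut-off. The post doesn't state one, so the `threshold` below is an arbitrary illustrative value, and the season totals are hypothetical.

```python
def classify_seasons(totals, threshold=5.0):
    """Label each season 'strong +', 'strong -' or 'mixed' around a threshold.

    threshold is an arbitrary illustrative cut-off, not one from the post.
    """
    labels = {}
    for year, total in totals.items():
        if total >= threshold:
            labels[year] = "strong +"
        elif total <= -threshold:
            labels[year] = "strong -"
        else:
            labels[year] = "mixed"
    return labels

labels = classify_seasons({2010: 6.0, 2011: -7.5, 2012: 1.0})
print(labels)  # {2010: 'strong +', 2011: 'strong -', 2012: 'mixed'}
```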

This is just a first exploration of one way to measure F1 drivers' performances. There are probably other measures that could be used to build a wider picture of how good an F1 driver truly is. I think I can also improve the model further by including the circuit, since overtaking is easier at some circuits than at others, which affects position changes. There is room for further development.
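One hypothetical way to bring the circuit in is to key the expected finish on (circuit, starting slot). Where a circuit has too few samples for that slot, fall back to the all-circuit average. The `min_samples` cut-off and the circuit names below are illustrative assumptions, not part of the original model.

```python
from collections import defaultdict

def circuit_baseline(rows, min_samples=5):
    """Build an expected-finish lookup keyed by circuit and start slot.

    Falls back to the all-circuit average for the start slot when a circuit
    has fewer than min_samples results from that slot.
    rows: (circuit, start, finish) tuples.
    """
    by_slot = defaultdict(list)
    by_circuit_slot = defaultdict(list)
    for circuit, start, finish in rows:
        by_slot[start].append(finish)
        by_circuit_slot[(circuit, start)].append(finish)

    def expected(circuit, start):
        finishes = by_circuit_slot.get((circuit, start), [])
        if len(finishes) >= min_samples:
            return sum(finishes) / len(finishes)
        overall = by_slot.get(start)
        if not overall:
            raise KeyError(f"no data for start position {start}")
        return sum(overall) / len(overall)

    return expected

rows = [("monaco", 1, 1)] * 5 + [("monza", 1, 4)]  # hypothetical results
expected = circuit_baseline(rows)
```

Here `expected("monaco", 1)` uses Monaco's own history. `expected("monza", 1)` has only one Monza sample, so it falls back to the pooled average for pole sitters.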

To leave a comment for the author, please follow the link and comment on their blog: Sport Data Science.
