Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…

Posted on July 30, 2011 by Tony Hirst in R bloggers | 0 Comments

[This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors.

Whilst looking for an easy way of shaping an R data frame so that I could plot categorical values sector1, sector2, sector3 on the x-axis, and then a line for each driver showing their time in the sector on the y-axis (I still haven’t worked out how to do that? Any hints? Add them to the comments please…;-), I came across a variant of the parallel coordinate plot hidden away in the lattice package:

What this plot does is for each row (i.e. each driver) take values from separate columns (i.e. times from each sector), normalise them, and then plot lines between the normalised value, one “axis” per column; each row defines a separate category.

The normalisation obviously hides the magnitude of the differences between the time deltas in each sector (the min-max range might be hundredths in one sector, tenths in another), but this plot does show us that there are different groupings of cars – there is clear(?!) white space in the diagram:

Whilst the parallel co-ordinate plot helps identify groupings of cars, and shows where they may have similar performance, it isn’t so good at helping us get an idea of which sector had most impact on the final lap time. For this, I think we need to have a single axis in seconds showing the delta from the fastest time in the sector. That is, we should have a parallel plot where the parallel axes have the same scale, but in terms of sector time, a floating origin (so e.g. the origin for one sector might be 28.6s and for another, 22.4s). For convenience, I’d also like to see the deltas shown on the y-axis, and the categorical ranges arranged on the x-axis (in contrast to the diagrams above, where the different ranges appear along the y-axis).

PS I also wonder to what extent we can identify signatures for the different teams? Eg the fifth and sixth slowest cars in sector 1 have the same signature across all three sectors and come from the same team; and the third and fourth slowest cars in sector 2 have a very similar signature (and again, represent the same team).

Where else might we look for signatures? In the speed traps maybe? Here’s what the parallel plot for the speed traps looks like:

(In this case, max is better = faster speed.)

To combine the views (timings and speed), we might use a formulation of the flavour:

parallel(~data.frame(a$sector1,a$sector2,a$sector3, -a$inter1,-a$inter2,-a$finish,-a$trap))

This is a bit too cluttered to pull much out of though? I wonder if changing the order of parallel axes might help, e.g. by trying to come up with an order than minimises the number of crossed lines?

And if we colour lines by team, can we see any characteristics?

Using a dashed, rather than solid, line makes the chart a little easier to read (more white space). Using a thinking line also helps bring out the colours.

parallel(~data.frame(a$sector1,-a$inter1,-a$inter2,a$sector2,a$sector3, -a$finish,-a$trap),col=a$team,lty=2,lwd=2)

Here’s another ordering of the axes: