Figure Aesthetics or Overlays?

[This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers].

Tinkering with a new chart type over the weekend, I spotted something rather odd in my F1 track history charts: what look to be outliers, in the form of cars that hadn’t been lapped on that lap appearing, on track, behind the lap leader of the next lap.

If you count the number of cars on that leadlap, it’s also greater than the number of cars in the race on that lap.

How could that be? Cars being unlapped, perhaps, and so “appearing twice” on a particular leadlap – that is, recording two laptimes between consecutive passes of the start/finish line by the race leader?

My fix for this was to add an “unlap” attribute that detects whether a car has recorded more than one lap within a given leadlap:

#Count each car's laps within each leadlap (ddply is from the plyr package)
library(plyr)
lapTimes = ddply(lapTimes, .(leadlap, code), transform, unlap = seq_along(leadlap))

This groups by leadlap and car, and numbers each occurrence within the group, starting from 1. So if the unlap count is greater than 1, the car has completed more than one lap in that leadlap.
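To see the counting at work, here is a minimal sketch on made-up data. The lap records below are hypothetical, and I've swapped in base R's ave() for ddply so that it runs without any extra packages; it should number the rows within each leadlap/code group in the same way.

```r
# Hypothetical lap records: STR crosses the line twice during leadlap 11
lapTimes <- data.frame(
  code    = c("HAM", "VET", "STR", "HAM", "VET", "STR", "STR"),
  lap     = c(10, 10, 9, 11, 11, 10, 11),
  leadlap = c(10, 10, 10, 11, 11, 11, 11)
)

# Number each row within its (leadlap, code) group, starting from 1
lapTimes$unlap <- with(
  lapTimes,
  ave(seq_along(leadlap), leadlap, code, FUN = seq_along)
)

# Rows per leadlap versus the number of distinct cars
rowsPerLeadlap <- table(lapTimes$leadlap)
carCount <- length(unique(lapTimes$code))

# Rows flagged as a second (or later) pass within a leadlap
unlapped <- lapTimes[lapTimes$unlap > 1, ]
```

In this toy set, leadlap 11 holds more rows than there are cars, which is the same symptom as in the chart, and the extra row is the one picked out by unlap > 1.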

My first thought was to add this as an overprint on the original chart:

#Overprint unlaps with an open square (shape 0)
g = g + geom_point(data = lapTimes[lapTimes$unlap > 1, ],
                   aes(x = trackdiff, y = leadlap, colour = (leadlap - lap)), shape = 0)

This renders as follows:

Whilst it works, as an approach it is inelegant, and had me up in the night pondering the use of overlays rather than aesthetics.

The alternative is to view the fact that the car was on its second pass of the start/finish line for a given lead lap as a key property of the car, and to depict that directly via an aesthetic mapping of that property onto the symbol type:

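  #shape 1 (circle) for a car's first pass on a leadlap, shape 0 (square) for an unlapping pass
  g = g + geom_point(aes(x = trackdiff, y = leadlap,
                         colour = (lap == leadlap),
                         shape = as.integer(unlap == 1))) + scale_shape_identity()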

This renders just a single mark on the chart, depicting the diff to the leader *as well as* the unlapping characteristic, rather than the two marks used previously: one for the diff, and a second, overprinted mark to depict the unlapping nature of that mark.
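The difference between the two approaches shows up in the structure of the plot object itself. The following is only a sketch on hypothetical data (the trackdiff values are invented), and it uses scale_shape_manual() rather than the identity scale to make the shape choice explicit; the overlay version carries two layers, the aesthetic version one.

```r
library(ggplot2)

# Hypothetical data for a single leadlap; STR appears twice (unlap = 2)
lapTimes <- data.frame(
  code      = c("HAM", "VET", "STR", "STR"),
  lap       = c(11, 11, 10, 11),
  leadlap   = c(11, 11, 11, 11),
  trackdiff = c(0, 4.2, 1.5, 88.0),
  unlap     = c(1, 1, 1, 2)
)

# Overlay approach: base marks plus a second layer of open squares
overlay <- ggplot(lapTimes, aes(x = trackdiff, y = leadlap)) +
  geom_point() +
  geom_point(data = lapTimes[lapTimes$unlap > 1, ], shape = 0, size = 4)

# Aesthetic approach: one layer, with the shape carrying the unlap flag
aesthetic <- ggplot(lapTimes, aes(x = trackdiff, y = leadlap)) +
  geom_point(aes(shape = unlap > 1)) +
  scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 0))

overlayLayers <- length(overlay$layers)
aestheticLayers <- length(aesthetic$layers)
```

Printing either object draws the chart; comparing overlayLayers with aestheticLayers is just a quick way of confirming how many marks each approach puts down per point.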

So now I’m wondering – when would it make sense to use multiple marks by overprinting?

Here’s one example where I think it does make sense: where I pass an argument into the chart plotter to highlight a particular driver by infilling a marker with a symbol to identify that driver.

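#Drivers of interest passed in using construction: code=list(c("STR","+"),c("RAI","*"))
#is.list() skips a missing value such as NA; is.na() on a list gives a multi-element vector, not a valid if() condition
if (is.list(code)) {
  for (driver in code) {
    g = g + geom_point(data = lapTimes[lapTimes$code == driver[1], ],
                       aes(x = trackdiff, y = leadlap),
                       shape = driver[2])
  }
}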

In this case, the + symbol is not a property of the car; it is an additional piece of information that I want to attach to that car, but not to the other cars. That is, it is a property of my interest, not a property of the car itself.
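One way of packaging that up is as a small helper that overprints one layer per driver of interest. This is a sketch only: highlight_drivers is a hypothetical name, the data is invented, and I'm assuming a default of NULL for code, so that the loop simply does nothing when no drivers are passed.

```r
library(ggplot2)

# Hypothetical helper: adds one overprinted layer per driver of interest
highlight_drivers <- function(g, lapTimes, code = NULL) {
  for (driver in code) {  # iterating over NULL runs zero times
    g <- g + geom_point(
      data = lapTimes[lapTimes$code == driver[1], ],
      aes(x = trackdiff, y = leadlap),
      shape = driver[2]   # a single character is drawn as the symbol
    )
  }
  g
}

# Hypothetical data and a base chart using open circles (shape 1)
lapTimes <- data.frame(
  code      = c("HAM", "STR", "RAI"),
  leadlap   = c(11, 11, 11),
  trackdiff = c(0, 12.3, 30.1)
)
g <- ggplot(lapTimes, aes(x = trackdiff, y = leadlap)) + geom_point(shape = 1)

plain <- highlight_drivers(g, lapTimes)
marked <- highlight_drivers(g, lapTimes, code = list(c("STR", "+"), c("RAI", "*")))
```

The base layer stays the same for every car; the highlight layers sit on top of it and exist only because of what I asked for in the call.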

