Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Prompted by Markku Hänninen, I thought I’d have a quick look at estimating motorsport safety car laps from a set of laptime data. For the uninitiated, if there is a dangerous hazard on track, the race-cars are kept out while the hazard is cleared, but led around by a safety car that limits the pace. No overtaking is allowed for race position, but under certain regulations, lapped cars may unlap themselves. Cars may also pit under the safety car.

Timing sheets typically don’t identify safety car periods, so the question arises: how can we detect them?

One condition that is likely to follow is that the average pace of the laps under safety car conditions will be considerably slower than under racing conditions. A quick way of estimating the race pace is to find the fastest laptime across the whole of the race (or in an online algorithm, the fastest laptime to date).

With a lapTimes dataframe containing columns lap, rawtime (that is, raw laptime in seconds) and position (that is, the race position of the driver recording a particular laptime on a particular lap), we can easily find the fastest lap:

`minl=min(lapTimes['rawtime'])`

We can find the mean laptime per lap using ddply() to group around each lap:

```ddply(lapTimes[c('lap', 'rawtime', 'position')], .(lap), summarise,
mean_laptime=mean(rawtime) )```

We can also generate a variety of other measures. For example, within the grouped ddply operation, if we divide the mean laptime per lap (mean(rawtime)) by the fastest overall laptime we get a normalised mean laptime based on the fastest lap in the race.

```ddply(lapTimes[c('lap','rawtime')], .(lap), summarise,
norm_laptime=mean(rawtime)/minl )```

We might also normalise the leader’s laptime for each lap, on the basis that the leader will be the car most likely to be driving at the safety car’s pace. (The summarising is essentially redundant here because we only have one row per group.

```ddply(lapTimes[lapTimes['position']==1, c('lap','rawtime')], .(lap), summarise,

Using the normalised times, we can identify slow laps. For example, slow laps based on mean laptime. In this case, I am using a heuristic that says the laptime is a slow laptime if the normlised time is more than 1.3 times that of the fastest lap:

```ddply(lapTimes[c('lap','rawtime')], .(lap), summarise,
slow_lap_meanBasis= (mean(rawtime)/minl) > 1.3 )```

If we assume that the first lap does not start under the safety car, we can then make a crude guess that a slow lap not on the first lap is a safety car lap.

However, this does not take into account things like sudden downpours or other changes to the weather or track conditions. In such a case it may be likely that the majority of the field pits, so we might want to have a detector that flags whether a certain number of cars have pitted on a lap, possibly normalised against the current size of the field.

```laps=lapsData.df(2016,4)
pits=pitsData.df(2016,4)[c('lap','driverId')]
pits['pitstop']=T
lapTimes=merge(laps, pits, by=c('lap','driverId'), all.x=T)
lapTimes['pitstop']=!is.na(lapTimes['pitstop'])

#Count of stops per lap
ddply( lapTimes, .(lap), summarise, ps=sum(pitstop==TRUE) )

#Proportion of cars stopping per lap
ddply( lapTimes, .(lap), summarise, ps=sum(pitstop==TRUE)/length(pitstop) )
```

That said, under safety car conditions, many cars do also take the opportunity to pit. However, under sudden changes of weather condition, we might expect nearly all the cars to come in, even if it means doubling up. (So another detector for weather might be two cars in the same team, close to each other in terms of gap, pitting on the same lap, with the result that one will be queued behind the other.)

As and when I get a chance, I’ll try to add some sort of ‘safety car’ estimator to the Wrangling F1 Data With R book.