One of the things I wanted to explore in the production of the Wrangling F1 Data With R book was the extent to which I could draw on published academic papers for inspiration when exploring the various results and timing datasets.
In a chapter published earlier this week, I explored the notion of *churn*, as described in Mizak, D., Neral, J. & Stair, A. (2007). The adjusted churn: an index of competitive balance for sports leagues based on changes in team standings over time. Economics Bulletin, 26(3), 1-7, and subsequently adapted by Berkowitz, J. P., Depken, C. A. & Wilson, D. P. (2011). When going in circles is going backward: Outcome uncertainty in NASCAR. Journal of Sports Economics, 12(3), 253-283.
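In a competitive league, churn is defined as:

$$C_t = \frac{\sum_{i=1}^{N} \left| f_{i,t} - f_{i,t-1} \right|}{N}$$

where $C_t$ is the churn in team standings for year $t$, $\left| f_{i,t} - f_{i,t-1} \right|$ is the absolute value of the $i$-th team's change in finishing position going from season $t-1$ to season $t$, and $N$ is the number of teams.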
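As a minimal sketch of this definition, assume the final standings for the same set of teams in two consecutive seasons are held as named R vectors (the team names and positions below are made up purely for illustration). The churn is then just the mean absolute change in position once the two seasons are lined up by team:

```r
# Hypothetical final standings (position by team) for seasons t-1 and t;
# the team names and positions are illustrative, not real results
prev = c(A = 1, B = 2, C = 3, D = 4)
curr = c(B = 1, A = 2, D = 3, C = 4)

# Line up each team's current position with its previous one by name
delta = abs(curr[names(prev)] - prev)

# Churn C_t: sum of absolute position changes divided by the number of teams
C_t = sum(delta) / length(delta)
print(C_t)
```

Matching on names rather than on vector order matters here: the standings tables for each season are naturally sorted by position, not by team.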
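The adjusted churn, $C^{*}_t$, is defined as an indicator in the range 0..1 obtained by dividing the churn, $C_t$, by the maximum churn, $C_{\max}$:

$$C^{*}_t = \frac{C_t}{C_{\max}}$$

The value of the maximum churn depends on whether there is an even or odd number of competitors:

$$C_{\max} = \begin{cases} \dfrac{N}{2} & \text{if } N \text{ is even} \\ \dfrac{N^2 - 1}{2N} & \text{if } N \text{ is odd} \end{cases}$$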
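One way to convince yourself that the even/odd maximum churn formula ($N/2$ for even $N$, $(N^2-1)/(2N)$ for odd $N$) is right is a brute-force check for small fields: enumerate every possible reordering of $N$ positions and take the largest mean absolute change. The sketch below does that in base R; the `perms()` and `bruteforce_max()` helpers are hypothetical names introduced just for this check, and the approach only scales to small $N$ because the number of permutations grows as $N!$.

```r
# Generate all permutations of a vector (simple recursive helper;
# fine for the small N used here, but the output grows as N!)
perms = function(v) {
  if (length(v) <= 1) return(list(v))
  out = list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] = c(v[i], p)
  }
  out
}

# Closed-form maximum churn: N/2 if N is even, (N^2 - 1)/(2N) if N is odd
churnmax = function(N) if (N %% 2 == 0) N / 2 else (N * N - 1) / (2 * N)

# Largest churn over every possible reordering of N starting positions
bruteforce_max = function(N) {
  max(sapply(perms(1:N), function(p) mean(abs(p - 1:N))))
}

# Print N, the brute-force maximum and the closed-form maximum side by side
for (N in 2:6) {
  cat(N, bruteforce_max(N), churnmax(N), "\n")
}
```

For each $N$ the last two columns should agree; a complete reversal of the order is one arrangement that attains the maximum.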
Berkowitz et al. reconsidered churn as applied to an individual NASCAR race (that is, at the event level):

$$C_R = \frac{\sum_{i=1}^{N} \left| f_{i,R} - s_{i,R} \right|}{N}$$

In this case, $f_{i,R}$ is the position of driver $i$ at the end of race $R$, $s_{i,R}$ is the starting position of driver $i$ at the beginning of that race, and $N$ is the number of drivers participating in the race. Once again, the authors recognise the utility of normalising the churn value to give an *adjusted churn* in the range 0..1 by dividing through by the maximum churn value.
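A minimal sketch of the race-level calculation, assuming starting and finishing positions for a single, made-up six-car race are held in two vectors indexed by driver. Dividing by the maximum churn for $N = 6$ gives the adjusted churn:

```r
# Hypothetical six-car race: grid slot and final classification for drivers 1..6
start  = c(1, 2, 3, 4, 5, 6)
finish = c(2, 1, 3, 6, 4, 5)

N = length(start)

# Race churn C_R: mean absolute change between start and finish position
C_R = sum(abs(finish - start)) / N

# Maximum churn for N drivers (N/2 if N is even, (N^2 - 1)/(2N) if odd)
C_max = if (N %% 2 == 0) N / 2 else (N^2 - 1) / (2 * N)

# Adjusted churn, in the range 0..1
adj_C_R = C_R / C_max
print(c(churn = C_R, adjusted = adj_C_R))
```

Here the absolute changes are 1, 1, 0, 2, 1 and 1, so the churn is 1 and, with a maximum of 3 for six cars, the adjusted churn is one third.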
Using these models, I created a set of churn functions of the following form:
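```r
# Test whether an integer is even
is.even = function(x) x %% 2 == 0
# Maximum possible churn for N competitors (depends on whether N is even or odd)
churnmax = function(N) if (is.even(N)) return(N/2) else return(((N*N)-1)/(2*N))
# Churn: mean absolute change in position, given a vector of position deltas
churn = function(d) sum(d)/length(d)
# Adjusted churn: churn normalised by the maximum churn, in the range 0..1
adjchurn = function(d) churn(d)/churnmax(length(d))
```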
and then used them to explore churn in a variety of contexts:
- comparing grid positions vs race classifications across a season (cf. Berkowitz et al.)
- churn in Drivers’ Championship standings over several seasons (cf. Mizak et al.)
- churn in Constructors’ Championship standings over several seasons (cf. Mizak et al.)
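For the two championship-standings cases, one way of sketching the season-on-season calculation is to start from a long-format table with one row per competitor per season and compute the churn for each pair of consecutive years. The data frame below is invented, `season_churn()` is a hypothetical helper, and the sketch assumes the same competitors appear in every season (as in the fixed-league setting of Mizak et al.); how drivers or teams who join or leave between seasons should be handled is a modelling choice this example does not address.

```r
# Invented long-format standings: one row per competitor per season
standings = data.frame(
  year       = rep(2011:2013, each = 4),
  competitor = rep(c("A", "B", "C", "D"), times = 3),
  position   = c(1, 2, 3, 4,   # 2011
                 2, 1, 3, 4,   # 2012
                 4, 3, 2, 1)   # 2013
)

# Churn between season y-1 and season y, matching competitors by name
# (assumes the same competitors appear in both seasons)
season_churn = function(df, y) {
  prev = df[df$year == y - 1, ]
  curr = df[df$year == y, ]
  prev_pos = prev$position[match(curr$competitor, prev$competitor)]
  mean(abs(curr$position - prev_pos))
}

# One churn value for each season after the first
years = sort(unique(standings$year))[-1]
churn_by_year = data.frame(
  year  = years,
  churn = sapply(years, function(y) season_churn(standings, y))
)
print(churn_by_year)
```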
For example, in the first case, we can process data from the ergast database as follows:
```r
library(DBI)
library(plyr)

# Connect to a local SQLite copy of the ergast database
ergastdb = dbConnect(RSQLite::SQLite(), './ergastdb13.sqlite')

# Grid position and final classification for every driver in every 2013 race
q = 'SELECT round, name, driverRef, code, grid, position, positionText, positionOrder
     FROM results rs JOIN drivers d JOIN races r
     ON rs.driverId=d.driverId AND rs.raceId=r.raceId
     WHERE r.year=2013'
results = dbGetQuery(ergastdb, q)

# Absolute change in position from grid to final classification
results$delta = abs(results$grid - results$positionOrder)

# Churn and adjusted churn for each race
churn.df = ddply(results[, c('round', 'name', 'delta')], .(round, name),
                 summarise,
                 churn = churn(delta),
                 adjchurn = adjchurn(delta))
```
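If you would rather not depend on plyr, the same per-race summary can be sketched in base R using `split()` and `lapply()`. This is a swapped-in approach rather than the one used in the book, and the tiny `results` data frame below is invented to stand in for the output of the SQL query (only the `round`, `name`, `grid` and `positionOrder` columns are assumed):

```r
# Invented stand-in for the query results: two three-car races
results = data.frame(
  round         = rep(1:2, each = 3),
  name          = rep(c("Race One", "Race Two"), each = 3),
  grid          = c(1, 2, 3, 1, 2, 3),
  positionOrder = c(1, 2, 3, 3, 2, 1)
)
results$delta = abs(results$grid - results$positionOrder)

# Maximum churn for N competitors: N/2 if N is even, (N^2 - 1)/(2N) if odd
churnmax = function(N) if (N %% 2 == 0) N / 2 else (N * N - 1) / (2 * N)

# Split the results by round and summarise each race into a one-row data frame
per_race = lapply(split(results, results$round), function(r) {
  data.frame(round    = r$round[1],
             name     = r$name[1],
             churn    = mean(r$delta),
             adjchurn = mean(r$delta) / churnmax(nrow(r)))
})

# Stack the per-race rows into a single data frame
churn.df = do.call(rbind, per_race)
print(churn.df)
```

In this toy data the first race finishes in grid order (churn 0) while the second finishes in exact reverse order, which gives the maximum possible adjusted churn of 1.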
For more details, see this first release of the Keeping an Eye on Competitiveness – Tracking Churn chapter of the Wrangling F1 Data With R book.