Tracking SFO Airport’s Performance Using R, HANA and D3

March 22, 2012
(This article was first published on All Things R, and kindly contributed to R-bloggers)

This is my first introduction to D3 and I am simply blown away. Mike Bostock (@mbostock), you are a genius, and thanks for creating D3! Put HANA, R, D3, HTML5 and an iPad together and you have got yourself a KILLER combo!

I have been burning the midnight oil piecing together my big data story using HANA, R, JSON and HTML5. If you recall, last week I did a technical session on R and SAP HANA at DKOM, SAP's Development Kickoff Event, where I showcased the supreme powers of R and HANA when analyzing 124 million records in real time. See my earlier post: R and SAP HANA: A Highly Potent Combo for Real Time Analytics on Big Data.

Since last week, I have been looking for other creative ways to analyze and then visualize this airlines data. I was very fortunate to come across D3. After spending a couple of hours with D3, I decided to build the calendar view for the airlines data I have. The calendar view is the first example Mike shows on his D3 page. Amazingly awesome!

I created this calendar view to capture, for each day between 2005 and 2008, the share of flights departing from SFO that were delayed (a flight counts as delayed when its DepDelay is above 15). For this analysis, I used HANA to pull the SFO data (one airport out of 250 plus) for this four-year period in seconds, and then did all the aggregation in R, including creating a JSON and a .CSV file, again in seconds. Later, I moved to HTML5 and D3 to generate this beautiful calendar view showing SFO's performance. The graphic is presented below:


As expected, December and January are the two notorious months for flight delays. Have fun with the live example hosted in the Amazon cloud.


Once again, my R code is very simple:

## Departure delay for SFO airport
## Assumes ba.hp is the data.table of flight records pulled from HANA.
library(data.table)
library(rjson)   # assumption: toJSON() could equally come from RJSONIO

ba.hp.sfo <- ba.hp[Origin=="SFO",]

## Flights per day, and flights per day with a departure delay above 15
ba.hp.sfo.daily.flights <- ba.hp.sfo[,list(DailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
ba.hp.sfo.daily.flights.delayed <- ba.hp.sfo[DepDelay>15,list(DelayedDailyFlights=length(DepDelay)), by=list(Year, Month, DayofMonth)][order(Year,Month,DayofMonth)]
## Join on (Year, Month, DayofMonth); a day with no delayed flights gets NA
setkey(ba.hp.sfo.daily.flights.delayed, Year, Month, DayofMonth)
response <- ba.hp.sfo.daily.flights.delayed[ba.hp.sfo.daily.flights]
## PercentDelayedFlights is stored as a fraction between 0 and 1
response <- response[,list(Date=as.Date(paste(Year, Month, DayofMonth, sep="-"),"%Y-%m-%d"),
                           #DailyFlights,DelayedDailyFlights,
                           PercentDelayedFlights=round((DelayedDailyFlights/DailyFlights), digits=2))]
## Serialize each row to a JSON object and wrap the objects in one array
objs <- apply(response, 1, toJSON)
res <- paste('{"dailyFlightStats": [', paste(objs, collapse=', '), ']}')
writeLines(res, "dailyFlightStatsForSFO.json")
write.csv(response, "dailyFlightStatsForSFO.csv", row.names=FALSE)
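
If you want to try the aggregation without the HANA extract, here is a minimal sketch on made-up data. It is not the code I ran above: it is a single-pass variant that counts all flights and delayed flights in one grouped call, so no join is needed and a day with no delayed flights comes out as 0 rather than NA. The flights table and its values are invented for illustration, and I am assuming DepDelay is in minutes.

```r
library(data.table)

# Toy stand-in for ba.hp.sfo: made-up departure delays for three days.
flights <- data.table(
  Year       = 2008L,
  Month      = 1L,
  DayofMonth = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
  DepDelay   = c(0, 20, 30, 5, -3, 10, 45, NA, 16, 15, 60)
)

# One grouped pass: .N counts every flight of the day (including rows with a
# missing DepDelay), sum() counts the flights with DepDelay above 15.
stats <- flights[, list(DailyFlights        = .N,
                        DelayedDailyFlights = sum(DepDelay > 15, na.rm = TRUE)),
                 by = list(Year, Month, DayofMonth)]

# Build the date and the delayed share (a fraction between 0 and 1).
stats[, Date := as.Date(sprintf("%d-%02d-%02d", Year, Month, DayofMonth))]
stats[, PercentDelayedFlights := round(DelayedDailyFlights / DailyFlights, digits = 2)]

print(stats)
```

On the toy data the three days come out at 0.5, 0 and 0.6. Whether 0 or NA is the better value for a delay-free day depends on how you want the calendar cell to be coloured.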

For the D3 and HTML code, please take a look at the calendar view example on the D3 website.
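
One thing to keep in mind when wiring the JSON file into a D3 page: apply() converts the response table to a character matrix first, so the values reach toJSON as strings and have to be turned back into numbers on the JavaScript side. A minimal sketch, assuming you would rather write numbers directly, builds the same dailyFlightStats wrapper with sprintf() in base R. The toy response values below are made up, and writing NA as null is my own choice here, not something the code above does.

```r
# Toy stand-in for the response table (values are made up).
response <- data.frame(
  Date = as.Date(c("2008-01-01", "2008-01-02", "2008-01-03")),
  PercentDelayedFlights = c(0.5, NA, 0.6)
)

# Write the share unquoted so it stays a JSON number; NA becomes JSON null.
pct <- ifelse(is.na(response$PercentDelayedFlights),
              "null",
              formatC(response$PercentDelayedFlights, format = "f", digits = 2))

# One JSON object per day, with the date as a quoted ISO string.
objs <- sprintf('{"Date": "%s", "PercentDelayedFlights": %s}',
                format(response$Date, "%Y-%m-%d"), pct)

# Same wrapper object as in the listing above.
res <- paste0('{"dailyFlightStats": [', paste(objs, collapse = ", "), "]}")
cat(res, "\n")
```

The result can be written out with writeLines(res, "dailyFlightStatsForSFO.json") exactly as before.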

Happy Analyzing and Keep That Midnight Oil Burning!
