Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps

April 11, 2012
By

(This article was first published on All Things R, and kindly contributed to R-bloggers)

Technologies: SAP HANA, R, HTML5, D3, Google Maps, JQuery and JSON
For this fun exercise, I analyzed more than 200 million data points using SAP HANA and R and then brought in the aggregated results in HTML5 using D3, JSON and Google Maps APIs.  The 2008 airlines data is from the data expo and I have been using this entire data set (123 million rows and 29 columns) for quite sometime. See my other blogs

The results look beautiful:

Each airport icon is clickable and when clicked displays an info-window describing the key stats for the selected airport:
I then used D3 to display the aggregated result set in the modal window (light box):
D3 made it looks ridiculously simpler to generate a table from a JSON file. 
Unfortunately, I can't provide the live example due to the restrictions put in by Google Maps APIs and I am approaching my free API limits. 

Fun fact:  The Atlanta airport was the largest airport in 2008 on many dimensions: Total Flights Departed, Total Miles Flew, Total Destinations.  It also experienced lower average departure delay in 2008 than Chicago O'Hare. I always thought Chicago O'Hare is the largest US airport.

As always, I just needed 6 lines of R code including two lines of code to write data in JSON and CSV files:
################################################################################
airports.2008.hp.summary <- airports.2008.hp[major.airports,     
    list(AvgDepDelay=round(mean(DepDelay, na.rm=TRUE), digits=2),
    TotalMiles=prettyNum(sum(Distance, na.rm=TRUE), big.mark=","),
    TotalFlights=length(Month),
    TotalDestinations=length(unique(Dest)),
    URL=paste("http://www.fly", Origin, ".com",sep="")), 
                    by=list(Origin)][order(-TotalFlights)]
setkey(airports.2008.hp.summary, Origin)
#merge the two data tables
airports.2008.hp.summary <- major.airports[airports.2008.hp.summary, 
                                                     list(Airport=airport, 
                                                          AvgDepDelay, TotalMiles, TotalFlights, TotalDestinations, 
                                                          Address=paste(airport, city, state, sep=", "), 
                                                          Lat=lat, Lng=long, URL)][order(-TotalFlights)]

airports.2008.hp.summary.json <- getRowWiseJson(airports.2008.hp.summary)
writeLines(airports.2008.hp.summary.json, "airports.2008.hp.summary.json")                 
write.csv(airports.2008.hp.summary, "airports.2008.hp.summary.csv", row.names=FALSE)
##############################################################################

Happy Coding and remember the possibilities are endless!

To leave a comment for the author, please follow the link and comment on his blog: All Things R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.