Todd W Schneider analyzed a database of 1.1 billion taxi rides in New York City from 2009 to 2015, and uncovered some interesting insights into how New Yorkers use cabs. For example, here's a map of the drop-off locations of each ride in the database:
The R code to generate this beautiful map is surprisingly simple: just one line to extract the data from a Postgres database, and a few lines of ggplot2 code to render each drop-off as a point on the map, colored by the type of cab (NYC Yellow or regional Green Boro taxis). Note the use of the alpha= argument to make the dots transparent, allowing them to build in intensity according to the number of drop-offs in each location.
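To make the alpha= idea concrete, here is a minimal sketch in R. It is not Todd's actual code: the data frame `dropoffs`, its columns (`lon`, `lat`, `cab_type`) and the colors are assumptions for illustration, and the points are simulated rather than pulled from the Postgres database.

```r
library(ggplot2)

# Simulated stand-in for the drop-off table (NOT the real taxi data):
# one row per drop-off with longitude, latitude, and cab type.
set.seed(1)
n <- 20000
dropoffs <- data.frame(
  lon = rnorm(n, mean = -73.97, sd = 0.04),
  lat = rnorm(n, mean = 40.75, sd = 0.04),
  cab_type = sample(c("yellow", "green"), n, replace = TRUE,
                    prob = c(0.85, 0.15))
)

# Each drop-off is one nearly transparent point. Overlapping points
# accumulate, so busy locations are drawn more intensely.
p <- ggplot(dropoffs, aes(x = lon, y = lat, color = cab_type)) +
  geom_point(alpha = 0.05, size = 0.3) +
  scale_color_manual(values = c(yellow = "#F7B731", green = "#3F9E4D")) +
  coord_quickmap() +
  theme_void() +
  theme(legend.position = "none",
        plot.background = element_rect(fill = "black"))

# print(p) draws the map on the current graphics device
```

With a low alpha value a single drop-off is barely visible, so only locations with many overlapping points stand out. Raising alpha or the point size makes sparse areas easier to see, at the cost of saturating the busy ones.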
Todd also used R to calculate, from the trip data, how long it takes to get from various NYC districts to the airport. For example, here's the chart for trips from midtown Manhattan to JFK airport:
Note how Todd presents probability bands instead of the medians of trip times by time of day. As anyone who commutes regularly knows, the same trip at the same time of day doesn't always take the same amount of time: there is a distribution of possible trip times, from quick runs to extreme delays. If you leave for your destination with only the median trip time (as shown by most navigation apps) to spare, you will be late half the time. Personally, I like to use the 90/90 rule for airport trips: leave at a time that gives me a 90% chance of arriving 90 or more minutes before my flight. This chart helps me follow that rule. For example, at rush hour (around 4 PM) the 90th-percentile trip time is about 85 minutes, so you should leave midtown 2 hours and 55 minutes (85 + 90 minutes) before your flight if you want to have 90 minutes at the airport.
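The 90/90 rule is easy to express in base R. The sketch below is only an illustration: the function names `lead_time` and `format_lead` are hypothetical, and the trip times are simulated from a log-normal distribution, not taken from the taxi data.

```r
# Minutes to allow before a flight: the `prob` quantile of the trip
# times plus a fixed buffer at the airport (the 90/90 rule by default).
lead_time <- function(trip_minutes, prob = 0.90, buffer_minutes = 90) {
  unname(quantile(trip_minutes, probs = prob)) + buffer_minutes
}

# Format a number of minutes as hours and minutes.
format_lead <- function(minutes) {
  m <- as.integer(round(minutes))
  sprintf("%d hours and %d minutes", m %/% 60L, m %% 60L)
}

# Simulated trip times for one hour of the day (an assumption, not real data)
set.seed(42)
trip_minutes <- rlnorm(1000, meanlog = log(55), sdlog = 0.3)

median_minutes <- median(trip_minutes)               # what most apps show
p90_minutes <- unname(quantile(trip_minutes, 0.90))  # the 90% band
lead_minutes <- lead_time(trip_minutes)              # p90 + 90 minutes

# The rush-hour example from the text: 85 + 90 minutes
format_lead(85 + 90)  # "2 hours and 55 minutes"
```

Using the 90th percentile rather than the median is the whole point: the median gets you there on time only half the time, while the 90th percentile covers nine trips out of ten.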
For many other charts and analyses of the NYC taxi data, check out Todd's complete blog post below.
Todd W Schneider: Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance