NYC Citi Bike Visualization – A Hint of Future Transportation

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Like all other sharing systems,  Airbnb the housing sharing system, Uber the car sharing system, Citi Bike is the network of bicycle rental stations intended for point-to-point transportation.

Citi Bike is New York City’s largest bike sharing system. It’s a convenient solution for trips that are too far to walk but too short for a taxi or the subway. The bike sharing system is combined with all other transportation methods available in the area for commuters.

Currently, there are about a million trips on average per month by Citi Bike riders. The system has 10,000 bicycles and 610 stations. By end of 2017, the total size of Citi Bike system will be 12,000 bikes and 750 stations. The grey area is the current service area. The yellow and blue areas represent the sections to be covered by end of 2017.



The Optimization Questions

Any Citi Bike client has come up against two frustrating scenarios: the empty dock at the start and full dock at the end of the trip. Researchers call this as “rebalancing” problem as part of “fleet optimization” questions.  This problem has attracted the attention of data scientists to develop complex methodologies to optimize the available bikes and open docks.

Following I attempt to utilize the shiny visualization app to provide a hint for the 3 questions:

  1. Fleet Routing Pattern Detection: what are the most popular routes during peak hours and off-peak? What is the direction of the flow?
  2. Station Balance Prediction: what is the average volume of imbalance in the distributed system? What is the station-level inflow and outflow? Is it sensitive to the time? How does it look like in a time series?
  3. Reducing rebalancing demand: What are the riders’ activities like? Is it possible to rebalance through pricing schemes?

The visualization app is intended to provide a way to explore different comparative measures at the route, station and system levels with spatial attributes and time series.


The Data

Citi published Citi Bike Trip Histories – downloadable files of Citi Bike trip data. I used the Citi Bike data for the month of March 2017 (approximately 1 million observations). The data includes:

  • Trip Duration (in seconds)
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Lat/Long
  • Bike ID
  • User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
  • Gender (Zero=unknown; 1=male; 2=female)
  • Year of Birth

Before moving ahead with building the app, I was interested in exploring the data and identifying patterns of rebalancing.

Bar Chart 1 – Time wise imbalance (Peak/Off peak)

Bar Chart 2 -Location wise imbalance (Top 10 popular Station)



On the interactive map, each dot presents a station.  The visualization will also provide options to identify popular routes by selecting date and hour range. The top popular routes are marked in orange as the lines between the spatial points. The direction of the routes is indicated by moving from the more red towards the more green dots.

Citi Bike Daily Migration

The Patterns

Interesting patterns are observed. The most popular routes on the west side run through Central Park and Chelsea Pier. Grand central/Penn Station centered routes are also in the hottest route list. Outside Manhattan there are centers in Queen and Brooklyn initiating lots of popular routes. Riders bike more along the west and east streets than along north and south avenue. That makes sense in light of the fact that  there are more uptown and downtown subways than crosstown ones, and riders do utilize the Citi Bike as a an alternative transportation option.

While not enough bikes available in hot pick up stations, the docks are lacking in hot drop off stations. The red dots are where outflow of bikes exceeds the inflow of bikes The green dots are where inflow of bikes exceeds the outflow of bikes. In the other words, the green dots are the hot spot to pick up a bike(more inbound bikes) and the red(more empty docks) to drop them off. And The more extreme the color of dot is, the higher percentage change of the flows this stations has. The size and transparency of the dot is represented by the volume of  both inflow and outflow of the stations. The more obvious the dot is, the hotter spot the station is.

What caused the balancing problem? The map based interactive app provides an insight for predicting demand. The information displayed is the accumulated hourly variables based on dates selected. Details of statistic numbers is also available for each stations by zooming in.

New York has a classic peak commuter flow pattern.  Most commuters ride bikes towards the center of the town from its edge in the morning. At the end of the day, they ride the reverse way to the edge where they live, especially at the edge sections with fewer public transportations options.   


The Rider’s Activities

What about the rider’s activities. Is there any pattern involved? The app provides insights of rider’s performance for reducing rebalancing demands. By studying rider’s activities, it will provides suggestions for potential solutions.

Below each bubble represents an age and gender group. The age is represented as the number on each bubble.  A negative correlation is observed between age and speed. The younger the rider is, the faster he/she rides. In similarity, the group in the thirties shows similar miles per trip. The performance between female and male group are also different.  The male groups in blue perform a higher speed level than female groups in red. 


The Balancing Solutions

Is there solutions for rebalancing to cut the cost and improve the efficiency, instead of manually moving bikes via trucks, bike-draw trailers and sprinter vans from full stations to empty stations?  The moving will take crews travel in pairs 45 minutes to load a truck. 

Citi Bike sought a way to get the riders to move the bikes themselves. In May it started the pilot Bike Angel. The reverse-commuter would be perfect target member of the Angel program. What is so appealing about the program is the bike sharing system could self -distribute its fleet with the proper incentives. The member can easily make 10 Amazon gift card with a few reverse trips. As a result, the demand of manually moving bike around would decrease.


The Conclusions

The Visualization app provides the real time status of fleets: popular routes, inbound/outbound, net change, time series of stations, hot spot analysis and rider’s activities. It supports the self distributed fleet by establishing a baseline for identifying “healthy” rebalancing within the bike share system. It provides a hint for a future transportation solutions.


The App

The interactive app is available on

The post NYC Citi Bike Visualization – A Hint of Future Transportation appeared first on NYC Data Science Academy Blog.

To leave a comment for the author, please follow the link and comment on their blog: R – NYC Data Science Academy Blog. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)