It is hard to wander around New York City without seeing rows of dozens of bright blue Citibikes planted in the busiest nooks and crannies of the city. These bikes belong to Citibike, a bike-sharing program that lets users conveniently rent a bike to travel to their destinations without the hassle of parking and locking their own bicycle. Citibike has quickly become the preferred mode of transportation for many New Yorkers who are tired of the laundry list of issues with public transportation and want some fresh air as they travel around the city.
The premise of the program is quite simple: you can choose between an annual pass for year-round access or a 3- or 7-day pass as a more temporary option. Pass holders pick up a bike from a station near them, ride to their destination, and park the bike at the station closest to it. With over 622 stations in Manhattan, Brooklyn, and Queens, New Yorkers can easily find a convenient location nearby.
As a New Yorker who sees Citibike stations on what seems like every other corner, I realized that Citibike has a great opportunity to sell advertising space at stations in prime locations with mass exposure to pedestrian and vehicular traffic. For my project, I built an app that organizes existing Citibike user data to help guide sponsors looking to purchase advertising space with Citibike.
Citibike’s public data set is a fantastic source of information because every single ride is documented and released. Across the 12-month span from August 2016 to July 2017, there were over 15 million recorded Citibike rides. Due to time constraints and the limitations of R, I will focus on the 1.5 million observations for July 2017 (the newest month released by Citibike). Citibike’s raw data set includes the following fields:
- Trip Duration (seconds)
- Start Time and Date
- Stop Time and Date
- Start Station Name
- End Station Name
- Station ID
- Station Lat/Long
- Bike ID
- User Type (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual Member)
- Gender (0 = unknown; 1 = male; 2 = female)
- Year of Birth
Because my project attempts to better understand and observe trends among Citibike customers, I will focus on the following categories:
- Station Locations
- Number of Visits to Each Station (Both Start and Finish)
- Date and Time of Citibike Ridership
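Counting a station "visit" at both the start and the finish of a ride can be sketched as follows. The app itself was written in R with Shiny, so this is only an illustrative Python sketch; the field names `start_station` and `end_station` and the sample trips are assumptions, not the official column names or real data.

```python
from collections import Counter

def station_visits(trips):
    """Count visits per station, crediting both the start and the end of each trip."""
    visits = Counter()
    for trip in trips:
        visits[trip["start_station"]] += 1
        visits[trip["end_station"]] += 1
    return visits

# Made-up sample trips with placeholder station names (not real Citibike data).
sample_trips = [
    {"start_station": "Station A", "end_station": "Station B"},
    {"start_station": "Station B", "end_station": "Station A"},
    {"start_station": "Station A", "end_station": "Station C"},
]

visits = station_visits(sample_trips)
for name, count in visits.most_common():
    print(name, count)
```

Counting both ends means a station that is mostly a destination still ranks highly, which matters when the goal is exposure rather than departures.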
Exploration And Visualization
Left: Gender Breakout Chart. Right: Age Group Breakout Chart.
A simple analysis of Citibike users reveals a couple of interesting facts. First, male riders outnumber female riders by a 2:1 ratio. The discrepancy may be attributable to differences in transportation preferences between men and women, but its cause cannot be determined within the scope of this project. Next, two age groups (25 to 30 and 31 to 36) account for just under 50% of all Citibike riders, which speaks to the popularity of Citibike among young professionals. Understanding the population of Citibike users is extremely important when considering advertising opportunities with Citibike.
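Since the raw data records a year of birth rather than an age, the age groups have to be derived. The sketch below shows one way to do that in Python (the app itself is in R). The 19 to 24, 25 to 30, and 31 to 36 ranges come from this post; the "under 19" and "37 and over" buckets, the function name, and the sample birth years are assumptions for illustration only.

```python
from collections import Counter

def age_group(birth_year, ride_year=2017):
    """Map a birth year to an age-group label using six-year bins."""
    age = ride_year - birth_year
    if age < 19:
        return "under 19"  # assumed catch-all bucket
    for low in (19, 25, 31):
        if low <= age <= low + 5:
            return f"{low} to {low + 5}"
    return "37 and over"  # assumed catch-all bucket

# Made-up birth years, not real rider data.
sample_birth_years = [1990, 1985, 1995, 1960]
group_counts = Counter(age_group(year) for year in sample_birth_years)

# Share of riders in each age group.
total = sum(group_counts.values())
shares = {group: count / total for group, count in group_counts.items()}
print(shares)
```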
Shiny App Layout
The app is built for potential sponsors looking to buy advertising space with Citibike, who can use Citibike’s user data to better strategize where and when to invest in advertising. To that end, the Shiny app lets the user interactively filter the data and explore the results. The app consists of three tabs, outlined below:
1. Gender Map
In this tab, the user (a potential sponsor) selects the gender to filter by in the top left box. Once a filter is selected, the map on the right automatically adjusts to show the density of Citibike users matching the selected criteria. For example, if the “Female” button is selected, the map shades the neighborhoods in NYC (represented by the colored polygons) by how many female riders are active, with darker colors representing a higher density of female riders. In addition, ten markers on the map show the locations of the Top 10 Most Visited Bike Stations for the chosen gender.
Top: Gender Map (Both Males and Females). Bottom: Top Ten Stations (Both Genders).
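The filtering behind the Top 10 markers can be illustrated with a short Python sketch (the app's actual R/Shiny code is not shown in this post). The gender codes follow the raw data description above; the function name, field names, and sample trips are hypothetical.

```python
from collections import Counter

# Gender codes as described for the raw data: 0 = unknown, 1 = male, 2 = female.
GENDER_CODES = {"Unknown": 0, "Male": 1, "Female": 2}

def top_stations(trips, gender, n=10):
    """Return the n most visited stations among trips taken by the given gender."""
    code = GENDER_CODES[gender]
    visits = Counter()
    for trip in trips:
        if trip["gender"] == code:
            visits[trip["start_station"]] += 1
            visits[trip["end_station"]] += 1
    return visits.most_common(n)

# Made-up sample trips with placeholder station names.
sample_trips = [
    {"gender": 2, "start_station": "Station A", "end_station": "Station B"},
    {"gender": 2, "start_station": "Station B", "end_station": "Station C"},
    {"gender": 1, "start_station": "Station C", "end_station": "Station A"},
    {"gender": 2, "start_station": "Station B", "end_station": "Station A"},
]

print(top_stations(sample_trips, "Female", n=2))
```

The same pattern would apply to the age range filter by swapping the gender test for an age-group test.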
2. Age Range Map
The second tab in the app mirrors the functionality of the Gender Map tab, only now the user can visualize New York City by age range. Once again, the user selects the age range they would like to study, and the map shows the density of Citibike riders in that age range in each neighborhood. Markers show the Top 10 Most Visited Citibike Stations for each age range.
Top: Age Range Map (All Ages). Bottom: Top Ten Stations (All Genders).
3. Date & Time
In the final tab, the user can select both the gender and the age range they would like to examine. The charts below show the hourly activity of the selected gender/age range combination as well as its activity by day of the week.
Example hourly and weekday ridership visualization in the “Date & Time” tab
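The hourly and weekday breakdowns come down to bucketing each ride's start time. Here is a minimal Python sketch of that idea (again, not the app's R code). The timestamp format `%Y-%m-%d %H:%M:%S` is an assumption about how start times are stored, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def activity_profile(start_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Count rides by hour of day and by day of week."""
    by_hour = Counter()
    by_day = Counter()
    for raw in start_times:
        started = datetime.strptime(raw, fmt)
        by_hour[started.hour] += 1
        by_day[DAY_NAMES[started.weekday()]] += 1  # weekday(): Monday is 0
    return by_hour, by_day

# Made-up start times in July 2017.
sample_times = [
    "2017-07-03 08:15:00",
    "2017-07-03 08:45:00",
    "2017-07-01 17:30:00",
]

by_hour, by_day = activity_profile(sample_times)
print(dict(by_hour))
print(dict(by_day))
```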
Case Study Example
To better understand a real-world application of the app, it is useful to examine a sample case study:
What if an up-and-coming makeup brand, Company X, wanted to target younger female customers by advertising with Citibike? Using the Gender Map, Company X can see which stations and neighborhoods have the most exposure to female riders. Similarly, selecting the 19-24 age range in the Age Range Map tab generates a map showing which stations and neighborhoods have the most exposure to 19-24 year old Citibike users. Finally, the Date & Time tab can be filtered to show the time of day and day of the week when female riders are most active. Using the Top 10 stations for females and for the 19-24 age group, we can deduce the top 3 stations that Company X should target.
Top: Map Filtered For Females Right: Map Filtered For 18 to 25 Age Group
Hour and Day Activity of Female Riders in 18-25 year old range
Left: Top 10 Stations: Females. Right: Top 10 Stations: 18 to 25 Age Group.
1) West St & Chambers St
2) Broadway & E 22nd St
3) Broadway & E 14th St
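The post does not spell out how the two Top 10 lists were combined into a top 3. One plausible approach, sketched here in Python as an assumption rather than the method actually used, is to keep the stations that appear in both lists and order them by their combined rank. The function name and the placeholder station names are hypothetical.

```python
def shortlist(ranked_a, ranked_b, n=3):
    """Pick up to n stations present in both ranked lists, best combined rank first."""
    rank_b = {name: position for position, name in enumerate(ranked_b)}
    shared = [
        (position + rank_b[name], name)
        for position, name in enumerate(ranked_a)
        if name in rank_b
    ]
    # Lower combined rank means the station sits near the top of both lists.
    return [name for _, name in sorted(shared)[:n]]

# Placeholder rankings, not the app's real output.
top_for_females = ["Station A", "Station B", "Station C", "Station D"]
top_for_age_group = ["Station C", "Station A", "Station E", "Station B"]

print(shortlist(top_for_females, top_for_age_group, n=2))
```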
Future Application for the App
As shown in the case study, the app takes Citibike ridership data and organizes it in a way that can be extremely useful for advertising. Although the app itself is fairly specific in scope (limited to Citibike users and the NYC region), it is a good example of using data sets to glean information about customer behavior. With simple analysis and visualization techniques, the app is able to home in on the locations and times that will prove most advantageous for marketing purposes. As the technology for acquiring customer-centric data becomes more and more powerful (e.g., credit card and shopping cart analysis, viewership analysis, email subscription analysis), it will become extremely important for companies to take massive amounts of data and translate them into useful contributions to business strategy and innovation.