A look at the Bay Area Bike Share

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers.]


Bicycles are one of the best ways to travel in an urban setting. Because cycling produces no emissions and promotes an active lifestyle, it has become increasingly mainstream in cities across the world. City planners have moved to make their streets more bike-friendly, and several private-sector bike-sharing programs have come forward to make it easier for commuters and tourists to share bikes. For this project, I looked at a dataset made public by a bike-sharing program based in San Francisco, CA.

The dataset:
The data for the project came from the Kaggle website and can be found here. There are four .csv files, of which I have used two: ‘station.csv’ and ‘trip.csv’. The dataset chronicles every bike trip undertaken using the program over two years, from August 2013 to August 2015.

I wrote an R Shiny app to summarize the findings of my exploration. The app can be found here. My code for this project can be found on my GitHub.

Description of the app:
The dataset contains information about 70 bike stations scattered around five cities in the Bay Area: San Francisco, San Jose, Redwood City, Mountain View, and Palo Alto.

In the ‘Stations’ tab, the app presents a map showing the locations of the bike stations. There is also a bar plot showing the number of bicycle docks at each station. The user can select one or more cities, and the stations located in those cities will be shown. Each station is drawn as a circle whose radius is proportional to its number of docks.

In the ‘Trip frequency’ tab, the stations are shown again, but this time the radius is proportional to the number of trips originating from each station. Users may filter the trips by city, by the hour of the day in which the trip started, or by a date range.
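
The filtering behind this tab could be sketched in base R roughly as follows. This is not the app's actual code. The toy data, the helper name `trip_frequency`, and the column names (`start_station_name`, `start_date`, `name`, `city`) are assumptions made for illustration; the real files may be laid out differently.

```r
# Toy stand-ins for station.csv and trip.csv (made-up rows, assumed column names).
stations <- data.frame(
  name = c("Station A", "Station B", "Station C"),
  city = c("San Francisco", "San Francisco", "San Jose"),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  start_station_name = c("Station A", "Station A", "Station B",
                         "Station C", "Station A"),
  start_date = as.POSIXct(
    c("2014-03-03 08:15:00", "2014-03-03 08:40:00", "2014-03-03 17:05:00",
      "2014-03-04 08:30:00", "2014-07-01 08:10:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count trips per origin station after applying
# the three filters (cities, starting hour, date range).
trip_frequency <- function(trips, stations, cities, hour, date_from, date_to) {
  # Attach the city of the origin station to every trip.
  joined <- merge(trips, stations, by.x = "start_station_name", by.y = "name")
  trip_hour <- as.integer(format(joined$start_date, "%H", tz = "UTC"))
  trip_day <- as.Date(joined$start_date, tz = "UTC")
  keep <- joined$city %in% cities &
    trip_hour == hour &
    trip_day >= as.Date(date_from) &
    trip_day <= as.Date(date_to)
  # One count per origin station; this count would drive the circle radius.
  counts <- table(joined$start_station_name[keep])
  setNames(as.integer(counts), names(counts))
}

freq <- trip_frequency(trips, stations,
                       cities = "San Francisco", hour = 8,
                       date_from = "2014-03-01", date_to = "2014-03-31")
print(freq)
```

In a Shiny app, the arguments to such a helper would typically come from the input widgets, and the resulting counts would be mapped to the circle radii on the map.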

In the ‘Station connection’ tab, users may select a station where trips originate, and the map on the left will show all the stations where those trips ended. A larger circle again means a higher trip frequency. On the right side, the same trip frequencies are shown in a bar plot.


The ‘Trip duration’ tab shows the trip duration in different seasons (left) and the seasonal variation of trip frequency (right). The right plot also shows the split between the two subscription types. The program has a subscription system where, for a monthly fee, a subscriber can ride a bike for free for 30 minutes. Playing with the two sliders on this page shows that those subscribers are more likely to ride the bikes for a shorter period of time, and during the peak morning and evening hours, possibly to commute to and from work. There is another group of users who rent the bikes on an hourly or daily basis. This group dominates when the bike ride is longer than 30 minutes. It is also clear that there is a general decline in bike riding during the winter months, despite the Bay Area’s comparatively mild winter.
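
The comparison between the two user groups can be sketched in base R as a cross-tabulation of trip length against subscription type. The rows below are made up purely to show the mechanics and are not results from the dataset. The column names `duration` and `subscription_type`, the unit (seconds), and the labels "Subscriber" and "Customer" are assumptions.

```r
# Made-up trips; duration is assumed to be recorded in seconds.
trips <- data.frame(
  duration = c(420, 600, 900, 1500, 2400, 5400, 7200, 480),
  subscription_type = c("Subscriber", "Subscriber", "Subscriber", "Customer",
                        "Customer", "Customer", "Subscriber", "Subscriber"),
  stringsAsFactors = FALSE
)

# The 30-minute free-ride window mentioned in the text, in seconds.
free_window_sec <- 30 * 60

# Label each trip as inside or outside the free window.
trips$length_class <- ifelse(trips$duration <= free_window_sec,
                             "up to 30 min", "over 30 min")

# Rows: trip length class. Columns: subscription type.
counts <- table(trips$length_class, trips$subscription_type)

# Share of each subscription type within each length class (rows sum to 1).
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Moving a duration slider in the app corresponds to changing the threshold `free_window_sec` here and recomputing the shares.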

The ‘Variation of trip frequency’ tab shows how trip frequency varied over the two years. Some of the day-to-day fluctuation is clearly statistical noise, and some may be due to weather, which would be interesting to explore. Not surprisingly, the number of trips is larger on weekdays than on weekends. There is a steep decline in bike riding during the Christmas holidays. A bar plot of trip frequency over the hours of the day gives some more insight. It confirms the bimodal nature of bike riding on weekdays, with the two peaks corresponding to the morning and evening peak hours. This tells us that on a weekday, most bike rides are by commuters. On weekends, trip frequency stays fairly flat throughout the day.
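
The hourly breakdown by weekday and weekend can be sketched in base R as below. The timestamps are made up to illustrate the computation and are not taken from the dataset. The variable name `start_date` and the UTC time zone are assumptions.

```r
# Made-up trip start times (assumed to be stored as date-time values).
start_date <- as.POSIXct(c(
  "2014-03-03 08:15:00",  # Monday
  "2014-03-03 08:50:00",  # Monday
  "2014-03-04 17:20:00",  # Tuesday
  "2014-03-05 17:45:00",  # Wednesday
  "2014-03-08 11:00:00",  # Saturday
  "2014-03-09 15:30:00"   # Sunday
), tz = "UTC")

# POSIXlt exposes the weekday (0 = Sunday, ..., 6 = Saturday) and the hour.
lt <- as.POSIXlt(start_date, tz = "UTC")
day_type <- ifelse(lt$wday %in% c(0, 6), "weekend", "weekday")

# Use a factor so that hours without any trips still appear with a zero count.
hour <- factor(lt$hour, levels = 0:23)

# Rows: weekday or weekend. Columns: hour of the day.
hourly <- table(day_type, hour)
print(hourly["weekday", c("8", "17")])
```

Passing a row such as `hourly["weekday", ]` to base R's `barplot()` would give the kind of hour-of-day bar plot described above.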



As an exploration tool, I found Shiny to be quite useful. Its interactive nature makes it very easy to vary the different filters on the dataset, which sometimes leads to unexpected insights. For example, the behavior of bike renters under the two subscription models was strikingly different.

Future work:

The complete dataset contains files that I have not explored. In particular, it would be interesting to see how weather information can be used to predict general user behavior.
