Plotting GTFS data with R

October 23, 2014

(This article was first published on Jkunst - R category, and kindly contributed to R-bloggers)

A few days ago a study said that Santiago, the city where I live, has one of the best
public transport systems in LATAM (WAT?! Define "best", please!). So I searched
for some information and I found
this.
Anyway, I went looking for related data to play with and found the
Transantiago GTFS. GTFS stands for General Transit Feed Specification, a format for
public transportation schedules and the geographic data that goes with them.

The feed comes as a zip file of plain-text, comma-separated tables describing routes, stops
(name, location), shapes (the path each route follows) and other elements of the system. For
example, the shapes.txt file holds the geographic path of each route as an ordered sequence of
latitude/longitude points.
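Since every table in the feed is a comma-separated .txt file, base R can load the whole feed in one go. A minimal sketch, assuming the zip has been downloaded locally and its .txt files sit at the top level of the archive; the helper name `read_gtfs_tables()` and the paths are hypothetical, not part of the original post:

```r
# Hypothetical helper: load every table of a GTFS feed into a named list.
# `path` may be a .zip file or a directory that already holds the .txt files.
read_gtfs_tables <- function(path) {
  if (grepl("\\.zip$", path, ignore.case = TRUE)) {
    # Extract the archive into a fresh temporary directory
    # (assumes the .txt files are at the top level of the zip)
    dir <- tempfile("gtfs_")
    dir.create(dir)
    utils::unzip(path, exdir = dir)
  } else {
    dir <- path
  }
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # Keep ids such as "225-I-BASE" as character; "UTF-8-BOM" drops a
  # byte-order mark if a file starts with one
  tables <- lapply(files, utils::read.csv,
                   stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
  # Name each table after its file: "shapes", "stops", "trips", ...
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Usage (hypothetical path):
# gtfs <- read_gtfs_tables("data/gtfs")
# head(gtfs$shapes)
```

Returning a named list keeps the feed together, so `gtfs$trips` or `gtfs$routes` can be passed around without a separate variable per file.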

Let's take a look at shapes.txt:

library("dplyr")
library("readr")

shapes <- read.csv("data/gtfs/shapes.txt")
head(shapes)
shape_id shape_pt_lat shape_pt_lon shape_pt_sequence
225-I-BASE -33.4 -70.5 0
225-I-BASE -33.4 -70.5 1
225-I-BASE -33.4 -70.5 2
225-I-BASE -33.4 -70.5 3
225-I-BASE -33.4 -70.5 4
225-I-BASE -33.4 -70.5 5

Plotting this data with ggplot2 is simple: draw one path per shape_id.

library("ggplot2")
library("ggthemes")


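# One thin, semi-transparent path per route shape
p <- ggplot(shapes) +
  geom_path(aes(shape_pt_lon, shape_pt_lat, group = shape_id),
            size = .1, alpha = .1) +
  coord_equal() +  # same scale on both axes
  theme_map()      # ggthemes theme without axes or grid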

p

[Figure: plot-1]
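`geom_path()` connects points in the order the rows appear, and `group = shape_id` keeps each route a separate line. The `head()` output above is already in sequence order, but GTFS defines the order of a shape's points through `shape_pt_sequence`, so sorting explicitly (as the post later does with `arrange(shape_id, shape_pt_sequence)` for the Metro shapes) is a cheap safeguard. A minimal base-R sketch with made-up points:

```r
# Toy shape points, deliberately out of order (made-up values)
pts <- data.frame(
  shape_id          = c("B", "A", "A", "B", "A"),
  shape_pt_lat      = c(-33.50, -33.42, -33.40, -33.45, -33.41),
  shape_pt_lon      = c(-70.60, -70.52, -70.50, -70.55, -70.51),
  shape_pt_sequence = c(1, 2, 0, 0, 1),
  stringsAsFactors  = FALSE
)

# Sort by shape, then by position along the shape
pts <- pts[order(pts$shape_id, pts$shape_pt_sequence), ]
rownames(pts) <- NULL

# One ordered lon/lat matrix per shape: each is a polyline to draw
paths <- lapply(split(pts, pts$shape_id),
                function(s) cbind(lon = s$shape_pt_lon, lat = s$shape_pt_lat))
```

Splitting by `shape_id` after sorting is what the `group` aesthetic does implicitly: each group becomes its own line, drawn in row order.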

That's a good plot for just a few lines of code. But let's make things more fun:
Transantiago includes a subway, the Metro, so let's add more detail by drawing its
stations and routes (lines) on top of this plot.

We need to get the stops and routes that belong to the Metro. In this feed the Metro stop_ids
contain no digits, so we keep the Metro stations with !grepl("\\d", stop_id), and the Metro
route_ids start with "L" followed by a digit. Then we filter the shapes of those routes. At the
beginning this is a bit confusing; in fact, it took me some time to see how all these tables are
related: shapes are linked to routes only through trips.txt, which has both a route_id and a shape_id.

routes <- read_csv("data/gtfs/routes.txt")
trips <- read.csv("data/gtfs/trips.txt")
stops <- read.csv("data/gtfs/stops.txt")

# Metro stations: stop_id without any digit
stops_metro <- stops %>%
  filter(!grepl("\\d", stop_id))

# Metro lines: route_id starting with "L" and a digit
routes_metro <- routes %>%
  filter(grepl("^L\\d", route_id))

# trips.txt links route_id to shape_id; keep Metro shapes in path order
shapes_metro <- shapes %>%
  filter(shape_id %in% trips$shape_id[trips$route_id %in% routes_metro$route_id]) %>%
  arrange(shape_id, shape_pt_sequence)
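To see the three filters in isolation, here is a minimal base-R sketch on invented miniature tables. The ids are made up and only mimic the patterns the code above relies on (digit-free Metro stop_ids, Metro route_ids such as "L1"); they are not taken from the Transantiago feed:

```r
# Invented miniature versions of stops.txt, routes.txt and trips.txt
stops  <- data.frame(stop_id = c("ALFA", "BETA", "PA433"),
                     stringsAsFactors = FALSE)
routes <- data.frame(route_id = c("L1", "L5", "225"),
                     stringsAsFactors = FALSE)
trips  <- data.frame(route_id = c("L1", "L1", "L5", "225"),
                     shape_id = c("L1-I", "L1-R", "L5-I", "225-I-BASE"),
                     stringsAsFactors = FALSE)

# Metro stops: ids without any digit
stops_metro <- stops[!grepl("\\d", stops$stop_id), , drop = FALSE]

# Metro routes: ids starting with "L" followed by a digit
routes_metro <- routes[grepl("^L\\d", routes$route_id), , drop = FALSE]

# trips is the bridge: route_id -> shape_id
metro_shape_ids <- unique(trips$shape_id[trips$route_id %in% routes_metro$route_id])
```

The last line is the same `trips$shape_id[trips$route_id %in% ...]` lookup used inside `filter()` above; keeping it as a separate vector makes it easy to inspect which shapes were matched.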

Now get the color of each Metro line. routes.txt stores route_color as a hex code without the leading "#", so we add it.

# shape_id -> route_id (via trips) -> route_color (via routes)
shapes_colors <- shapes %>%
  select(shape_id) %>%
  unique() %>%
  left_join(trips %>% select(shape_id, route_id) %>% unique(), by = "shape_id") %>%
  left_join(routes %>% select(route_id, route_color) %>% unique(), by = "route_id") %>%
  mutate(route_color = paste0("#", route_color))  # GTFS colors come without "#"

shapes_colors_metro <- shapes_colors %>%
  filter(shape_id %in% trips$shape_id[trips$route_id %in% routes_metro$route_id]) %>% unique() %>%
  arrange(shape_id)
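With an unnamed `values` vector, `scale_color_manual()` pairs colors with the scale's levels by position (usually alphabetical order), which is why `shapes_colors_metro` is arranged by `shape_id` above. A named vector makes each shape_id pick its color by name instead, so row order no longer matters. A base-R sketch of the same join plus a named lookup, on invented ids and colors (not the real Metro palette):

```r
# Invented miniature routes.txt / trips.txt; GTFS colors come without "#"
routes <- data.frame(route_id    = c("L1", "L5"),
                     route_color = c("E2231A", "00A650"),
                     stringsAsFactors = FALSE)
trips  <- data.frame(shape_id = c("L5-I", "L1-I", "L1-R", "L1-I"),
                     route_id = c("L5", "L1", "L1", "L1"),
                     stringsAsFactors = FALSE)

# One row per shape with its route color (same idea as the left_joins above)
shape_colors <- merge(unique(trips[, c("shape_id", "route_id")]),
                      routes, by = "route_id")
shape_colors$route_color <- paste0("#", shape_colors$route_color)

# Named lookup: shape_id -> hex color
color_lookup <- setNames(shape_colors$route_color, shape_colors$shape_id)
```

In the plots below, the equivalent would be `scale_color_manual(values = setNames(shapes_colors_metro$route_color, shapes_colors_metro$shape_id))`.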

The data is ready, so it's time for another plot.

p2 <- ggplot() +
  geom_path(data = shapes,
            aes(shape_pt_lon, shape_pt_lat, group = shape_id),
            color = "white", size = .2, alpha = .05) +
  geom_path(data = shapes_metro,
            aes(shape_pt_lon, shape_pt_lat, group = shape_id, colour = shape_id),
            size = 2, alpha = .7) +
  scale_color_manual(values = shapes_colors_metro$route_color) +
  geom_point(data = stops_metro,
             aes(stop_lon, stop_lat), shape = 21, colour = "white", alpha = .8) +
  coord_equal() +
  theme_map() +
  theme(plot.background = element_rect(fill = "black", colour = "black"),
        title = element_text(hjust = 1, colour = "white", size = 8),
        axis.title.x = element_text(hjust = 0, colour = "white", size = 7),
        legend.position = "none") +
  xlab(sprintf("Joshua Kunst | Jkunst.com %s", format(Sys.Date(), "%Y"))) +
  ggtitle("TRANSANTIAGO\nSantiago's public transport system")

p2

[Figure: plot-2]

Or we can plot only the Metro routes with the following code:

p3 <- ggplot() +
  geom_path(data = shapes_metro,
            aes(shape_pt_lon, shape_pt_lat, group = shape_id, colour = shape_id),
            size = 2, alpha = .8) +
  scale_color_manual(values = shapes_colors_metro$route_color) +
  geom_point(data = stops_metro,
             aes(stop_lon, stop_lat),
             shape = 21, colour = "white", alpha = .8, size = 3) +
  coord_equal() +
  theme_map() +
  theme(plot.background = element_rect(fill = "black", colour = "black"),
        title = element_text(hjust = 1, colour = "white", size = 8),
        legend.position = "none") + 
  xlab(sprintf("Joshua Kunst | Jkunst.com %s", format(Sys.Date(), "%Y")))
p3 + ggtitle("Santiago's METRO")

[Figure: plot-3]

You can see the original image on Wikipedia
here.
As you can see, it's simple to make a good graphic with a few lines of code. Even better,
GTFS is a standard, so you can reuse most of this code (and improve it!)
to plot the transport systems of other cities. If you do, let me know.
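As a starting point for other cities, here is a minimal sketch of a reusable function that draws every shape in a feed. It is hypothetical (not from the original post), assumes an extracted GTFS folder containing shapes.txt, and uses base graphics instead of ggplot2 only to stay dependency-free; the Metro filters above are specific to Transantiago's ids and would need adapting per feed:

```r
# Hypothetical helper: draw all route shapes of a GTFS feed with base graphics.
# `dir` is a folder holding an extracted GTFS feed with a shapes.txt file.
plot_gtfs_shapes <- function(dir, col = "grey30", lwd = 0.5) {
  shapes <- utils::read.csv(file.path(dir, "shapes.txt"),
                            stringsAsFactors = FALSE,
                            fileEncoding = "UTF-8-BOM")
  # Order points along each shape before connecting them
  shapes <- shapes[order(shapes$shape_id, shapes$shape_pt_sequence), ]

  # Empty canvas covering the whole network; asp = 1 mimics coord_equal()
  plot(shapes$shape_pt_lon, shapes$shape_pt_lat, type = "n", asp = 1,
       axes = FALSE, xlab = "", ylab = "")
  # One polyline per shape_id
  for (s in split(shapes, shapes$shape_id)) {
    lines(s$shape_pt_lon, s$shape_pt_lat, col = col, lwd = lwd)
  }
  # Return the number of shapes drawn, without printing it
  invisible(length(unique(shapes$shape_id)))
}

# Usage (hypothetical path):
# plot_gtfs_shapes("data/gtfs")
```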
