Analysing open data: Wifi on ICE

[This article was first published on Johannes Friedrich's R Blog, and kindly contributed to R-bloggers.]

Thank you for travelling with Deutsche Bahn –

I was searching for open data on the internet and found some interesting datasets on the Deutsche Bahn website. They provide a large dataset (~2 GB) of Wifi access data from their ICE fleet on three days in 2017. This dataset can be found here. It contains the coordinates of each train every 5 seconds, along with the number of users connected to the router, the number of available satellites and some other indicators.

My goal was to animate the routes of all available trains for one day and to show the number of available satellites as the size of the dot on the map: the bigger the plotted point, the more satellites are available.

I’m interested in trying new things, so I decided to analyse this dataset and make some (hopefully) nice visualisations using R and some handy packages: the tidyverse, especially dplyr and ggplot2, and two packages that were new to me, gganimate and ggmap. They work wonderfully in combination, as you’ll see!

The full code is available in my GitHub Gist. The main focus of this post is the visualisation of the data, not the data analysis.

library(tidyverse)
library(lubridate)
library(ggmap)
library(gganimate)
library(animation)

# Read the semicolon-separated file; parse the timestamp and the logical column explicitly
data <- read_delim(file = "surveyor_hackathon_data_20171212.csv",
                   delim = ";",
                   escape_double = FALSE,
                   col_types = cols(
                     sid = col_double(),
                     created = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
                     link_gw_conn = col_logical()),
                   trim_ws = TRUE)

## remove last line:

data <- data[-nrow(data), ]

I had to remove the last line because it contains only “NA”. The file surveyor_hackathon_data_20171212.csv is obtained by downloading and unzipping this file.
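
Dropping the last row by position works for this file, but it would remove real data if the trailing line were ever absent. As a minimal sketch, assuming the unwanted line is NA in every column, the following base R snippet drops only rows that are entirely NA. The data frame toy and its values are made up for illustration and are not taken from the dataset.

```r
# Toy stand-in for the imported data; the last row mimics the all-NA line
toy <- data.frame(
  sid = c(1, 2, NA),
  sat = c(7, 9, NA),
  created = c("2017-03-08 10:00:00", "2017-03-08 10:00:05", NA),
  stringsAsFactors = FALSE
)

# TRUE for rows in which every column is NA
all_na <- rowSums(is.na(toy)) == ncol(toy)

# Keep all other rows; rows with only some missing values are retained
toy_clean <- toy[!all_na, ]
nrow(toy_clean)
```

Rows with only partially missing values stay in the data, so this is a more conservative choice than removing a row by its position.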

Now I select a single day (8th March 2017) and sort the records by timestamp in increasing order.

data <- data %>% 
  mutate(
    ICE = as.factor(sid),
    Satellites = sat,
    day = day(created),
    time = strftime(created, format = "%H:%M")
  )

selection <- data %>% 
  filter(day == 8) %>%
  distinct(created, .keep_all = TRUE) %>% 
  arrange(created)
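
To make the effect of each step easier to follow, here is a rough base R analogue of the pipeline above on a tiny made-up data frame (the values in toy are illustrative, and the explicit tz = "UTC" is an assumption of this sketch). Note that the de-duplication keeps only the first row per timestamp, so when two trains report in the same second, only one of them is retained.

```r
# Illustrative timestamps (UTC); not taken from the real dataset
toy <- data.frame(
  sid = c(11, 12, 11, 11),
  created = as.POSIXct(
    c("2017-03-08 10:00:05", "2017-03-08 10:00:05",
      "2017-03-08 10:00:00", "2017-03-09 09:00:00"),
    tz = "UTC"
  )
)

# Derived columns, mirroring the mutate() call
toy$day  <- as.integer(format(toy$created, "%d", tz = "UTC"))
toy$time <- format(toy$created, "%H:%M", tz = "UTC")

# filter(day == 8)
sel <- toy[toy$day == 8, ]
# distinct(created, .keep_all = TRUE): keep the first row per timestamp
sel <- sel[!duplicated(sel$created), ]
# arrange(created)
sel <- sel[order(sel$created), ]
sel
```

In this toy example the record of train 12 disappears, because it shares its timestamp with a record of train 11.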

Now to the most interesting part. First of all, we load a map of the area of interest, in our case a map of Germany. I decided to use Google as the source, but ggmap also offers other providers, such as OSM, Stamen Maps or CloudMade.

map <- get_map(location = "Germany",
               zoom = 6,
               source = "google",
               maptype = "roadmap")
               

Next we use ggplot2 and ggmap to create an object p from several layers, which is the core concept of ggplot2. The most interesting part is the frame aesthetic in the mapping. It becomes important when gganimate combines the individual plots into an animation: in this case it acts as a kind of loop over time, with one frame per value of time. I highly recommend the examples page of gganimate. The rest of the code just improves the look of the plot.

p <- ggmap(map) + 
  geom_point(data = selection, 
             mapping = aes(x = gps_laenge, 
                           y = gps_breite, 
                           frame = time, 
                           colour = ICE,
                           size = Satellites,
                           cumulative = TRUE),
             alpha = .05, 
             show.legend = FALSE) +
  scale_radius(range = c(0.2, 6)) +
  labs(x = NULL, 
       y = NULL) +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank()) +
  geom_text(aes(x = 4, 
                y = 55, 
                frame = time, 
                label = paste0("Time: ",time), 
                hjust = "left"), 
            size = 10,
            data = selection)
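
Conceptually, the frame aesthetic splits the data by time, and cumulative = TRUE lets each frame also show everything from earlier frames, so the routes build up instead of appearing as single moving dots. The following base R sketch shows that idea on made-up data. It only illustrates the concept and is not how gganimate is implemented; the data frame toy and its coordinates are hypothetical.

```r
# Made-up positions for three time steps
toy <- data.frame(
  time = c("10:00", "10:00", "10:01", "10:02"),
  lon  = c(8.6, 13.4, 8.7, 8.8),
  lat  = c(50.1, 52.5, 50.2, 50.3)
)

# One frame per distinct time value, in chronological order
frames <- sort(unique(toy$time))

# Cumulative frames: all rows up to and including the frame's time
# (zero-padded "HH:MM" strings sort chronologically)
cumulative_frames <- lapply(frames, function(f) toy[toy$time <= f, ])
names(cumulative_frames) <- frames

# Number of points drawn in each frame
sapply(cumulative_frames, nrow)
```

The point counts grow from frame to frame, which is the trail-building effect seen in the animation.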
            

Now that we have created the object p, the last step is to convert it into an animation. The function gganimate() is used for that purpose. The important argument is interval, the time between frames: decreasing it makes the animation faster, increasing it makes it slower. The argument title_frame was set to FALSE because I already added a text element by hand with geom_text() in the previous code snippet.

gganimate(p = p, 
          filename = "output.gif", 
          interval = 0.1, 
          title_frame = FALSE)
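
Because time has minute resolution, a full day yields at most 24 × 60 = 1440 frames. As a rough estimate, assuming every minute of the day appears in the data and each frame is shown for exactly interval seconds, the length of the animation can be worked out like this:

```r
frames_per_day <- 24 * 60   # one frame per distinct "%H:%M" value
interval <- 0.1             # seconds per frame, as passed to gganimate()

duration_sec <- frames_per_day * interval
duration_min <- duration_sec / 60
duration_min
```

Under these assumptions the animation runs for about 2.4 minutes; doubling interval would double that.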

Finally, the result. Note that you can change the filename to “some-filename.mp4” to create an MP4 file instead of a GIF.

I hope you like the final result and see how easy it is to create this kind of animation with R! Special thanks to the creators and maintainers of the packages used.
