German train monitor provides access to train delay data

[This article was first published on mages' blog, and kindly contributed to R-bloggers.]

The German newspaper Süddeutsche Zeitung (SZ) worked together with OpenDataCity to create an online train monitor of the German network: Zugmonitor. This is another great example of the new form of data journalism.

The project provides access to train delay data collected over 150 days, between 2 October 2011 and 1 March 2012, and allows you to analyse the delays in more detail.

Here is an example showing the delays by station.

This SZ article (in German) gives an overview of the data and how to access it. I believe the most convenient way to query the data is through Google Fusion Tables, which lets you import the data into R with the read.csv function. The file name to use is a URL mixed with a little bit of SQL syntax.

Here is an example extracting the station data (Fusion table 3166152):
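A minimal sketch of such a query could look like the following. The endpoint in `base_url` and the helper names `fusion_url` and `read_fusion` are assumptions made for illustration and are not confirmed by this article; Google has since retired Fusion Tables, so the download itself is left commented out.

```r
# Assumed form of the Fusion Tables SQL endpoint (not taken from the article).
base_url <- "https://www.google.com/fusiontables/api/query?sql="

# Hypothetical helper: build the query URL for one Fusion table ID.
# The SQL statement is URL-encoded so that spaces become %20.
fusion_url <- function(table_id) {
  sql <- paste("SELECT * FROM", table_id)
  paste0(base_url, utils::URLencode(sql))
}

# Hypothetical helper: read.csv accepts a URL as its file argument,
# so the query result can be read straight into a data frame.
read_fusion <- function(table_id) {
  read.csv(fusion_url(table_id), stringsAsFactors = FALSE)
}

# URL for the station data (Fusion table 3166152)
station_url <- fusion_url("3166152")
print(station_url)

# Uncomment to download, provided the endpoint is reachable:
# stations <- read_fusion("3166152")
# head(stations)
```

Keeping the URL construction separate from the download makes it easy to inspect the query string before any request is sent.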

The other sources can be accessed in the same way:

Delay                                 Fusion table ID
by station                            3166152
between stations (all trains)         3166064
between stations (ICE trains only)    3166328
by country                            3166042
by cause                              3165200
by daytime                            3164289
by train type                         3165124
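The table above can be kept in R as a named vector, so that each query URL is built in one pass. This is only a sketch: the endpoint in `base_url` is an assumption, the element names are made up for illustration, and the download step is commented out because the service may no longer be reachable.

```r
# Fusion table IDs as listed in the table above; element names are illustrative.
table_ids <- c(
  by_station           = "3166152",
  between_stations_all = "3166064",
  between_stations_ice = "3166328",
  by_country           = "3166042",
  by_cause             = "3165200",
  by_daytime           = "3164289",
  by_train_type        = "3165124"
)

# Assumed form of the Fusion Tables SQL endpoint (not taken from the article).
base_url <- "https://www.google.com/fusiontables/api/query?sql="

# One URL per table; vapply keeps the names of table_ids.
query_urls <- vapply(
  table_ids,
  function(id) paste0(base_url, utils::URLencode(paste("SELECT * FROM", id))),
  character(1)
)
print(query_urls)

# Uncomment to read every table into a named list of data frames:
# delays <- lapply(query_urls, read.csv, stringsAsFactors = FALSE)
# str(delays$by_cause)
```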

I am curious what people will make of the data. Apparently more data will be made available in the future. I will keep an eye on the project page.

