I wanted to create a quick visualization of Bloomington, IL bus stops. The data is published in PDF format, spread across multiple files. The first step, before any mapping can occur, is downloading those files and parsing them to get the bus stop locations and times.
First, I need to get a list of all of the files. This was a little complicated by the fact that the URL for the buses didn’t play nice with some of the usual HTML tools in R (RCurl). In the end, the httr package was the solution. First get the HTML dump, then look for a table with the id of fsvItemsTable and make your way down the tree to get the hrefs for all of the files. I imagine there’s a way to avoid the grep at the end of this snippet, but it works, so I stopped…
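Here is a minimal sketch of that link-extraction step, not the original snippet. It assumes the xml2 package for parsing (the original may well have used something else), and `extract_pdf_links` and `schedule_url` are made-up names for illustration. The XPath expression walks from the fsvItemsTable table down to its anchors, and a `grepl()` keeps only the hrefs that end in .pdf.

```r
library(xml2)

# Pull the PDF hrefs out of the table with id "fsvItemsTable".
# html_text is the page source as a single string.
extract_pdf_links <- function(html_text) {
  doc <- read_html(html_text)
  anchors <- xml_find_all(doc, "//table[@id='fsvItemsTable']//a[@href]")
  hrefs <- xml_attr(anchors, "href")
  # Keep only links that point at PDF files.
  hrefs[grepl("\\.pdf$", hrefs, ignore.case = TRUE)]
}

# Hypothetical usage with httr; schedule_url is a placeholder, not the real address.
# resp <- httr::GET(schedule_url)
# httr::stop_for_status(resp)
# pdf_links <- extract_pdf_links(httr::content(resp, as = "text", encoding = "UTF-8"))
```

If the hrefs come back relative, they would still need the site's base URL pasted on the front before downloading.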
Next, use this list of links to download the files. Again, the normal way of doing things, download.file(), failed, but downloader::download() did work.
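The download loop might look something like the sketch below. The function names and the `pdfs` directory are my own placeholders; the only package call is `downloader::download()`, which passes its extra arguments through to `download.file()`.

```r
# Map each link to a local file path inside dest_dir.
pdf_destinations <- function(links, dest_dir) {
  file.path(dest_dir, basename(links))
}

# Download every link into dest_dir and return the local paths.
download_pdfs <- function(links, dest_dir = "pdfs") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dests <- pdf_destinations(links, dest_dir)
  for (i in seq_along(links)) {
    # mode = "wb" writes the file as binary so the PDF is not corrupted on Windows.
    downloader::download(links[i], dests[i], mode = "wb")
  }
  invisible(dests)
}

# Hypothetical usage, given a character vector of full URLs:
# download_pdfs(pdf_links)
```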
Now that we have a directory filled with PDF files, what do we do with it? Well, there’s a function called readPDF() in the tm package that can be used to read the data in a PDF file. And using code ripped straight from Stack Overflow, it was pretty easy to get the data.
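For reference, a typical readPDF() call looks roughly like this. It is a sketch under a few assumptions: a recent version of tm in which readPDF() takes an `engine` argument, the `pdftotext` command-line tool installed and on the PATH, and a helper name (`read_pdf_lines`) that I made up. The `-layout` option asks pdftotext to preserve the table layout, so each table row comes back as one line of text.

```r
# Read one PDF and return its text as a character vector, one element per line.
read_pdf_lines <- function(path) {
  # readPDF() returns a reader function; "-layout" is passed on to pdftotext.
  reader <- tm::readPDF(engine = "xpdf", control = list(text = "-layout"))
  doc <- reader(elem = list(uri = path), language = "en", id = basename(path))
  # content() extracts the text from the resulting document object.
  NLP::content(doc)
}

# Hypothetical usage over a directory of downloaded files:
# pdf_files <- list.files("pdfs", pattern = "\\.pdf$", full.names = TRUE)
# all_lines <- lapply(pdf_files, read_pdf_lines)
```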
This leaves you with a single string for each row of data in the PDF table. A little grepping will split the data into separate columns in a data frame. See the full code linked at the bottom of the page for these details.
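To give a flavor of that grepping, here is a small base R sketch. The row layout it assumes (a stop name followed by one or more clock times) is a guess for illustration, not the actual format of the schedule PDFs, and `parse_schedule_lines` is a made-up name.

```r
# Turn raw text rows into a long data frame with one stop/time pair per row.
parse_schedule_lines <- function(lines) {
  time_pattern <- "[0-9]{1,2}:[0-9]{2}"
  # Keep only rows that contain at least one clock time; headers drop out.
  rows <- lines[grepl(time_pattern, lines)]
  # Everything before the first time is treated as the stop name.
  stops <- trimws(sub(paste0(time_pattern, ".*$"), "", rows))
  # Collect every time on each row.
  times <- regmatches(rows, gregexpr(time_pattern, rows))
  data.frame(
    stop = rep(stops, lengths(times)),
    time = as.character(unlist(times)),
    stringsAsFactors = FALSE
  )
}

# Made-up example rows:
example_lines <- c(
  "ROUTE A  RED",
  "Main St & Front St    6:15   7:15",
  "College Ave & Linden St   6:22   7:22"
)
parse_schedule_lines(example_lines)
```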
Now we must geocode the bus stop locations so we can plot them on a map. For this, the ggmap package has a simple function called geocode().
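A sketch of the geocoding call is below. The helper names are hypothetical, and appending the city and state to each stop name is my own assumption about how to make the lookups less ambiguous. Note that recent versions of ggmap require a Google API key to be registered with `register_google()` before geocode() will return results.

```r
# Append the city so the geocoder does not match a same-named street elsewhere.
build_addresses <- function(stops, city = "Bloomington, IL") {
  paste(stops, city, sep = ", ")
}

# Look up coordinates for each stop and return them next to the stop names.
geocode_stops <- function(stops) {
  # output = "latlon" returns one row per address with lon and lat columns.
  coords <- ggmap::geocode(build_addresses(stops), output = "latlon")
  cbind(data.frame(stop = stops, stringsAsFactors = FALSE), as.data.frame(coords))
}

# Hypothetical usage, after registering an API key:
# ggmap::register_google(key = "YOUR_KEY")
# stop_coords <- geocode_stops(unique(schedule$stop))
```

Geocoding only the unique stop names, rather than every row of the schedule, keeps the number of API requests down.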
At the end of all of this, we finally have a data set to map. Here’s what it looks like…
…well, not exactly. To use the toGeoJSON() function in the rCharts package, the data frame must be transformed into a list with one element per bus stop. Also, I add in a color for each route so we can tell them apart on the map, and format the text for the tooltip for each point.
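That transformation can be sketched in base R as follows. The column names (`route`, `stop`, `time`, `lat`, `lon`) and the `fillColor` and `popup` fields are assumptions for illustration, as is the choice of `rainbow()` for the palette.

```r
# Convert a data frame of stops into a list of per-stop lists,
# adding a route color and tooltip text along the way.
stops_to_list <- function(df) {
  routes <- unique(as.character(df$route))
  # One color per route, looked up by route name.
  palette <- setNames(rainbow(length(routes)), routes)
  df$fillColor <- unname(palette[as.character(df$route)])
  # HTML shown when a point is clicked.
  df$popup <- sprintf("<b>%s</b><br>Route: %s<br>Time: %s",
                      df$stop, df$route, df$time)
  # One named list per row of the data frame.
  lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))
}

# Made-up example data:
example_stops <- data.frame(
  route = c("Red", "Red", "Green"),
  stop  = c("Main St & Front St", "College Ave & Linden St", "Oakland Ave & Clinton St"),
  time  = c("6:15", "6:22", "6:30"),
  lat   = c(40.48, 40.49, 40.47),
  lon   = c(-88.99, -88.98, -88.97),
  stringsAsFactors = FALSE
)
stop_list <- stops_to_list(example_stops)
```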
Again, in keeping with using other people’s code, I reused some code that Ramnath Vaidyanathan wrote for the Foodborne Chicago map a while back to create a Leaflet map of the bus stops. He is the author of the rCharts package, is super helpful via Twitter and GitHub with random issues, and is doing a tutorial at useR! 2014 in LA. I can’t wait to meet him… The last part of this code snippet creates a GitHub gist out of it. I had some trouble using it on my network, so I just used the $save() method to create an HTML file and copy-pasted it as a gist.
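The general shape of that map code, as I understand the rCharts Leaflet API, is sketched below. It is not Ramnath's original code. The function name, map center, zoom level, and the `lat`, `lon`, `popup`, and `fillColor` field names are all assumptions, and rCharts has to be installed from GitHub because it is not on CRAN.

```r
# Build a Leaflet map from a list of per-stop lists.
# Each element is assumed to carry lat, lon, popup and fillColor fields.
build_stop_map <- function(stop_list, center = c(40.48, -88.99), zoom = 13) {
  stop_map <- rCharts::Leaflet$new()
  stop_map$setView(center, zoom)
  stop_map$geoJson(
    rCharts::toGeoJSON(stop_list, lat = "lat", lon = "lon"),
    # Text between "#!" and "!#" is passed through as raw JavaScript.
    onEachFeature = "#! function(feature, layer){
      layer.bindPopup(feature.properties.popup)
    } !#",
    pointToLayer = "#! function(feature, latlng){
      return L.circleMarker(latlng, {
        radius: 4,
        fillColor: feature.properties.fillColor || 'red',
        color: '#000',
        weight: 1,
        fillOpacity: 0.8
      })
    } !#"
  )
  stop_map
}

# Hypothetical usage:
# stop_map <- build_stop_map(stop_list)
# stop_map$save("index.html", cdn = TRUE)
```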
And here’s the result. There’s still some work to be done on the geocoding end of things. As you can see if you click on a dot on the map, the location doesn’t always line up with where the map tooltip says it should be.
All of the code can be found on GitHub.