GeoCoding, R, and The Rolling Stones – Part 2
Welcome to Part 2 of the GeoCoding, R, and the Rolling Stones blog. Let’s apply some of the things we learned in Part 1 to a practical, real-world example.
Mapping the Stones – A Real Example
The Rolling Stones have toured for many years. You can go to Wikipedia and see information on the various tours. Here we focus only on the dates and concerts for the 1975 “Tour of the Americas”. I’ve scraped off the information from the Wikipedia page and put it into a data frame. The idea here is that we will GeoCode each city and obtain a latitude and longitude and then use it to create an interactive map of the tour using the Google Charting Tools.
If you want your own copy of this data frame then do the following:
url = "http://www.bimcore.emory.edu/BIOS560R/DATA.DIR/stones75.csv"
stones75 = read.csv(url)
Here are the first 10 rows of the data frame. The format is really simple:
head(stones75,10)
           Date        City         State               Venue
1   1 June 1975 Baton Rouge     Louisiana LSU Assembly Center
2   3 June 1975 San Antonio         Texas   Convention Center
3   4 June 1975 San Antonio         Texas   Convention Center
4   6 June 1975 Kansas City      Missouri   Arrowhead Stadium
5   8 June 1975   Milwaukee     Wisconsin      County Stadium
6   9 June 1975  Saint Paul     Minnesota        Civic Center
7  11 June 1975      Boston Massachusetts       Boston Garden
8  14 June 1975   Cleveland          Ohio   Municipal Stadium
9  15 June 1975     Buffalo      New York Memorial Auditorium
10 17 June 1975     Toronto       Ontario  Maple Leaf Gardens
Okay let’s process the cities. Like before we’ll use the sapply command to get back the data after which we’ll use cbind to attach the results to the data frame. We might get some warnings about row names when we do this but don’t worry about it. After all “you can’t always get what you want”.
hold = data.frame(t(sapply(paste(stones75$City,stones75$State,sep=","),myGeo)))
stones75 = cbind(stones75,hold)
head(stones75,10)
           Date        City         State               Venue  lat   lon
1   1 June 1975 Baton Rouge     Louisiana LSU Assembly Center 30.5 -91.1
2   3 June 1975 San Antonio         Texas   Convention Center 29.4 -98.5
3   4 June 1975 San Antonio         Texas   Convention Center 29.4 -98.5
4   6 June 1975 Kansas City      Missouri   Arrowhead Stadium 39.1 -94.6
5   8 June 1975   Milwaukee     Wisconsin      County Stadium 43.0 -87.9
6   9 June 1975  Saint Paul     Minnesota        Civic Center 45.0 -93.1
7  11 June 1975      Boston Massachusetts       Boston Garden 42.4 -71.1
8  14 June 1975   Cleveland          Ohio   Municipal Stadium 41.5 -81.7
9  15 June 1975     Buffalo      New York Memorial Auditorium 42.9 -78.9
10 17 June 1975     Toronto       Ontario  Maple Leaf Gardens 43.7 -79.4
Great ! So now we have the lat and lon for each city. As you might notice in the data frame the Stones played several nights in the same city so we should probably keep track of this.
stones75[9:18,]
           Date          City        State                 Venue
9  15 June 1975       Buffalo     New York   Memorial Auditorium
10 17 June 1975       Toronto      Ontario    Maple Leaf Gardens
11 22 June 1975 New York City     New York Madison Square Garden
12 23 June 1975 New York City     New York Madison Square Garden
13 24 June 1975 New York City     New York Madison Square Garden
14 25 June 1975 New York City     New York Madison Square Garden
15 26 June 1975 New York City     New York Madison Square Garden
16 27 June 1975 New York City     New York Madison Square Garden
17 29 June 1975  Philadelphia Pennsylvania          The Spectrum
18  1 July 1975         Largo     Maryland        Capital Center
As you can see above, they made a six-night stand at the famous Madison Square Garden arena in New York City. Our code should really check for duplicate city names before we bug Google for information that we already have. But that is left as an assignment for you.
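If you want a head start on that assignment, here is one possible sketch. It looks up each distinct "City,State" string only once and then copies the result back to every row for that city. Note that fake_geo is a made-up stand-in for the myGeo function from Part 1 (it reads from a tiny hard-coded table instead of calling Google) and the three-row tour data frame is just a cut-down copy of the first rows of stones75, so treat this as an illustration and not as the definitive way to do it.

```r
# Hypothetical stand-in for myGeo() from Part 1: returns a fixed lat/lon
# from a small lookup table so the example runs without calling Google.
fake_geo <- function(location) {
  lookup <- list(
    "Baton Rouge,Louisiana" = c(lat = 30.5, lon = -91.1),
    "San Antonio,Texas"     = c(lat = 29.4, lon = -98.5)
  )
  lookup[[location]]
}

# Cut-down copy of the first three tour stops (two nights in San Antonio)
tour <- data.frame(
  City  = c("Baton Rouge", "San Antonio", "San Antonio"),
  State = c("Louisiana", "Texas", "Texas"),
  stringsAsFactors = FALSE
)

places        <- paste(tour$City, tour$State, sep = ",")
unique_places <- unique(places)

# One lookup per distinct city; rows of coords are cities, columns lat/lon
coords <- t(sapply(unique_places, fake_geo))

# Copy each city's coordinates back to every row that mentions it
idx      <- match(places, unique_places)
tour$lat <- unname(coords[idx, "lat"])
tour$lon <- unname(coords[idx, "lon"])

print(tour)
```

With the real data you would swap fake_geo for myGeo. The six New York rows would then cost one request instead of six.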
Creating a Map of the Tour Using googleVis
Anyway let’s now build a map of the tour dates. For this example we will use a package called “googleVis”. You might not know that Google has a number of mapping services for which R APIs exist. Look at the table at the end of this section, which lists existing packages for interfacing programmatically with the various Google mapping and chart services. You can find these packages on CRAN. In our case we’ll need to install googleVis. After that we can create a map.
install.packages("googleVis",dependencies=TRUE)
library(googleVis)
The cool thing about the googleVis package is that we get back a map in a web browser that has scroll bars and zoom tools. Additionally we can use information from the data frame to annotate the chart we plan to create. So, for example, for each tour stop that the band made we can put in meta info like the name of the venue they played as well as the date.
We have to do this in a way that accommodates the requirements of googleVis. This means we have to read through the googleVis manual pages and play around with the examples. However, hopefully I’m presenting a pretty good example here so you don’t have to immerse yourself in the manual (at least not yet).
The first thing we need to do is to create a single column for the Latitude and Longitude because googleVis wants this. This is easy to do. Let’s take the existing stones75 data frame and change it:
head(stones75)
         Date        City     State               Venue  lat   lon
1 1 June 1975 Baton Rouge Louisiana LSU Assembly Center 30.5 -91.1
2 3 June 1975 San Antonio     Texas   Convention Center 29.4 -98.5
3 4 June 1975 San Antonio     Texas   Convention Center 29.4 -98.5
4 6 June 1975 Kansas City  Missouri   Arrowhead Stadium 39.1 -94.6
5 8 June 1975   Milwaukee Wisconsin      County Stadium 43.0 -87.9
6 9 June 1975  Saint Paul Minnesota        Civic Center 45.0 -93.1

stones75$LatLon = paste(round(stones75$lat,1),round(stones75$lon,1),sep=":")
stones75 = stones75[,-5:-6]   # Remove the old lat and lon columns
head(stones75)
         Date        City     State               Venue     LatLon
1 1 June 1975 Baton Rouge Louisiana LSU Assembly Center 30.5:-91.1
2 3 June 1975 San Antonio     Texas   Convention Center 29.4:-98.5
3 4 June 1975 San Antonio     Texas   Convention Center 29.4:-98.5
4 6 June 1975 Kansas City  Missouri   Arrowhead Stadium 39.1:-94.6
5 8 June 1975   Milwaukee Wisconsin      County Stadium   43:-87.9
6 9 June 1975  Saint Paul Minnesota        Civic Center   45:-93.1
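You may have noticed that Milwaukee came out as "43:-87.9" and not "43.0:-87.9". That happens because paste converts the rounded number to text and drops the trailing zero. It is the same coordinate either way, so this is purely cosmetic, but if you prefer a uniform look then sprintf is one option. The lat and lon vectors below are just three values copied from the output above.

```r
# Three latitude/longitude pairs taken from the head(stones75) output
lat <- c(30.5, 43.0, 45.0)
lon <- c(-91.1, -87.9, -93.1)

# The approach used in the post: trailing zeros are dropped
pasted <- paste(round(lat, 1), round(lon, 1), sep = ":")
print(pasted)
# "30.5:-91.1" "43:-87.9"   "45:-93.1"

# sprintf always prints exactly one decimal place
formatted <- sprintf("%.1f:%.1f", lat, lon)
print(formatted)
# "30.5:-91.1" "43.0:-87.9" "45.0:-93.1"
```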
Next up we can create a column in our data frame that contains all the information we want to use to annotate each concert date. This can include HTML tags to better format the output. As an example the statement below creates a new column in the data frame called “Tip”, that has the following info: the Stop number on the tour, the Venue where it was held, and the Date of the concert. Once we have a map we can click on the “pin” for each location and see the annotation info.
stones75$Tip = paste(rownames(stones75),stones75$Venue,stones75$Date,"<BR>",sep=" ")

# Now we can create a chart !
# Click on the Atlanta locator and you'll see that it was the 37th stop of the tour.
# The show took place at The Omni on July 30th, 1975
stones.plot = gvisMap(stones75,"LatLon","Tip")
plot(stones.plot)
Refining the Plot Annotations
Pretty cool huh ? We can also zoom in on different parts of the map. The gvisMap function has a number of options that would allow us to draw a line between the cities, select a different type of map, and adopt certain zoom levels by default. So what else could / should we do ?
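To give you a feel for those options, here is a small sketch that draws a line between the stops and switches to a terrain map. The option names (showTip, showLine, enableScrollWheel, mapType, useMapTypeControl) are taken from the gvisMap help page, so check ?gvisMap for the full list that your installed version supports. The three-row stops data frame is a hand-made excerpt of stones75, and the map is only built if googleVis is actually installed.

```r
# Hand-made excerpt of the tour data: one row per stop
stops <- data.frame(
  LatLon = c("30.5:-91.1", "29.4:-98.5", "39.1:-94.6"),
  Tip    = c("1 LSU Assembly Center 1 June 1975 <BR>",
             "2 Convention Center 3 June 1975 <BR>",
             "4 Arrowhead Stadium 6 June 1975 <BR>"),
  stringsAsFactors = FALSE
)

# Options passed through to the Google map
map_options <- list(
  showTip           = TRUE,      # show the Tip text when a pin is clicked
  showLine          = TRUE,      # connect the stops in row order
  enableScrollWheel = TRUE,      # zoom with the mouse wheel
  mapType           = "terrain", # default map style
  useMapTypeControl = TRUE       # let the viewer switch map styles
)

# Only build the map when googleVis is available
if (requireNamespace("googleVis", quietly = TRUE)) {
  tour_map <- googleVis::gvisMap(stops, "LatLon", "Tip", options = map_options)
  if (interactive()) plot(tour_map)  # opens the map in a web browser
}
```

Since showLine connects the points in the order they appear in the data frame, keeping the rows in tour order gives you the route the band travelled.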
Well we have a problem here in that the Stones played more than one show in several cities but we don’t take that into account when we are building the annotation data. What we might want to do is to process the data frame and, for those cities that had multiple shows, (e.g. New York), we can capture all the meta data in one go. We saw this before with the New York dates.
stones75[9:18,]
           Date          City        State                 Venue
9  15 June 1975       Buffalo     New York   Memorial Auditorium
10 17 June 1975       Toronto      Ontario    Maple Leaf Gardens
11 22 June 1975 New York City     New York Madison Square Garden
12 23 June 1975 New York City     New York Madison Square Garden
13 24 June 1975 New York City     New York Madison Square Garden
14 25 June 1975 New York City     New York Madison Square Garden
15 26 June 1975 New York City     New York Madison Square Garden
16 27 June 1975 New York City     New York Madison Square Garden
17 29 June 1975  Philadelphia Pennsylvania          The Spectrum
18  1 July 1975         Largo     Maryland        Capital Center
Currently our plot has only the last New York show information. But we want to have the info for all NYC shows. Here is one way to approach this problem. Note that there are probably more elegant ways to clean up the data but this will do the job for now.
test = stones75   # Work on a copy of the stones75 data frame

# Create some temporary work variables
str = ""
tmpdf = list()
ii = 1

repeat {   # Loop through the copy of the stones75 data frame
  # Collect the Tip text for every show at the current venue
  hold = test[test$Venue == test[ii,4],"Tip"]

  # Do we have a multi-night stand ?
  if (length(hold) > 1) {
    str = paste(hold,collapse="")
    test[ii,6] = str          # Column 6 is Tip
    tmpdf[[ii]] = test[ii,]
    str = ""
    # We "jump over" shows that we've already processed
    ii = ii + length(hold)
  # Here we process the "one night stands"
  } else {
    tmpdf[[ii]] = test[ii,]
    ii = ii + 1
  }
  if (ii > nrow(test)) break  # Stop once every row has been processed
}

tmpdf = tmpdf[!sapply(tmpdf,is.null)]   # Remove NULL list elements
stones = do.call(rbind,tmpdf)           # Bind the list back into a data frame

stones.plot = gvisMap(stones,"LatLon","Tip")
plot(stones.plot)
Okay. Now depending on your background in R you might think that was a lot of work, (or maybe not). In either case this is fairly typical of what we have to do to clean up and/or consolidate data to get it into a format that is suitable for use with the package we are using. Don’t think that this type of effort is peculiar to googleVis because other packages would require a comparable level of processing also. Welcome to the real world of data manipulation.
Anyway let’s take a look at the new plot. At first glance it looks just like the old one, but click on the New York locator and you will now see that the info for all of the Madison Square Garden shows is present. Shows number 11 through 16 took place in NYC.
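For those who like shorter code, here is a sketch of one of those "more elegant ways" using tapply. It is not the code used to build the map above, just an alternative under a couple of assumptions: the shows data frame is a four-row excerpt typed in by hand, and shows are grouped on Venue plus LatLon so that two different cities that both have a "Convention Center" would not get merged by accident.

```r
# Hand-typed excerpt: Baton Rouge, two nights in San Antonio, Kansas City
shows <- data.frame(
  Date   = c("1 June 1975", "3 June 1975", "4 June 1975", "6 June 1975"),
  Venue  = c("LSU Assembly Center", "Convention Center",
             "Convention Center", "Arrowhead Stadium"),
  LatLon = c("30.5:-91.1", "29.4:-98.5", "29.4:-98.5", "39.1:-94.6"),
  stringsAsFactors = FALSE
)
shows$Tip <- paste(rownames(shows), shows$Venue, shows$Date, "<BR>", sep = " ")

# One key per venue and location; the factor levels keep the tour order
key <- paste(shows$Venue, shows$LatLon)
key <- factor(key, levels = unique(key))

# Glue together the Tip text of all shows that share a key
tips <- tapply(shows$Tip, key, paste, collapse = "")

# Keep the first row for each stop and attach the combined Tip
stops     <- shows[!duplicated(key), ]
stops$Tip <- as.vector(tips)

print(stops[, c("Venue", "Tip")])
```

The resulting stops data frame has one row per stop and could be handed to gvisMap in the same way as the stones data frame. Unlike the loop, this version does not depend on the repeat shows sitting in consecutive rows.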
R packages for interfacing with Google
Here is a table that lists the other R packages that exist to interface with various Google services. Each one of these is worth investigation. Keep in mind that similar accommodations exist for other languages so if you prefer to do your coding in Perl or Python then you could work with the Google APIs also.
PACKAGE | DESCRIPTION |
googleVis | Create web pages with interactive charts based on R data frames |
plotGoogleMaps | Plot HTML output with Google Maps API and your own data |
RgoogleMaps | Overlays on Google map tiles in R |
animation | A Gallery of Animations in Statistics and Utilities |
gridSVG | Export grid graphics as SVG |
SVGAnnotation | Tools for Post-Processing SVG Plots Created in R |
RSVGTipsDevice | An R SVG graphics device with dynamic tips and hyperlink |
iWebPlots | Interactive web-based plots |