In my new book, “Mastering Machine Learning with R”, I wanted to include geo-spatial mapping in the chapter on cluster analysis. I actually completed the entire chapter with a cluster analysis of the Iraq Wikileaks data, plotting the clusters on a map and building a story around developing an intelligence estimate for the Al-Doura Oil Refinery, which I visited on many occasions during my 2009 “sabbatical”. However, the publisher convinced me that the material was too sensitive for such a book, and I completely rewrote the analysis with a different data set. I may or may not publish it on this blog at some point, but I want to continue to explore building maps in R. As luck would have it, I stumbled onto a data set showing the locations of Russian airstrikes in Syria at the following site:
The data includes the latitude and longitude of the strikes along with other background information. The what, how and why of the data collection are explained here:
In short, the site tried to independently verify locations, targets, etc., and includes what it claims are the reported versus actual strike locations. When I pulled the data, the site had analyzed 60 strikes. It was unable to determine the locations of 11 of them, so we have 49 data points.
I built the data in Excel and saved it as a .csv file, which I’ve already loaded. Here is the structure of the data.
Since lat and long are character, I need to change them to numeric and also keep a subset of data of the actual/real strike locations.
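A minimal sketch of that conversion follows. It assumes the data frame is called `airstrikes`, that the coordinate columns are named `lat` and `long`, and that strikes with an undetermined location were left blank. These names and the sample rows are placeholders rather than the real file; only the coordinates in the last row come from this post.

```r
# Hypothetical stand-in for the loaded .csv; the column names and most of the
# values are placeholders, not the real data set.
airstrikes <- data.frame(
  strikenum = c(1, 2, 28),
  lat  = c("35.10000", "", "35.68449"),
  long = c("36.70000", "", "36.11946"),
  stringsAsFactors = FALSE
)

# Convert the character columns to numeric; a blank entry becomes NA
airstrikes$lat  <- as.numeric(airstrikes$lat)
airstrikes$long <- as.numeric(airstrikes$long)

# Keep only the strikes whose location could be determined
real <- subset(airstrikes, !is.na(lat) & !is.na(long))
str(real)
```

If the real file stores the reported and actual locations in separate columns, the same pattern applies to whichever pair holds the actual coordinates.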
I will be using ggmap for this effort and pulling in Google Maps for plotting.
The first map will be an overall view of the country with the map type as “terrain”. Note that “satellite”, “hybrid” and “roadmap” are also available.
With the map created as object “map1”, I plot the locations using “geom_point()”.
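The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code behind the maps in this post: the centre coordinates and zoom level are my approximation for a country-wide view, the `real` data frame is a placeholder, and recent ggmap releases need a Google Maps API key registered with `register_google()` before `get_googlemap()` will return anything.

```r
library(ggplot2)
library(ggmap)

# Placeholder points; only the last row's coordinates appear in this post
real <- data.frame(
  strikenum = c(1, 2, 28),
  lat  = c(35.10, 36.20, 35.68449),
  long = c(36.70, 37.15, 36.11946)
)

# register_google(key = "YOUR_KEY")  # hypothetical key, supply your own

# Only call the Google service when a key has been registered
if (has_google_key()) {
  # Approximate centre of Syria (an assumption) at a country-wide zoom
  map1 <- ggmap(get_googlemap(center = c(lon = 38.5, lat = 35),
                              zoom = 6, maptype = "terrain"))

  # Overlay the strike locations on the base map
  print(map1 + geom_point(data = real, aes(x = long, y = lat),
                          colour = "red", size = 3, alpha = 0.7))
}
```

Swapping `maptype` for "satellite", "hybrid" or "roadmap" changes the base layer without touching the points.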
With the exception of what looks like one strike near Ar Raqqah, we can see they are concentrated between Aleppo and Homs with some close to the Turkish border. Let’s have a closer look at that region.
East of Ghamam is a large concentration, so let’s zoom in on that area and add the strike number as labels.
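Here is a hedged sketch of a zoomed map with labels. The `strikenum` column name, the object name `map_zoom`, the zoom level and the sample rows are all assumptions. Rather than guess the coordinates of the area, the sketch centres the map on the mean of the points being plotted.

```r
library(ggplot2)
library(ggmap)

# Placeholder points; only the middle row's coordinates appear in this post
real <- data.frame(
  strikenum = c(27, 28, 29),
  lat  = c(35.66, 35.68449, 35.71),
  long = c(36.10, 36.11946, 36.15)
)

# Centre the zoomed map on the points themselves
ctr <- c(lon = mean(real$long), lat = mean(real$lat))

# Needs a key registered via register_google(); skipped otherwise
if (has_google_key()) {
  map_zoom <- ggmap(get_googlemap(center = ctr, zoom = 12,
                                  maptype = "terrain"))

  # geom_text() writes each strike number just above its point
  print(map_zoom +
    geom_point(data = real, aes(x = long, y = lat),
               colour = "red", size = 3) +
    geom_text(data = real, aes(x = long, y = lat, label = strikenum),
              vjust = -1, size = 4))
}
```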
The last thing I want to do is focus in on the site for Strike 28. To do this we will require the lat and long, which we can find with the which() function.
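A small sketch of that lookup, assuming the strike number lives in a column called `strikenum` (a hypothetical name) and using placeholder rows around the one for Strike 28:

```r
# Placeholder data; only the Strike 28 coordinates come from this post
real <- data.frame(
  strikenum = c(27, 28, 29),
  lat  = c(35.66, 35.68449, 35.71),
  long = c(36.10, 36.11946, 36.15)
)

# which() returns the row index where the condition holds
idx <- which(real$strikenum == 28)

# Pull the coordinates for that row
real[idx, c("lat", "long")]
```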
It is now just a simple matter of using those coordinates to call up the Google map.
> map4 = ggmap(
+   get_googlemap(center=c(lon=36.11946, lat=35.68449), zoom=17, maptype="satellite"))
From the looks of it, this seems to be an isolated location, so it was probably some sort of base or logistics center. If you’re interested, the Russian Ministry of Defense posts videos of these strikes and you can see this one on YouTube.
OK, so that is a quick tutorial on using ggmap, a very powerful package. We’ve just scratched the surface of what it can do. I will continue to monitor the site for additional data, and perhaps I will publish a Shiny app if the data set becomes large and “rich” enough.