rMaps Mexico map

[This article was first published on Fellgernon Bit - rstats, and kindly contributed to R-bloggers.]

It’s exciting when great people help each other get things done

This is a simple networking story, which may not be surprising to some, but I was happily surprised by it. This is how the story goes:

Two weeks ago rMaps (Vaidyanathan, 2014) was released. After writing a blog post about it, I thought about using it to make a map of the homicide rate in Mexico over recent years. First, I had to figure out how to make custom maps with rMaps. @tyokota had the same question and started asking Ramnath about it in rMaps issue 6. Then I realized I needed a specific file with the map information. Google led me to @diegovalle, who had created the map from official Mexican sources, downloaded the homicide data, cleaned it, and made several maps and analyses: all his work is very impressive!

I thought it'd be very cool if @diegovalle and Ramnath connected, and they did! I saw them interacting via Twitter (here and here) and via GitHub. After I shared @diegovalle’s work with my friends, it turned out that some old friends already knew him (both from here and from high school). Another friend was interested in additional features, and I suggested she contact @diegovalle via Twitter: he quickly replied, as you can see here.

Beyond how impressive rMaps and @diegovalle’s work on Mexican data are, I was amazed by everyone's willingness to help each other and by how easily great people connected. I believe this is one of the great features of both GitHub and Twitter: you can share your code, ask questions, try to answer them, meet people working with your tools, etc. You can even offer to PayPal a beer like @tyokota did.

After all their great work, someone like me (that is, someone who doesn't know JavaScript, Datamaps, etc.) can now walk you through an example of making an interactive choropleth map showing the homicide rates in Mexico from 1997 to 2013.

Homicide rates in Mexico 1997-2013

The first thing we need to make a custom map with rMaps is a topojson file, which in this case specifies the Mexican state boundaries. This process is explained in more detail by @tyokota at custom-map, which you can view here.

In this particular example, INEGI, the National Institute of Statistics and Geography of Mexico, has a map of the Mexican states. @diegovalle explained how to download it here.

But before doing so, you might need to install topojson, as I did below following the installation instructions. In the terminal:

```sh
## Install node.js following instructions at https://github.com/mbostock/topojson/wiki/Installation
brew install node
## Install topojson. The topojson command below uses the 1.x command-line
## syntax; newer releases changed it, so `npm install -g topojson@1` may be needed
npm install -g topojson

## Download map info from INEGI (Mexican official source)
curl -o estados.zip http://mapserver.inegi.org.mx/MGN/mge2010v5_0.zip
## Decompress file
unzip estados.zip
## Create shapefile reprojected to WGS84 longitude/latitude
## (ogr2ogr is part of GDAL, which must also be installed)
ogr2ogr states.shp Entidades_2010_5.shp -t_srs "+proj=longlat +ellps=WGS84 +no_defs +towgs84=0,0,0"
## id-property needed so that DataMaps knows how to color the map
## (-s simplifies, -q quantizes, -p keeps CVE_ENT as a number and NOM_ENT as name)
topojson -o mx_states.json -s 1e-7 -q 1e5 states.shp -p state_code=+CVE_ENT,name=NOM_ENT --id-property NOM_ENT
```
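Because --id-property NOM_ENT makes each state's name the feature id in mx_states.json, a state only gets colored when a value in the data's name column matches its id exactly, accents included. A minimal sketch of a sanity check, assuming you already have both sets of names as character vectors: unmatched_names() is a hypothetical helper, and the names below are toy values rather than ones read from the real files.

```r
## Hypothetical helper (not part of rMaps): report data names with no
## matching feature id in the map, and map ids never used by the data
unmatched_names <- function(data_names, map_ids) {
  list(
    missing_in_map  = setdiff(unique(data_names), map_ids),
    missing_in_data = setdiff(map_ids, unique(data_names))
  )
}

## Toy example: the map id carries an accent, the data name does not
map_ids    <- c("Aguascalientes", "M\u00e9xico", "Yucat\u00e1n")
data_names <- c("Aguascalientes", "Mexico", "Yucat\u00e1n", "Aguascalientes")

check <- unmatched_names(data_names, map_ids)
check$missing_in_map   ## "Mexico" has no matching id
check$missing_in_data  ## the accented "México" is never colored
```

Any name reported here would be left uncolored on the map, which is one reason to convert the state names to UTF-8 before merging them with the homicide data.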

Now that we have the topojson file mx_states.json, we need to get the actual homicide data. @diegovalle has gone through the whole process of acquiring the data from official Mexican sources and cleaning it. Let's download it.

```r
# Download crime data
## From crimenmexico.diegovalle.net/en/csv
## All local crimes at the state level
## mode = "wb" keeps the binary .gz file intact on Windows
download.file("http://crimenmexico.diegovalle.net/en/csv/fuero-comun-estados.csv.gz",
    "fuero-comun-estados.csv.gz", mode = "wb")
```
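There is no need to decompress the download first: read.csv() opens files through file(), which detects gzip compression when reading. A quick self-contained check, using a toy file written to a temporary location (the column names merely mimic the crime data):

```r
## Write a tiny gzip-compressed CSV to a temporary file (toy values)
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(data.frame(state_code = c(1, 2), count = c(10, 20)),
          file = con, row.names = FALSE)
close(con)

## read.csv() decompresses transparently when given the .gz path
toy_gz <- read.csv(tmp)
toy_gz
```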

The data is not quite ready to use, so we need to reshape it a bit. In particular, we want to keep only the intentional homicides and group the data by state and year. We can do this with dplyr (Wickham & Francois, 2014).

```r
## Load required packages
library("dplyr")

## Load the crime data
crime <- read.csv("fuero-comun-estados.csv.gz")

## Only intentional homicides
crime <- subset(crime, crime == "HOMICIDIOS" & type == "DOLOSOS")

## Sum homicides across subtypes (by firearm, etc.), group by state and year,
## and compute the rate per 100,000 inhabitants.
## This post originally used dplyr 0.1.1 and its %.% operator; current
## versions of dplyr use %>% instead.
hom <- crime %>%
  filter(year %in% 1997:2013) %>%
  group_by(state_code, year, type) %>%
  summarise(total = sum(count, na.rm = TRUE),
            population = mean(population)) %>%
  mutate(rate = total / population * 10^5) %>%
  arrange(state_code, year)

## How are states coded?
summary(hom$state_code)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    8.75   16.50   16.50   24.20   32.00
```
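The dplyr pipeline above sums the counts for each state and year and divides by the population, so rate = total / population * 10^5 is the number of intentional homicides per 100,000 inhabitants. A base-R sketch of the same steps on a toy data frame (made-up values; only the column names follow the crime data) makes the arithmetic concrete: a state with 50 homicides and 1,000,000 inhabitants gets a rate of 5.

```r
## Toy long-format data: several rows per state and year (one per crime
## subtype); every value here is made up for illustration
toy <- data.frame(
  state_code = c(1, 1, 1, 2),
  year       = c(1997, 1997, 1997, 1997),
  crime      = c("HOMICIDIOS", "HOMICIDIOS", "HOMICIDIOS", "HOMICIDIOS"),
  type       = c("DOLOSOS", "DOLOSOS", "CULPOSOS", "DOLOSOS"),
  count      = c(30, 20, 99, 10),
  population = c(1e6, 1e6, 1e6, 5e5)
)

## Keep intentional homicides only (drops the CULPOSOS row)
dolosos <- subset(toy, crime == "HOMICIDIOS" & type == "DOLOSOS")

## Total homicides and (constant) population per state and year
total <- aggregate(count ~ state_code + year, data = dolosos, FUN = sum)
pop   <- aggregate(population ~ state_code + year, data = dolosos, FUN = mean)
rates <- merge(total, pop, by = c("state_code", "year"))
names(rates)[names(rates) == "count"] <- "total"

## Homicides per 100,000 inhabitants
rates$rate <- rates$total / rates$population * 10^5
rates
```

Taking the mean of population rather than the sum matters: every subtype row repeats the same state population, so summing it would inflate the denominator.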

We have the slight inconvenience that states are coded as integers from 1 to 32 instead of by name. Using another of the files supplied by @diegovalle, we can match the codes to the state names. This requires the foreign (R Core Team) package to load a dbf file, and plyr (Wickham, 2011) to merge both data sets.

```r
## Needed for read.dbf
library("foreign")

## The dbf from the state shapefile needed to merge state_code with state
## names
codes <- read.dbf("states.dbf")
## Convert the state names from windows-1252 to UTF-8 (accented names)
codes$NOM_ENT <- iconv(codes$NOM_ENT, "windows-1252", "utf-8")
## read.dbf() returns text columns as factors by default; convert through
## character to get the codes themselves rather than the factor level indices
codes$CVE_ENT <- as.numeric(as.character(codes$CVE_ENT))
## Drop OID so only CVE_ENT and NOM_ENT remain, then rename them
codes$OID <- NULL
names(codes) <- c("state_code", "name")

## Load plyr for join(). Loading it before the dplyr call above would mask
## dplyr's summarise(), mutate() and arrange()
library("plyr")

## Names needed for the map
hom <- join(hom, codes)

## Let's look at the data
head(hom)
##   state_code year    type total population   rate           name
## 1          1 1997 DOLOSOS   355     958126 37.051 Aguascalientes
## 2          1 1998 DOLOSOS    66     975585  6.765 Aguascalientes
## 3          1 1999 DOLOSOS    27     992515  2.720 Aguascalientes
## 4          1 2000 DOLOSOS    14    1009215  1.387 Aguascalientes
## 5          1 2001 DOLOSOS    22    1026437  2.143 Aguascalientes
## 6          1 2002 DOLOSOS    26    1044578  2.489 Aguascalientes
```
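The as.numeric() call on CVE_ENT is worth a second look. read.dbf() turns text columns into factors by default (as.is = FALSE), and as.numeric() on a factor returns the internal level indices rather than the numbers in the labels. In this data the two presumably agree because the labels are zero-padded ("01" to "32") and therefore sort in numeric order, but going through as.character() does not rely on that. The toy sketch below shows the difference and then attaches names to codes with base R's merge(), which does the same job as plyr's join() in this step; the codes, names, and rates are illustrative only.

```r
## A factor whose labels are numbers stored as text (toy values)
cve <- factor(c("9", "15", "10"))
levels(cve)                       ## sorted as text: "10" "15" "9"

## as.numeric() on a factor returns level indices, not the labels
as.numeric(cve)                   ## 3 2 1
## Converting through character recovers the actual codes
as.numeric(as.character(cve))     ## 9 15 10

## Attach names to codes with base R's merge() (illustrative codes and names)
codes_toy <- data.frame(
  state_code = as.numeric(as.character(cve)),
  name = c("Distrito Federal", "M\u00e9xico", "Durango"),
  stringsAsFactors = FALSE
)
hom_toy <- data.frame(state_code = c(10, 9, 15), year = 1997,
                      rate = c(1.5, 2.5, 3.5))  ## made-up rates
merged <- merge(hom_toy, codes_toy, by = "state_code")
merged
```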

Great! We now have the state names in name and the intentional homicide rate (homicides per 100,000 inhabitants) in rate for each year. We can now make the interactive choropleth map using the ichoropleth function described by Ramnath here. This requires specifying the topojson file via dataUrl, the name of the map via scope, and (the trickiest part, for me at least) a custom setProjection function. These are all properties of the Datamaps library. In particular, the Datamaps wiki describes how to use custom maps, but this requires some JavaScript knowledge.

```r
## Make the map
library("rMaps")
d1 <- ichoropleth(rate ~ name, data = hom, ncuts = 9, pal = 'YlOrRd',
    animate = 'year', map = 'states'
)
## Note that I am hosting the mx_states.json in Dropbox
## but if you are doing it locally, you only need
## dataUrl = "mx_states.json"
d1$set(
  geographyConfig = list(
    dataUrl = "https://dl.dropboxusercontent.com/u/10794332/mx_states.json"
  ),
  ## scope must match the object name inside mx_states.json; topojson
  ## named it after the input file, states.shp
  scope = 'states',
  ## The #! ... !# markers tell rCharts to insert this string as raw
  ## JavaScript instead of a quoted string
  setProjection = '#! function( element, options ) {
    var projection, path;
    projection = d3.geo.mercator()
      .center([-89, 21]).scale(element.offsetWidth)
      .translate([element.offsetWidth / 2, element.offsetHeight / 2]);

    path = d3.geo.path().projection( projection );
    return {path: path, projection: projection};
  } !#'
)
d1$save('rMaps.html', cdn = TRUE)
```
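The setProjection function is easier to tweak once you see what it computes. Here is a sketch of the same arithmetic in R, assuming d3 version 3's Mercator projection (the d3.geo.* API used in the code above) without rotation or clipping: longitude and latitude in radians map to x = lon and y = log(tan(pi/4 + lat/2)), are multiplied by the scale, and are shifted so that the center() point lands on the translate() point, with screen y growing downward. The function mercator_xy() and the 800 x 500 element size are hypothetical.

```r
## Hypothetical re-implementation of the setProjection arithmetic
## (d3 v3 Mercator with center, scale and translate; no rotation or clipping)
mercator_xy <- function(lon, lat, center = c(-89, 21),
                        width = 800, height = 500) {
  to_rad <- pi / 180
  k <- width                               ## .scale(element.offsetWidth)
  raw <- function(lon, lat) {              ## raw Mercator, in radians
    cbind(x = lon * to_rad, y = log(tan(pi / 4 + lat * to_rad / 2)))
  }
  p  <- raw(lon, lat)
  c0 <- raw(center[1], center[2])
  ## .translate([width / 2, height / 2]) puts the center in the middle;
  ## screen y grows downward, hence the minus sign
  cbind(
    x = width / 2 + k * (p[, "x"] - c0[, "x"]),
    y = height / 2 - k * (p[, "y"] - c0[, "y"])
  )
}

## The projection center lands in the middle of an 800 x 500 element
xy <- mercator_xy(-89, 21)
xy
## A point west and south of the center (roughly Mexico City, -99.1, 19.4)
## lands to the left (smaller x) and lower down (larger y)
pts <- mercator_xy(c(-89, -99.1), c(21, 19.4))
pts
```

Since the scale equals the element width, the element always spans width / scale = 1 radian (about 57 degrees) of longitude whatever its size, so .center() is what decides which part of the world is in view.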

The end result is shown below:

[Interactive map: intentional homicide rates in Mexico by state, 1997-2013]

You can also share the map using the publish method as shown below:

```r
d1$publish("Intentional homicide rates for Mexico 1997-2013")
## You'll need a GitHub account
```

You will get a link to the rCharts viewer, such as this one. If you prefer, you can also view the result using Pagist, as shown here.

The code presented in this post was written by @diegovalle (which you can view here) and by Ramnath (shown here). I also learned the trick of hosting the topojson file on Dropbox from @tyokota’s code, as I was running into Access-Control-Allow-Origin errors when hosting it on my academic website. Last but not least, Ramnath rightly insists that none of this would be possible without libraries such as Datamaps.


Citations made with knitcitations (Boettiger, 2014).


```r
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
## [1] rMaps_0.1.1         plyr_1.8            foreign_0.8-59     
## [4] dplyr_0.1.1         knitcitations_0.5-0 bibtex_0.3-6       
## [7] knitr_1.5          
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1     digest_0.6.4       evaluate_0.5.1    
##  [4] formatR_0.10       grid_3.0.2         httr_0.2          
##  [7] lattice_0.20-24    rCharts_0.4.2      RColorBrewer_1.0-5
## [10] Rcpp_0.11.0        RCurl_1.95-4.1     RJSONIO_1.0-3     
## [13] stringr_0.6.2      tools_3.0.2        whisker_0.3-2     
## [16] XML_3.95-0.2       xtable_1.7-1       yaml_2.1.10
```

