Geocode your data using, R, JSON and Google Maps’ Geocoding APIs

January 24, 2012
By

(This article was first published on All Things R, and kindly contributed to R-bloggers)



Over the last year and half, I have faced numerous challenges with geocoding the data that I have used to showcase my passion for location analytics.  In 2012, I decided to take thing in my control and turned to R.  Here, I am sharing a simple R script that I wrote to geo-code my data whenever I needed it, even BIG Data.


To geocode my data, I use Google's Geocoding service which returns the geocoded data in a JSON. I will recommend that you register with Google Maps API and get a key if you have large amount of data and would do repeated geo coding.

Here is function that can be called repeatedly by other functions:

getGeoCode <- function(gcStr)
{
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters
 #Open Connection
 connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="") 
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
#Flatten the received JSON
  data.json <- unlist(data.json)
  lat <- data.json["results.geometry.location.lat"]
  lng <- data.json["results.geometry.location.lng"]
  gcodes <- c(lat, lng)
  names(gcodes) <- c("Lat", "Lng")
  return (gcodes)
}

Let's put this function to test:
geoCodes <- getGeoCode("Palo Alto,California")

> geoCodes
           Lat            Lng 
  "37.4418834" "-122.1430195" 


You can run this on the entire column of a data frame or a data table:

Here  is my sample data frame with three columns - Opposition, Ground.Country and Toss. Two of the columns, you guessed it right, need geocoding.

> head(shortDS,10)
     Opposition              Ground.Country Toss
1      Pakistan            Karachi,Pakistan  won
2      Pakistan         Faisalabad,Pakistan lost
3      Pakistan             Lahore,Pakistan  won
4      Pakistan            Sialkot,Pakistan lost
5   New Zealand    Christchurch,New Zealand lost
6   New Zealand          Napier,New Zealand  won
7   New Zealand        Auckland,New Zealand  won
8       England              Lord's,England  won
9       England          Manchester,England lost
10      England            The Oval,England  won

To geo code this, here is a simple one liner I execute:

shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
                  laply(Ground.Country, function(val){getGeoCode(val)})))



> head(shortDS, 10)
    Opposition           Ground.Country Toss  Ground.Lat  Ground.Lng
1     Pakistan         Karachi,Pakistan  won   24.893379   67.028061
2     Pakistan      Faisalabad,Pakistan lost   31.408951   73.083458
3     Pakistan          Lahore,Pakistan  won    31.54505   74.340683
4     Pakistan         Sialkot,Pakistan lost  32.4972222  74.5361111
5  New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6  New Zealand       Napier,New Zealand  won -39.4928444 176.9120178
7  New Zealand     Auckland,New Zealand  won -36.8484597 174.7633315
8      England           Lord's,England  won     51.5294     -0.1727
9      England       Manchester,England lost   53.479251   -2.247926
10     England         The Oval,England  won   51.369037   -2.378269



Happy Coding!

To leave a comment for the author, please follow the link and comment on his blog: All Things R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.