(This article was first published on All Things R, and kindly contributed to R-bloggers)
Over the last year and half, I have faced numerous challenges with geocoding the data that I have used to showcase my passion for location analytics. In 2012, I decided to take thing in my control and turned to R. Here, I am sharing a simple R script that I wrote to geo-code my data whenever I needed it, even BIG Data.
To geocode my data, I use Google's Geocoding service which returns the geocoded data in a JSON. I will recommend that you register with Google Maps API and get a key if you have large amount of data and would do repeated geo coding.
Here is function that can be called repeatedly by other functions:
getGeoCode <- function(gcStr)
{
library("RJSONIO") #Load Library
gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters
#Open Connection
connectStr <- paste('http://maps.google.com/maps/api/geocode/json?sensor=false&address=',gcStr, sep="")
con <- url(connectStr)
data.json <- fromJSON(paste(readLines(con), collapse=""))
close(con)
#Flatten the received JSON
data.json <- unlist(data.json)
lat <- data.json["results.geometry.location.lat"]
lng <- data.json["results.geometry.location.lng"]
gcodes <- c(lat, lng)
names(gcodes) <- c("Lat", "Lng")
return (gcodes)
}
Let's put this function to test:
geoCodes <- getGeoCode("Palo Alto,California")
> geoCodes
Lat Lng
"37.4418834" "-122.1430195"
Here is my sample data frame with three columns - Opposition, Ground.Country and Toss. Two of the columns, you guessed it right, need geocoding.
> head(shortDS,10)
Opposition Ground.Country Toss
1 Pakistan Karachi,Pakistan won
2 Pakistan Faisalabad,Pakistan lost
3 Pakistan Lahore,Pakistan won
4 Pakistan Sialkot,Pakistan lost
5 New Zealand Christchurch,New Zealand lost
6 New Zealand Napier,New Zealand won
7 New Zealand Auckland,New Zealand won
8 England Lord's,England won
9 England Manchester,England lost
10 England The Oval,England won
To geo code this, here is a simple one liner I execute:
> head(shortDS, 10)
Opposition Ground.Country Toss Ground.Lat Ground.Lng
1 Pakistan Karachi,Pakistan won 24.893379 67.028061
2 Pakistan Faisalabad,Pakistan lost 31.408951 73.083458
3 Pakistan Lahore,Pakistan won 31.54505 74.340683
4 Pakistan Sialkot,Pakistan lost 32.4972222 74.5361111
5 New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6 New Zealand Napier,New Zealand won -39.4928444 176.9120178
7 New Zealand Auckland,New Zealand won -36.8484597 174.7633315
8 England Lord's,England won 51.5294 -0.1727
9 England Manchester,England lost 53.479251 -2.247926
10 England The Oval,England won 51.369037 -2.378269
Happy Coding!
shortDS <- with(shortDS, data.frame(Opposition, Ground.Country, Toss,
laply(Ground.Country, function(val){getGeoCode(val)} )))
> head(shortDS, 10)
Opposition Ground.Country Toss Ground.Lat Ground.Lng
1 Pakistan Karachi,Pakistan won 24.893379 67.028061
2 Pakistan Faisalabad,Pakistan lost 31.408951 73.083458
3 Pakistan Lahore,Pakistan won 31.54505 74.340683
4 Pakistan Sialkot,Pakistan lost 32.4972222 74.5361111
5 New Zealand Christchurch,New Zealand lost -43.5320544 172.6362254
6 New Zealand Napier,New Zealand won -39.4928444 176.9120178
7 New Zealand Auckland,New Zealand won -36.8484597 174.7633315
8 England Lord's,England won 51.5294 -0.1727
9 England Manchester,England lost 53.479251 -2.247926
10 England The Oval,England won 51.369037 -2.378269
To leave a comment for the author, please follow the link and comment on his blog: All Things R.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).