Google Geo Data – Data Access Without Restrictions

[This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers.]

Geographic distances matter in many disciplines: health researchers use them when analyzing the spread of diseases, economists when evaluating the impact of transaction costs on human behavior, and sociologists when evaluating interpersonal distances (based on external factors) in human interaction.

However, each query sent to the Google Maps Distance Matrix API (currently available via the ggmap package) is limited by the number of allowed elements, where the number of elements is the number of origins times the number of destinations. The API has the following limits in place for users of the standard API:

  • 2,500 free elements per day
  • 100 elements per query
  • 100 elements per 10 seconds
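To see how quickly these limits bite, the element count for a full distance matrix can be computed directly. The following is a minimal sketch: the helper name `count_elements` and the example sizes are assumptions for illustration, and the limits are the ones listed above.

```r
# Limits quoted above for the standard API
free_elements_per_day <- 2500
elements_per_query    <- 100

# Hypothetical helper: elements = origins x destinations
count_elements <- function(n_origins, n_destinations) {
  n_origins * n_destinations
}

# Illustrative example: 200 origins and 200 destinations
elements <- count_elements(200, 200)

# Minimum number of queries and free-tier days needed
queries_needed <- ceiling(elements / elements_per_query)
days_needed    <- ceiling(elements / free_elements_per_day)

print(c(elements = elements, queries = queries_needed, days = days_needed))
```

With these illustrative numbers, a 200 x 200 matrix has 40,000 elements, which needs at least 400 queries and 16 days on the free tier.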

Thus, researchers face a limit when requesting distances. This post proposes a workaround: the code below requests the driving distance and driving time between two geographical points via Google Maps directly, without the API restrictions. The code is quite flexible and could be adjusted to request straight-line distances, etc.

The example refers to the attached csv-file. The comments are part of the script.

Example csv-file

You need four R packages (data.table, httr, stringr, XML) to run the code.

Remarks, hints and further modifications are welcome.

In the first step, load the relevant packages and the attached data file, which consists of four pairs of lat/lon coordinates.

library("data.table")
library("httr")
library("stringr")
library("XML")

# Read Data example (Data example provided in the header)
newdata <- read.csv("D:/r_geocodes.csv", header = TRUE, sep=";")
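Since the example file is not reproduced here, the following sketch shows the structure the script appears to expect: a semicolon-separated file containing at least the columns `lat1`, `lon1`, `lat2` and `lon2`, which the later code refers to. The coordinates are made-up illustrations, not the contents of the original file.

```r
# Hypothetical stand-in for r_geocodes.csv (coordinates are illustrative only)
csv_text <- "lat1;lon1;lat2;lon2
52.5200;13.4050;48.1351;11.5820
50.1109;8.6821;53.5511;9.9937"

# read.csv() accepts inline text, so no file on disk is needed for a quick check
example <- read.csv(text = csv_text, header = TRUE, sep = ";")

str(example)
```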

Second, build the URLs used to request the distances via Google Maps.

newdata$URL <- with(newdata, paste("https://www.google.de/maps/dir/",lat1,"+",lon1,"/",lat2,",",lon2, sep=""))
newdata$URL <- as.character(newdata$URL)
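As a quick check, the same `paste()` call can be applied to a single hypothetical row to see what a request URL looks like. The coordinates below are illustrative assumptions.

```r
# One hypothetical origin/destination pair (illustrative coordinates)
pair <- data.frame(lat1 = 52.52, lon1 = 13.405, lat2 = 48.1351, lon2 = 11.582)

# Same construction as above: origin and destination separated by "/"
url <- with(pair, paste("https://www.google.de/maps/dir/", lat1, "+", lon1, "/", lat2, ",", lon2, sep = ""))
print(url)
```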

Next, define the relevant functions to download the data:

# Helper: extract the last n characters of a string
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}
#######################################################################
# Request the Google Maps page for a URL and parse driving distance and time.
# The markers "Std." and "Min." are the German abbreviations for hours and
# minutes, as shown on google.de. The arguments refetch and path are unused.
download.maybe <- function(url, refetch = FALSE, path = ".") {
  # Page source as a single character string
  cnamet <- content(GET(url), as = "text", encoding = "UTF-8")

  # Distance: parsed from the characters immediately preceding the first "km"
  dis <- substring((strsplit(substrRight(strsplit(cnamet, "km")[[1]][1], 9), ",")[[1]])[2], 2)

  # Driving time
  # Minutes: parsed from the 4 characters preceding the first "Min."
  dur_m <- as.numeric(gsub("[^[:alnum:],]", "", substrRight(strsplit(cnamet, "Min.")[[1]][1], 4)))

  # Hours (if applicable): parsed from the characters preceding "Std.", otherwise 0
  durh_h_new <- as.numeric(gsub("[^[:alnum:]]", "",
                               ifelse(grepl("Std", substrRight(strsplit(cnamet, "Min")[[1]][1], 15)),
                                      str_extract_all(substrRight(strsplit(substrRight(strsplit(cnamet, "Std")[[1]][1], 3), "Std.")[[1]][1], 5), "\\(?[0-9,.]+\\)?")[[1]],
                                      "0")))

  # Total driving time in minutes
  dur_fin <- dur_m + (durh_h_new * 60)

  # Combine distance and time into one space-separated string
  paste(dis, dur_fin, sep = " ")
}
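The parsing idea (find the number in front of a unit marker, then convert hours to minutes) can be tried offline on a mock string. The sketch below is not the author's substring-based approach: it uses base R regular expressions instead, and the helper names `first_group` and `parse_route` as well as the input fragments are assumptions. The real page source is far longer and its format may differ.

```r
# Hypothetical helper: first capture group of a pattern, or NA if there is no match
first_group <- function(pattern, text) {
  regmatches(text, regexec(pattern, text))[[1]][2]
}

# Hypothetical parser for fragments such as "584 km ... 5 Std. 32 Min."
parse_route <- function(text) {
  dis     <- as.numeric(first_group("([0-9]+) km", text))
  hours   <- as.numeric(first_group("([0-9]+) Std\\.", text))
  minutes <- as.numeric(first_group("([0-9]+) Min\\.", text))
  # Hours are optional: short trips only report minutes
  if (is.na(hours)) hours <- 0
  # Same "distance minutes" output format as download.maybe()
  paste(dis, minutes + hours * 60)
}

long_trip  <- parse_route("... 584 km ... 5 Std. 32 Min. ...")
short_trip <- parse_route("... 12 km ... 18 Min. ...")
print(c(long_trip, short_trip))
```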

Finally, run the corresponding function for your data (here: example data set).

# Row names: derived from the Google URL
# First column: Distance
# Second column: Driving time (note: always the current driving time, which may vary with traffic)
files <- as.data.frame(t(as.data.frame(strsplit(sapply(newdata$URL, download.maybe), ", |,| "))))
colnames(files)[1] <- "Distance in km"
colnames(files)[2] <- "Driving Time in minutes"
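The reshaping step can also be tried offline with mock return values. In this sketch the vector `res` stands in for the output of `sapply(newdata$URL, download.maybe)`, and its names and values are assumptions. It uses `do.call(rbind, ...)` rather than the transpose used above, and it converts both columns to numeric, whereas the script above leaves them as text.

```r
# Mock results in the "distance minutes" format returned by download.maybe()
res <- c(url_a = "584 332", url_b = "12 18")

# Split each string on the space and stack the pieces row-wise
parts <- do.call(rbind, strsplit(res, " "))

# Build the result table with numeric columns and the same column names
files <- data.frame(
  "Distance in km"          = as.numeric(parts[, 1]),
  "Driving Time in minutes" = as.numeric(parts[, 2]),
  row.names   = names(res),
  check.names = FALSE
)
print(files)
```

Numeric columns make it possible to summarise or filter the distances directly, for example with `mean(files[["Distance in km"]])`.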

That’s it. Now, you should get the following output data file.

[Image: SC_R_Output_Geo_Code]
