Computational efficiency of great-circle distance calculations in R

November 28, 2010

(This article was first published on Mario's Entangled Bank » R, and kindly contributed to R-bloggers)

An obvious omission in my previous post on great-circle distance calculations in R was any discussion of the computational efficiency of the various methods, and in particular a comparison of different implementations of the same method. One of the comments pointed out that the geosphere package implements spherical trigonometry functions for geographic applications, including great-circle distance calculations. My aim here is to explore the computational efficiency of the various methods by comparing different implementations of them. Specifically, I will compare three methods for calculating great-circle distance: the Spherical Law of Cosines, the Haversine formula, and the Vincenty inverse formula for ellipsoids (the previous post has more details on each method).

For the Spherical Law of Cosines, the Haversine formula and the Vincenty inverse formula for ellipsoids, I will compare the implementations provided in the geosphere and fields packages with my own implementations from the previous post. Note that the fields package only implements the Spherical Law of Cosines. I will do the comparison by timing the distance calculation between 100000 randomly chosen pairs of points, with every method using the same set of points. First I generate the set of points,

n     <- 100000
long1 <- runif(n, -180, 180)
lat1  <- runif(n, -90, 90)
long2 <- runif(n, -180, 180)
lat2  <- runif(n, -90, 90)
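
Since the same timing pattern (start proc.time, loop over all pairs, stop) repeats for every method below, it could optionally be wrapped in a small helper. The name time.loop is my own, not from the original post; it takes any per-pair function and returns the elapsed wall-clock seconds:

```r
# Hypothetical helper: calls fun(i) for i = 1..n and returns the
# elapsed wall-clock seconds (third element of proc.time()).
time.loop <- function(fun, n) {
  start.time <- proc.time()[3]
  for (i in seq(n)) {
    fun(i)
  }
  proc.time()[3] - start.time
}
```

Each timing block below could then be written as, e.g., time.loop(function(i) distHaversine(c(long1[i], lat1[i]), c(long2[i], lat2[i]), r=6371), n). I keep the explicit blocks below to match the original post.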


For the geosphere package I can then time the distance calculations using the Haversine formula as follows,

library(geosphere)
start.time <- proc.time()[3]
for (i in seq(n)) {
  distHaversine(c(long1[i], lat1[i]), c(long2[i], lat2[i]), r=6371)
}
end.time <- proc.time()[3]
end.time - start.time


Using the Spherical Law of Cosines,

start.time <- proc.time()[3]
for (i in seq(n)) {
  distCosine(c(long1[i], lat1[i]), c(long2[i], lat2[i]), r=6371)
}
end.time <- proc.time()[3]
end.time - start.time


Using the Vincenty inverse formula for ellipsoids,

start.time <- proc.time()[3]
for (i in seq(n)) {
  distVincentyEllipsoid(c(long1[i], lat1[i]), c(long2[i], lat2[i]))
}
end.time <- proc.time()[3]
end.time - start.time


Using the rdist.earth function from the fields package, which implements the Spherical Law of Cosines,

start.time <- proc.time()[3]
for (i in seq(n)) {
  rdist.earth(matrix(c(long1[i], lat1[i]), ncol=2), matrix(c(long2[i], lat2[i]), ncol=2), miles=FALSE)
}
end.time <- proc.time()[3]
end.time - start.time


Using my implementations described in the previous post (the functions deg2rad, gcd.slc, gcd.hf and gcd.vif from that post must be defined first),

# Haversine formula (gcd.hf takes radians, hence the deg2rad conversions)
start.time <- proc.time()[3]
for (i in seq(n)) {
  gcd.hf(deg2rad(long1[i]), deg2rad(lat1[i]), deg2rad(long2[i]), deg2rad(lat2[i]))
}
end.time <- proc.time()[3]
end.time - start.time

# Spherical Law of Cosines
start.time <- proc.time()[3]
for (i in seq(n)) {
  gcd.slc(deg2rad(long1[i]), deg2rad(lat1[i]), deg2rad(long2[i]), deg2rad(lat2[i]))
}
end.time <- proc.time()[3]
end.time - start.time

# Vincenty inverse formula for ellipsoids
start.time <- proc.time()[3]
for (i in seq(n)) {
  gcd.vif(deg2rad(long1[i]), deg2rad(lat1[i]), deg2rad(long2[i]), deg2rad(lat2[i]))
}
end.time <- proc.time()[3]
end.time - start.time


After crunching all of this, the moment of truth is here. The timings (in seconds) are:

• geosphere:
  • distHaversine: 14.922
  • distCosine: 16.651
  • distVincentyEllipsoid: 65.316
• fields:
  • rdist.earth: 20.37
• My implementation:
  • gcd.hf: 2.218
  • gcd.slc: 1.712
  • gcd.vif: 24.12

Needless to say, I was rather surprised by the results, in particular that the packages performed so poorly. My implementations were just a “back of the envelope” attempt at coding up these methods without any conscious goal of computational efficiency. In retrospect, I realize I had only one criterion: to keep things short. For example, my version of the Spherical Law of Cosines is a 5-line function (of which only one line does the actual calculation), while the distCosine function is 19 lines (15 of them performing calculations) and the rdist.earth function is 24 lines (9 lines of calculations). Could this explain the large differences in performance? I am skeptical of this explanation, and if it were the case it would be a serious weakness of R. I recall seeing a recent discussion online about the performance effects of brackets in R, something along the lines of more brackets (e.g. in mathematical expressions) slowing down evaluation. Is there a connection? I don’t know what to think, but it is disconcerting.
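One plausible culprit, which can be probed with base R alone, is per-call overhead rather than the mathematics itself: each of the 100000 loop iterations pays R's function-call, argument-handling and input-validation cost. A vectorized Haversine (my own sketch of the same formula, not code from any of the packages) handles all pairs in a single call and should avoid nearly all of that overhead:

```r
# Vectorized Haversine: all four coordinate arguments are equal-length
# vectors of degrees; returns great-circle distances in km on a sphere
# of radius 6371 km.
gcd.hf.vec <- function(long1, lat1, long2, lat2, R = 6371) {
  rad   <- pi / 180
  dlong <- (long2 - long1) * rad
  dlat  <- (lat2 - lat1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlong / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))  # pmin guards against rounding a slightly above 1
}
```

With this, the full benchmark reduces to one call, gcd.hf.vec(long1, lat1, long2, lat2), which on my reading should make the remaining timing differences between implementations far less dramatic.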

On a somewhat different note, clearly the way to go (particularly in a package) would be to implement the core of these types of calculations in C. Since I have yet to find an R implementation that does this, this might be incentive enough for me to spend an evening doing it myself. If someone is interested in incorporating the result into a package, I would be quite happy to provide the C code.

This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.

Filed under: cookbook, R
