**Mario's Entangled Bank » R**, and kindly contributed to R-bloggers)

An obvious omission in my previous post on Great-circle distance calculations in R was a lack of discussion on the computational efficiency of the various methods, and in particular comparing different implementations of the same method. One of the comments pointed out that the *geosphere* package implements spherical trigonometry functions for geographic applications, including great-circle distance calculations. My aim here is to explore the computational efficiency of the various methods, in particular comparing the different implementation of them.Specifically, I will be comparing three methods for calculating great-circle distance, the Spherical Law of Cosines, the Haversine formula and the Vincenty inverse formula for ellipsoids (the previous post has more details on the methods).

For the Spherical Law of Cosines, the Haversine formula and the Vincenty inverse formula for ellipsoids I will compare the implementations provided in the geosphere and *fields* packages with my own implementations (previous post). Note that the *fields* package only implements the Spherical Law of Cosines. I will do the comparison by timing the distance calculation between 100000 randomly chosen pairs of points where each of the methods is using the same set of points. First I generate the set of points,

n <- 100000 long1 <- runif(n, -180, 180) lat1 <- runif(n, -90, 90) long2 <- runif(n, -180, 180) lat2 <- runif(n, -90, 90)

For the geosphere package I can then time the distance calculations using the Haversine formula as follows,

required('geosphere') start.time <- proc.time()[3] for (i in seq(n)) { distHaversine(c(long1[i],lat1[i]), c(long2[i],lat2[i]), r=6371) } end.time <- proc.time()[3] end.time - start.time

Using the Spherical Law of Cosines,

start.time <- proc.time()[3] for (i in seq(n)) { distCosine(c(long1[i],lat1[i]), c(long2[i],lat2[i]), r=6371) } end.time <- proc.time()[3] end.time - start.time

Using the Vincenty inverse formula for ellipsoids,

start.time <- proc.time()[3] for (i in seq(n)) { distVincentySphere(c(long1[i],lat1[i]), c(long2[i],lat2[i]), r=6371) } end.time <- proc.time()[3] end.time - start.time

Using the *rdist.earth* from the *fields* package, which implements the Spherical Law of Cosines,

start.time <- proc.time()[3] for (i in seq(n)) { rdist.earth(matrix(c(long1[i],lat1[i]), ncol=2),matrix(c(long2[i],lat2[i]), ncol=2),miles=FALSE) } end.time <- proc.time()[3] end.time - start.time

Using my implementations described in the previous post (the following functions have to be defined *deg2rad*, *gcd.slc*, *gcd.hf *and *gcd.vif*),

# Haversine formula start.time <- proc.time()[3] for (i in seq(n)) { gcd.hf(deg2rad(long1[i]),deg2rad(lat1[i]), deg2rad(long2[i]),deg2rad(lat2[i])) } end.time <- proc.time()[3] end.time - start.time # Spherical Law of Cosines start.time <- proc.time()[3] for (i in seq(n)) { gcd.slc(deg2rad(long1[i]),deg2rad(lat1[i]), deg2rad(long2[i]),deg2rad(lat2[i])) } end.time <- proc.time()[3] end.time - start.time # Vincenty inverse formula for ellipsoids start.time <- proc.time()[3] for (i in seq(n)) { gcd.vif(deg2rad(long1[i]),deg2rad(lat1[i]), deg2rad(long2[i]),deg2rad(lat2[i])) } end.time <- proc.time()[3] end.time - start.time

After crunching all of this the moment of truth is here. The timings (in seconds) are,

*geosphere*:- distHaversine: 14.922
- distCosine: 16.651
- distVincentyEllipsoid: 65.316

*fields*:- rdist.earth: 20.37

- My implementation:
- gcd.hf: 2.218
- gcd.slc: 1.712
- gcd.vif: 24.12

Needless to say, I was rather surprised by the results, in particular that packages performed so poorly. My implementations was just a “back of the envelope” attempt at coding up some of these methods without any conscious goal of being computationally efficient. In retrospect I realize that I did have only one criterion, to keep things short. For example, my version of the Spherical Law of Cosines is a 5 line function (of which only one line does the actual calculations) while the *distCosine* function is 19 lines (15 lines performing calculations) and the *rdist.earth* function is 24 lines (9 lines of calculations). Could this explain the large differences in the performances? I am incredulous to this proposition and if this would be the case it would be a serious weakness of R. I recall seeing a recent discussion online somewhere about performance effects of brackets in R with something along the lines of more brackets (e.g. in mathematical expressions) slowing down the evaluation. Is there a connection? I don’t know what to think, but it is disconcerting.

On a somewhat different note, clearly the way to go (particularly in a package) would be to implement the core of these type of calculations in C. Since I have yet to find an R implementation that does this, this might be incentive enough for me to actually spend an evening doing it. If someone would be interested in incorporating these into a package I would be quite happy providing the C code for this.

*This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.*

**leave a comment**for the author, please follow the link and comment on their blog:

**Mario's Entangled Bank » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...