**Mario's Entangled Bank » R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently I found myself needing to calculate the distance between a large number of longitude and latitude locations. As it turns out, because the earth is a three-dimensional object, you cannot simply pretend that you are in Flatland, albeit some beg to differ. Of course, earth is not a perfect sphere either (it’s slightly ellipsoidal with a bit of extra width around the equator), so the exercise is not entirely trivial.

There are several approaches one can take, each one with its pros and cons. As one may expect, many of these methods have been implemented in various programing languages with code fragments available at various places on the web. I was not, however, able to find R code implementing the different methods, and in particular, comparing their performance. Well, that’s all the incentive I need to try to remedy this apparent deficiency – after all geodesic distance calculations is surely something people need to do on a regular basis. My aim here is to implement a few selected methods in R and do a “back of the envelope” comparison of their performance, i.e. do their results differ. More specifically I want to calculate the great-circle distance between the two points – that is, the shortest distance over the earth’s surface – giving an ‘as-the-crow-flies’ distance between the points (ignoring any hills). The great-circle distance is why those route maps in the inflight magazines look parabolic when it appears that it would be simpler to use a ruler to connect London with Vancouver.

Many of the formulas calculating the distance between longitude/latitude points assume a spherical earth, which, as we will see, simplifies things greatly and for most purposes is quite an adequate approximation. Perhaps the simples formula for calculating the great-circle distance is the Spherical Law of Cosines which is R looks like so,

# Calculates the geodesic distance between two points specified by radian latitude/longitude using the # Spherical Law of Cosines (slc) gcd.slc <- function(long1, lat1, long2, lat2) { R <- 6371 # Earth mean radius [km] d <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R return(d) # Distance in km }

Here the longitude and latitude coordinates are given in radian, i.e. the latitude and longitude decimal degrees (DD) converted to radians like so,

# Convert degrees to radians deg2rad <- function(deg) return(deg*pi/180)

Note that for the decimal degrees positive latitudes are north of the equator, negative latitudes are south of the equator. Positive longitudes are east of Prime Meridian, negative longitudes are west of the Prime Meridian.

The Spherical Law of Cosines performs well as long as the distance is not to small (some sources claim it’s accuracy deteriorates at about the 1 metre scale). At very small distances the inverting of the cosine magnifies rounding errors. An alternative formulation that is more robust at small distances is the Haversine formula which in R looks like so (assuming the latitudes and longitudes already have been converted from degrees to radians),

# Calculates the geodesic distance between two points specified by radian latitude/longitude using the # Haversine formula (hf) gcd.hf <- function(long1, lat1, long2, lat2) { R <- 6371 # Earth mean radius [km] delta.long <- (long2 - long1) delta.lat <- (lat2 - lat1) a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.long/2)^2 c <- 2 * asin(min(1,sqrt(a))) d = R * c return(d) # Distance in km }

While both the Spherical Law of Cosines and the Haversine formula are simple they both assume a spherical earth. Taking into account that Earth is not perfectly spherical makes things a bit more messy. Vincenty inverse formula for ellipsoids is an iterative methods used to calculate the ellipsoidal distance between two points on the surface of a spheroid. A direct “translation” of an implementation by Chris Veness in JavaScript,

# Calculates the geodesic distance between two points specified by radian latitude/longitude using # Vincenty inverse formula for ellipsoids (vif) gcd.vif <- function(long1, lat1, long2, lat2) { # WGS-84 ellipsoid parameters a <- 6378137 # length of major axis of the ellipsoid (radius at equator) b <- 6356752.314245 # ength of minor axis of the ellipsoid (radius at the poles) f <- 1/298.257223563 # flattening of the ellipsoid L <- long2-long1 # difference in longitude U1 <- atan((1-f) * tan(lat1)) # reduced latitude U2 <- atan((1-f) * tan(lat2)) # reduced latitude sinU1 <- sin(U1) cosU1 <- cos(U1) sinU2 <- sin(U2) cosU2 <- cos(U2) cosSqAlpha <- NULL sinSigma <- NULL cosSigma <- NULL cos2SigmaM <- NULL sigma <- NULL lambda <- L lambdaP <- 0 iterLimit <- 100 while (abs(lambda-lambdaP) > 1e-12 & iterLimit>0) { sinLambda <- sin(lambda) cosLambda <- cos(lambda) sinSigma <- sqrt( (cosU2*sinLambda) * (cosU2*sinLambda) + (cosU1*sinU2-sinU1*cosU2*cosLambda) * (cosU1*sinU2-sinU1*cosU2*cosLambda) ) if (sinSigma==0) return(0) # Co-incident points cosSigma <- sinU1*sinU2 + cosU1*cosU2*cosLambda sigma <- atan2(sinSigma, cosSigma) sinAlpha <- cosU1 * cosU2 * sinLambda / sinSigma cosSqAlpha <- 1 - sinAlpha*sinAlpha cos2SigmaM <- cosSigma - 2*sinU1*sinU2/cosSqAlpha if (is.na(cos2SigmaM)) cos2SigmaM <- 0 # Equatorial line: cosSqAlpha=0 C <- f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha)) lambdaP <- lambda lambda <- L + (1-C) * f * sinAlpha * (sigma + C*sinSigma*(cos2SigmaM+C*cosSigma*(-1+2*cos2SigmaM*cos2SigmaM))) iterLimit <- iterLimit - 1 } if (iterLimit==0) return(NA) # formula failed to converge uSq <- cosSqAlpha * (a*a - b*b) / (b*b) A <- 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq))) B <- uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq))) deltaSigma = B*sinSigma*(cos2SigmaM+B/4*(cosSigma*(-1+2*cos2SigmaM^2) - B/6*cos2SigmaM*(-3+4*sinSigma^2)*(-3+4*cos2SigmaM^2))) s <- b*A*(sigma-deltaSigma) / 1000 return(s) # Distance in km }

To “automate” the conversion from to radians and to simplify comparison of the results we wrap up all the methods like so,

# Calculates the geodesic distance between two points specified by degrees (DD) latitude/longitude using # Haversine formula (hf), Spherical Law of Cosines (slc) and Vincenty inverse formula for ellipsoids (vif) gcd <- function(long1, lat1, long2, lat2) { # Convert degrees to radians long1 <- deg2rad(long1) lat1 <- deg2rad(lat1) long2 <- deg2rad(long2) lat2 <- deg2rad(lat2) return(list(haversine = gcd.hf(long1, lat1, long2, lat2), sphere = gcd.slc(long1, lat1, long2, lat2), vincenty = gcd.vif(long1, lat1, long2, lat2)) ) }

Generating 10 random pairs of points for which the distances are calculated shows that both the Haversine and the Spherical Law of Cosines give identical results while the Vincenty inverse formula for ellipsoids gives slightly different results.

-62.05028:58.97118 -> -170.4029:65.27488, haversine=4973.438, sphere=4973.438, vincenty=4991.855 -176.7942:-31.37877 -> 50.73972:-37.13508, haversine=10935.77, sphere=10935.77, vincenty=10957.75 86.59503:88.97836 -> -47.02333:-69.42277, haversine=17803.91, sphere=17803.91, vincenty=17783.81 57.54945:51.99925 -> -45.60534:41.76859, haversine=7243.235, sphere=7243.235, vincenty=7263.447 53.74914:-73.71246 -> 45.84887:84.32937, haversine=17577.93, sphere=17577.93, vincenty=17556.35 67.05545:-70.39945 -> 117.5418:62.28946, haversine=15265.59, sphere=15265.59, vincenty=15237.05 -110.6735:59.05593 -> 117.9950:-45.77782, haversine=16498.35, sphere=16498.35, vincenty=16486.36 -136.0702:3.196073 -> 44.39497:41.16088, haversine=15082.59, sphere=15082.59, vincenty=15091.86 56.27645:22.64495 -> -124.7018:-77.87321, haversine=13873.76, sphere=13873.76, vincenty=13861.25 126.9144:-50.72121 -> -30.78809:43.39368, haversine=18149.55, sphere=18149.55, vincenty=18146.36

So why bother with all this coding goodness when you could just use the rdist.earth() function in the fields package. As it turns out the rdist.earth() function gives an estimate that is different from the other methods,

> distance $haversine [1] 4841.874 $sphere [1] 4841.874 $vincenty [1] 4843.637 > rdist.earth(matrix(c(long1,lat1), ncol=2),matrix(c(long2,lat2), ncol=2),miles=FALSE) [,1] [1,] 4847.489

After poking around in the rdist.earth() source code it turns out that it assumes the earth’s radius to be 6378.388 km. According to Wikipedia this number seems to be the equatorial radius (the maximum radius). Because earth is not a perfect sphere, however, the radius declines as one moves to the poles reaching a polar minimum of about 6,357 km. The mean radius is 6371 km and is what I have been using in my calculations. When using the mean radius in rdist.earth() one obtains,

> rdist.earth(matrix(c(long1,lat1), ncol=2),matrix(c(long2,lat2), ncol=2),miles=FALSE, R=6371) [,1] [1,] 4841.874

which is the same result the Haversine and the Spherical Law of Cosines gives.

There is much more that can be said for the different methods of calculating the great-circle distance between two points with a vast amount of much more technical discussions available online. To sum up, as expected, all the methods assuming a spherical Earth, i.e. Haversine, Spherical Law of Cosines, and rdist.earth() (once the Earth’s radius has been corrected), give the same results. In contrast, as expected, Vincenty inverse formula for ellipsoids gives different results but appear to be within 100km of the spherical Earth methods. Which method to choose will ultimately depend on factors such as the scale of the distance calculations and the computational performance of the different methods (which I do not address here). For points located close to each other (think short haul flights) the computationally simpler (read, faster) spherical Earth methods will be quite adequate while for points located far apart (think long haul flights, in particular connecting opposite hemispheres) the substantially more computationally complex (i.e. probably slower) Vincenty inverse formula for ellipsoids will give a better result.

*This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta.*

**leave a comment**for the author, please follow the link and comment on their blog:

**Mario's Entangled Bank » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.