(This article was first published on **OutLie..R**, and kindly contributed to R-bloggers)

About 8 years ago, I was sitting in class listening to a guest lecturer talk about how community events can be described like celestial bodies with their own gravity: the larger and more important the event, the more people it attracts, and from farther away. It is much like a black hole, where the greater the mass, the stronger the gravity.

In physics, gravity depends on mass; for a community event, the gravity can be estimated from the number of participants and the distance they traveled. The more participants and the greater the distance traveled, the higher the event's gravity. For example, compare a farmers market where only the locals come with an international conference of some kind. Assuming both events attract the same number of people, the international conference would have the higher gravity because its participants travel farther.
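One way to make the comparison concrete is to treat total distance traveled as a rough gravity score. This is only an illustrative assumption: the lecture did not give a formula, `event_gravity` is a hypothetical helper, and the distances below are made up.

```r
# Hypothetical gravity score: total miles traveled by all participants.
# The formula is an assumption for illustration, not an established measure.
event_gravity <- function(miles_traveled) {
  sum(miles_traveled)
}

# Two made-up events with the same number of participants (5 each)
farmers_market <- c(1, 2, 2, 3, 5)                # locals only
conference     <- c(150, 800, 1200, 2500, 4000)   # long-distance travelers

event_gravity(farmers_market)
event_gravity(conference)
```

With equal attendance, the score is driven entirely by how far people came, which matches the farmers market versus conference comparison.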

Of the two elements, the number of participants is easy to find: seats sold, tickets, a head count, and so on. The distance traveled is harder to determine. Two points are needed: the event (destination) and the point of origin (the person’s home). The most accurate method would be to get a GPS coordinate for every home, but this would be very difficult, first because most people do not know their coordinates, and second because they are probably not willing to divulge such information. Another alternative is to request an address, but people are becoming much more savvy about personal information, as they should be, so getting a good sample might be problematic. The solution then lies with the zip code: it can be matched to a latitude and longitude, and it is broad enough that people do not feel their privacy is being invaded, while still giving a reasonable distance for each participant.
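As a minimal sketch of the zip code step, the collected zip codes can be joined to a table of zip code centroids. The lookup table, its column names, and the approximate coordinates here are all assumptions for illustration; a real analysis would use a full zip code database.

```r
# Hypothetical lookup table of zip code centroids (approximate, illustrative values)
zip_lookup <- data.frame(
  zip  = c("10001", "60601", "90012"),
  long = c(-73.99, -87.62, -118.24),
  lat  = c(40.75, 41.89, 34.06),
  stringsAsFactors = FALSE
)

# Zip codes collected from participants (made-up sample)
participants <- data.frame(
  zip = c("10001", "90012", "90012", "60601"),
  stringsAsFactors = FALSE
)

# Attach a longitude and latitude to each participant by zip code
located <- merge(participants, zip_lookup, by = "zip", all.x = TRUE)
located
```

Zip codes are kept as character strings so that codes with leading zeros are not altered.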

Using the zip code and the associated latitude and longitude, the numbers can then be put into R code to draw fantastic great-circle maps, inspired by Oscar Perpiñán Lamigueiro. One question did come out of the data: which of the several methods should be used when drawing the lines and determining the distance?

The question arises from each equation's assumption about the shape of the earth. The geosphere package offers three equations. The first two, Haversine and Vincenty Sphere, both assume the earth is a sphere. As it turns out, the earth is closer to an ellipsoid, which is what Vincenty Ellipsoid assumes. What I wanted to know was whether there is a big difference between the formulas and, if so, how big.
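To show what the spherical assumption looks like in practice, here is a minimal base R sketch of the Haversine formula. It is not the geosphere implementation; the function name `haversine` and the choice of radius (the WGS84 equatorial radius, 6378137 m) are assumptions made for this example.

```r
# Haversine great-circle distance in meters, assuming a spherical earth.
# p1 and p2 are c(longitude, latitude) in degrees; r is the sphere radius in
# meters (6378137 is the WGS84 equatorial radius, chosen here as an assumption).
haversine <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  dlon <- (p2[1] - p1[1]) * to_rad
  dlat <- (p2[2] - p1[2]) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(p1[2] * to_rad) * cos(p2[2] * to_rad) * sin(dlon / 2)^2
  # min() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(min(1, sqrt(a)))
}

# A quarter of the way around the equator
haversine(c(0, 0), c(90, 0))
```

Because a single radius stands in for the whole planet, the result depends on which radius is chosen; an ellipsoidal method avoids that choice by modeling the flattening at the poles.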

require(geosphere)

require(maps)

data(us.cities)

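#Setting up the data: 'ny' holds the long. and lat. of the origin city, 'all' is a matrix of all the

# cities in us.cities from the maps package (1005), with the long. and lat. data.

# NOTE: the variable is named for New York City, but (-118.41, 34.11) lies in the Los Angeles area

ny<-c(-118.41, 34.11)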

all<-matrix(data=c(us.cities$long, us.cities$lat), ncol=2)

#Summing the distance between the origin 'ny' and all the cities in us.cities (1005 of them)

#by so doing the small differences between the methods add up with each additional city

#system.time() measures each method on its own; proc.time() only reports cumulative session time

hav.time<-system.time(hav<-sum(distm(ny, all, fun=distHaversine)))

v.sphere.time<-system.time(v.sphere<-sum(distm(ny, all, fun=distVincentySphere)))

v.ellip.time<-system.time(v.ellip<-sum(distm(ny, all, fun=distVincentyEllipsoid)))

hav.time; v.sphere.time; v.ellip.time

#elapsed seconds per method (the run for this post recorded 1.350, 1.350, 2.510)

elapsed<-c(hav.time[["elapsed"]], v.sphere.time[["elapsed"]], v.ellip.time[["elapsed"]])

method.names<-c('Haversine', 'Vincenty.Sphere', 'Vincenty.Ellipsoid')

ny.all<-rbind(hav, v.sphere, v.ellip); ny.all<-cbind(ny.all, elapsed)

rownames(ny.all)<-method.names; colnames(ny.all)<-c('Sum Distance', 'Elapsed Time (s)')

ny.all

#Determining the difference between the various models available in the geosphere package

#Meters were converted into miles; the largest difference between the models was approximately

#1090 miles, or 1.085326 miles per city, which is considerable

hav.v.ellp<-abs(hav-v.ellip)*0.000621371192

hav.v.sphere<-abs(hav-v.sphere)*0.000621371192

hav.v.ellp; hav.v.sphere

#named model.diff to avoid masking base R's diff()

model.diff<-rbind(hav.v.ellp, hav.v.sphere)

rownames(model.diff)<-c('Haversine-Vincenty.Ellipsoid','Haversine-Vincenty.Sphere')

colnames(model.diff)<-'Distance (miles)'; model.diff

#what is the average error per city

hav.v.ellp/1005

In the end, Vincenty.Ellipsoid was used as the method for determining the distance, as it was the most precise: its results differed from Haversine's by an average of 1.0853 miles per city. That is a significant margin of error when many cities are being analyzed, and the extra computing time is worth it.
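Comparing summed distances can hide per-city differences that cancel out, so it may also be worth looking at the differences city by city. The sketch below is one possible way to do that; `compare_methods` is a hypothetical helper, and the geosphere part runs only when geosphere and maps are installed.

```r
# Summarise the absolute per-point difference between two distance functions.
# f1 and f2 take (origin, points) and return one distance in meters per row of points.
compare_methods <- function(origin, points, f1, f2) {
  d <- abs(f1(origin, points) - f2(origin, points)) * 0.000621371192  # meters to miles
  c(mean = mean(d), median = median(d), max = max(d))
}

# Example with the same origin and cities as above (skipped if packages are missing)
if (requireNamespace("geosphere", quietly = TRUE) &&
    requireNamespace("maps", quietly = TRUE)) {
  utils::data("us.cities", package = "maps")
  pts <- cbind(us.cities$long, us.cities$lat)
  print(compare_methods(c(-118.41, 34.11), pts,
                        geosphere::distHaversine,
                        geosphere::distVincentyEllipsoid))
}
```

The mean, median, and maximum together show whether the gap between methods is spread evenly or concentrated in a few distant cities.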

The next post will show how the data can be used to analyze two different community events.
