A Request for Foursquare Data

March 25, 2011

(This article was first published on John Myles White » Statistics, and kindly contributed to R-bloggers)

[UPDATE 3/28/2011: Fixed an enormous bug in the R code.]

I’m trying to collect data sets that showcase how the classical statistical distributions appear in modern contexts. I’ve already got some data that shows how the gamma distribution appears in video game scores, and now I’m hoping to find an example where the exponential distribution shows up. I think that checkins for Foursquare might be a good place to start.

To test this intuition, I’m hoping to collect some pilot data. Below you’ll find some code that you can use to help me gather data.

First, there’s a shell command to gather your own checkin data from Foursquare. To use it, substitute your e-mail address for EMAIL and your password for PASSWORD in the command below:

curl -u 'EMAIL:PASSWORD' 'https://api.foursquare.com/v1/history?l=250' > checkin_history.xml

Second, there’s an R script you can use to preprocess the data from the last step into a nice format before sending it to me. If you’re not an R user, you can skip this step and send the data in its raw XML format.

library('plyr')
library('XML')

# Parse the XML file saved by the curl command above.
filename <- 'checkin_history.xml'
tree <- xmlTreeParse(filename, asTree = TRUE)
checkins <- xmlChildren(xmlRoot(tree))

# Pull one field out of each checkin's <venue> element as text.
venue.field <- function(field)
{
  vapply(checkins, function(checkin) xmlValue(checkin[['venue']][[field]]), character(1), USE.NAMES = FALSE)
}

venue.names <- venue.field('name')
latitudes <- as.numeric(venue.field('geolat'))
longitudes <- as.numeric(venue.field('geolong'))
checkin.data <- data.frame(Venue = factor(venue.names), Latitude = latitudes, Longitude = longitudes)

# Count the checkins at each venue and write the totals to disk.
count.data <- ddply(checkin.data, 'Venue', nrow)
names(count.data) <- c('Venue', 'TotalCheckins')
write.csv(count.data, file = 'count_data.csv', row.names = FALSE)

After running these two pieces of code, the output file, count_data.csv, should look like this:

"Venue","TotalCheckins"
"Brooklyn Boulders",13
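The counting step that produces this file does not strictly need plyr. As a minimal sketch, assuming you already have a character vector of venue names, base R's table() gives the same two-column result. The venue names below are made up for illustration and are not real checkin data.

```r
# Hypothetical venue names standing in for a parsed checkin history.
venue.names <- c('Cafe A', 'Gym B', 'Cafe A', 'Cafe A', 'Gym B', 'Park C')

# table() tallies each distinct venue; as.data.frame() turns the tally
# into a data frame with one row per venue.
count.data <- as.data.frame(table(Venue = venue.names), stringsAsFactors = FALSE)

# Rename the columns to match the layout of count_data.csv.
names(count.data) <- c('Venue', 'TotalCheckins')
print(count.data)
```

To save the result in the same format, pass count.data to write.csv() with row.names = FALSE, as in the script above.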

Once you’ve got data, you can send it to me by e-mail at [email protected].
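If you're curious what checking for an exponential distribution might look like, here is a rough sketch. It is only one plausible approach, not necessarily the analysis I'll end up running. It assumes the per-venue totals are the quantity of interest, and it uses simulated values (the variable total.checkins and the rate 0.2 are invented for the example), not real Foursquare data. Real checkin totals are integers, so a discrete analogue such as the geometric distribution may turn out to be the better comparison.

```r
# Simulated stand-in for the TotalCheckins column (NOT real data).
set.seed(1)
total.checkins <- rexp(200, rate = 0.2)

# The maximum-likelihood estimate of an exponential rate is 1 / sample mean.
rate.hat <- 1 / mean(total.checkins)

# Compare empirical quantiles with those of the fitted exponential;
# large gaps between the two columns would argue against the model.
probs <- c(0.25, 0.5, 0.75, 0.9)
comparison <- data.frame(
  Probability = probs,
  Empirical = quantile(total.checkins, probs, names = FALSE),
  Fitted = qexp(probs, rate = rate.hat)
)
print(comparison)
```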
