A Request for Foursquare Data

[This article was first published on John Myles White » Statistics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

[UPDATE 3/28/2011: Fixed an enormous bug in the R code.]

I’m trying to collect data sets that showcase how the classical statistical distributions appear in modern contexts. I’ve already got some data that shows how the gamma distribution appears in video game scores, and now I’m hoping to find an example where the exponential distribution shows up. I think that checkins for Foursquare might be a good place to start.

To test this intuition, I’m hoping to collect some pilot data. Below you’ll find some code that you can use to help me gather data.

First, there’s a shell script to gather your own checkin data from FourSquare. To use this script, you need to substitute your e-mail address where EMAIL appears and your password where PASSWORD appears in the code below:

1
curl -u 'EMAIL:PASSWORD' https://api.foursquare.com/v1/history?l=250 > checkin_history.xml

And second there’s an R script you can use to preprocess the data from the last step into a nice format before sending it to me. If you’re not an R user, you can easily skip this step and send the data you have in its raw XML format.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
library('plyr')
library('XML')
filename <- 'checkin_history.xml'
tree <- xmlTreeParse(filename, asTree = TRUE)
checkins <- tree$doc$children$checkins
venue.names <- c()
latitudes <- c()
longitudes <- c()
for (i in 1:length(checkins))
{
  venue.names <- c(venue.names, as.character(checkins[i]$checkin[['venue']][['name']][['text']])[6])
  latitudes <- c(latitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolat']][['text']])$value))
  longitudes <- c(longitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolong']][['text']])$value))
}
checkin.data <- data.frame(Venue = factor(venue.names), Latitude = as.numeric(latitudes), Longitude = as.numeric(longitudes))
count.data <- ddply(checkin.data, 'Venue', nrow)
names(count.data) <- c('Venue', 'TotalCheckins')
write.csv(count.data, file = 'count_data.csv', row.names = FALSE)

After running these two pieces of code, the output file, count_data.csv, should look like this:

Venue TotalCheckins
“Brooklyn Boulders” 13

Once you’ve got data, you can send it to me by e-mail at [email protected].

To leave a comment for the author, please follow the link and comment on their blog: John Myles White » Statistics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)