
I wanted to play around with the Facebook Graph API using the Graph API Explorer page as a coding exercise. This facility allows one to use the API with a temporary authorisation token. Now, I don't know how to make an R package for the proper API, where you have to register for an API key and do some OAuth stuff, because that is above my current skill set, but the Explorer page itself is a nice middle ground.
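The basic idea is simply that the Explorer page hands you a temporary token which you can then paste into an ordinary Graph API URL. A minimal sketch of how that token plugs in (the token and page name below are made up for illustration):

```r
# Hypothetical values -- the real ones come from the Graph API Explorer page
token <- "AAACEdEose0cBA..."   # temporary token copied from the Explorer
id    <- "DoctorWho"           # page name or numeric ID

# The same sort of URL the Explorer itself queries
u <- paste("https://graph.facebook.com/", id, "/feed",
           "?date_format=U", "&access_token=", token, sep = "")

# json <- fromJSON(getURL(u, cainfo = "cacert.pem"), simplify = FALSE)
```

The last line (commented out) is where RCurl and RJSONIO would come in to actually fetch and parse the feed.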

Therefore I've come up with a self-contained R function which allows me to do just that (full code at end of post):

# load packages
library(RCurl)
library(RJSONIO)

df <- Facebook_Graph_API_Explorer()
t(df[7,])

# post.id                      "127031120644257_319044381442929"
# from.name                    "Doctor Who"
# from.id                      "127031120644257"
# to.name                      "Doctor Who"
# to.id                        "127031120644257"
# to.category                  "Tv show"
# created.time                 "2011-11-10 11:13:42"
# message                      "Has it ever been found out who blew up the TARDIS?"
# type                         "status"
# likes.count                  NA
# sample.comments              "Did the tardis blow up I haven't seen all of sesion 6&7 [next>>] \"7\" ??? [next>>] the pandorica was obsorbin earth so he blew it up with the tardis"
# sample.comments.from.name    "Alex Nomikos [next>>] Paul Morris [next>>] Vivienne Leigh Bruen"
# sample.comments.from.id      "100001033497348 [next>>] 595267764 [next>>] 100000679940192"
# sample.comments.created.time "2011-11-10 11:23:36 [next>>] 2011-11-10 11:29:56 [next>>] 2011-11-10 13:04:53"


In the above, I'm using "[next>>]" as a way of separating entities within the same cell in order to keep the data frame structure. The order is maintained across cells, i.e. the first entity of sample.comments.from.name corresponds to the first entity of sample.comments.from.id, etc.
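Should you want the individual entities back, a cell can be split on the separator again. A quick sketch:

```r
cell <- "2011-11-10 11:23:36 [next>>] 2011-11-10 11:29:56 [next>>] 2011-11-10 13:04:53"

# fixed = TRUE so the separator is treated literally, not as a regular expression
strsplit(cell, " [next>>] ", fixed = TRUE)[[1]]
# [1] "2011-11-10 11:23:36" "2011-11-10 11:29:56" "2011-11-10 13:04:53"
```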

The main problem I had, and have had for a long time with R, is dealing with a list which has a NULL as one of its elements and then un-listing it while maintaining the same length:

mylist <- list(a=1, b=NULL, c="hello")
unlist(mylist, use.names = FALSE)
# [1] "1"     "hello"


Whereas what I really want is for the NULL to be converted to NA and thus the length of the list is maintained, e.g.

mylist <- list(a=1, b=NULL, c="hello")
mylist[sapply(mylist, is.null)] <- NA
unlist(mylist, use.names = FALSE)
# [1] "1"     NA      "hello"


But I don't know of any automatic way of doing that and so have to do it manually. I tell you, these NULLs in lists are really causing me headaches!
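The best I can do is wrap that manual fix-up into a little helper so it at least only has to be written once:

```r
# replace NULL elements with NA so that unlist() keeps the original length
null_to_na <- function(l) {
  l[sapply(l, is.null)] <- NA
  l
}

mylist <- list(a = 1, b = NULL, c = "hello")
unlist(null_to_na(mylist), use.names = FALSE)
# [1] "1"     NA      "hello"
```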

Anyway, back to the Facebook_Graph_API_Explorer() function; there are a couple of points to bear in mind:

1. This will only work on Windows because I don't know of a cross-platform version of winDialogString.
2. You must already be signed into Facebook (i.e. you must have an account and be signed in).
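On the first point, if anyone wants to try this elsewhere, one possible cross-platform stand-in (a sketch, untested on my part) is to fall back on readline() when the Windows dialogs aren't available:

```r
# rough stand-in for winDialogString: use the Windows dialog where it exists,
# otherwise fall back to a console prompt (or the default value when the
# session is non-interactive)
ask <- function(message, default = "") {
  if (exists("winDialogString")) return(winDialogString(message, default))
  if (interactive()) {
    ans <- readline(paste0(message, " [", default, "]: "))
    if (nzchar(ans)) ans else default
  } else {
    default
  }
}
```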

The function will guide you through the process with dialogue boxes, so it should be easy for anyone to use. I think next time I'll try a web scraping exercise on the HTML of a Facebook wall page using XPath; it depends on how much time I get!

Tony Breyal

P.S. Full code is below:

library(RCurl)
library(RJSONIO)

get_json_df <- function(data) {
  l <- list(
    post.id = lapply(data, function(post) post$id),
    from.name = lapply(data, function(post) post$from$name),
    from.id = lapply(data, function(post) post$from$id),
    to.name = lapply(data, function(post) post$to$data[[1]]$name),
    to.id = lapply(data, function(post) post$to$data[[1]]$id),
    to.category = lapply(data, function(post) post$to$data[[1]]$category),
    created.time = lapply(data, function(post) as.character(as.POSIXct(post$created_time, origin = "1970-01-01", tz = "GMT"))),
    message = lapply(data, function(post) post$message),
    type = lapply(data, function(post) post$type),
    likes.count = lapply(data, function(post) post$likes$count),
    comments.count = lapply(data, function(post) post$comments$count),
    sample.comments = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$message), collapse = " [next>>] ")),
    sample.comments.from.name = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$name), collapse = " [next>>] ")),
    sample.comments.from.id = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$id), collapse = " [next>>] ")),
    sample.comments.created.time = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) as.character(as.POSIXct(comment$created_time, origin = "1970-01-01", tz = "GMT"))), collapse = " [next>>] "))
  )

  # replace all occurrences of NULL with NA so every column keeps the same length
  df <- data.frame(do.call("cbind", lapply(l, function(x) sapply(x, function(xx) ifelse(is.null(xx), NA, xx)))))
  return(df)
}

Facebook_Graph_API_Explorer <- function() {
  # STEP 1: Get certs so we can access https links (we'll delete the file at the end of the script)
  if (!file.exists("cacert.pem")) download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

  # STEP 2: Get Facebook token to access data. I need a cross-platform version of
  # winDialog and winDialogString, otherwise this only works on Windows.
  winDialog(type = "ok", "Make sure you have already signed into Facebook.\n\nWhen the browser opens, please click 'Get Access Token' twice. You DO NOT need to select/check any boxes for a public feed.\n\nAfter pressing OK, switch over to your now open browser.")
  browseURL("http://developers.facebook.com/tools/explorer/?method=GET&path=100002667499585")
  token <- winDialogString("When the browser opens, please click 'Get Access Token' twice and copy/paste the token below", "")

  # STEP 3: Get the Facebook ID. This can be a fan page or whatever, e.g. https://www.facebook.com/DoctorWho
  ID <- winDialogString("Please enter FB name id below:", "https://www.facebook.com/DoctorWho")
  ID <- gsub(".*com/", "", ID)

  # STEP 4: Construct the Facebook Graph API URL
  u <- paste("https://graph.facebook.com/", ID, "/feed", "?date_format=U", "&access_token=", token, sep = "")

  # STEP 5: How far back do you want to get data for? Format should be YYYY-MM-DD
  user.last.date <- try(as.Date(winDialogString("Please enter a date for roughly how far back to gather data from, using this format: yyyy-mm-dd", "")), silent = TRUE)
  current.last.date <- user.last.date + 1

  # Get data
  df.list <- list()
  i <- 1
  while (current.last.date > user.last.date) {
    # Download the JSON feed
    json <- getURL(u, cainfo = "cacert.pem")
    json <- fromJSON(json, simplify = FALSE)
    data <- json$data
    stopifnot(!is.null(data))

    # Get json data frame
    df.list[[i]] <- get_json_df(data)
    i <- i + 1

    # variables for the while loop
    current.last.date <- as.Date(as.POSIXct(json$data[[length(json$data)]]$created_time, origin = "1970-01-01", tz = "GMT"))
    print(paste("Current batch of dates being processed is:", current.last.date, "(loading more...)"))
    u <- json$paging$`next`
  }

  file.remove("cacert.pem")

  # return data frame
  df <- do.call("rbind", df.list)
  return(df)
}

df <- Facebook_Graph_API_Explorer()
t(df[4,])
# post.id                      "127031120644257_319062954774405"
# from.name                    "Torchwood"
# from.id                      "119328091441982"
# to.name                      "Torchwood"
# to.id                        "119328091441982"
# to.category                  "Tv show"
# created.time                 "2011-11-10 12:05:21"
# message                      "If you're missing Torchwood & Doctor Who and are after some good, action-packed science fiction, why not check out FOX's awesome prehistoric romp, Terra Nova? It's carried in the UK on Sky TV and is well worth catching up with & following! The idea - The Earth is dying, it's in its final years. Life's intolerable & getting worse. Scientists take advantage of a rift in time & space to set up a 'fresh start' colony on Terra Nova - the earth, 60 million years ago. The adventure then begins..."
# likes.count                  NA