Google Spreadsheets API: Listing Individual Spreadsheet Sheets in R

September 7, 2011
By

(This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers)

In Using Google Spreadsheets as a Database Source for R, I described a simple Google function for pulling data into R from a Google Visualization/Chart tools API query language query applied to a Google spreadsheet, given the spreadsheet key and worksheet ID. But how do you get a list of sheets in spreadsheet, without opening up the spreadsheet and finding the sheet names or IDs directly?

The Google Spreadsheets API, that’s how… (see also GData Samples. The documentation appears to be all over the place…)

To look up the sheets associated with a spreadsheet identified by its key value KEY, construct a URL of the form:

http://spreadsheets.google.com/feeds/worksheets/KEY/public/basic

This should give you an XML output. To get the output as a JSON feed, append ?alt=json to the end of the URL.

Having constructed the URL for sheets listing for a spreadsheet with a given key identifier, we can pull in and parse either the XML version, or the JSON version, into R and identify all the different sheets contained within the spreadsheet document as a whole.

First, the JSON version. I use the RJSONIO library to handle the feed:

library(RJSONIO)
sskey='0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc'
ssURL=paste( sep="", 'http://spreadsheets.google.com/feeds/worksheets/', sskey, '/public/basic?alt=json' )
spreadsheet=fromJSON(ssURL)
sheets=c()
for (el in spreadsheet$feed$entry) sheets=c(sheets,el$title['$t'])
as.data.frame(sheets)

Using a variant of the function described in the previous post, we can look up the data contained in a sheet by the sheetname:

library(RCurl)
gsqAPI = function( key, query,gid=0){ return( read.csv( paste( sep="", 'http://spreadsheets.google.com/tq?', 'tqx=out:csv', '&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ) ) ) }
gsqAPI(sskey,"select * limit 10", "Quali Stats")

getting a list of sheet names from a goog spreadsheet into R

The second approach is to pull on the XML version of the sheet data feed. (This PDF tutorial got me a certain way along the road: Extracting Data from XML, but then I got confused about what to do next (I still don’t have a good feel for identifying or wrangling with R data structures, though at least I now know how to use the class() function to find out what R things the type of any given item is;-) and had to call on the lazy web to work out how to do this in the end!)

library(XML)
ssURL=paste( sep="", 'http://spreadsheets.google.com/feeds/worksheets/', ssKey, '/public/basic' )
ssd=xmlTreeParse( ssURL, useInternal=TRUE )
nodes=getNodeSet( ssd, "//x:entry", "x" )
titles=sapply( nodes, function(x) xmlSApply( x, xmlValue ) )
library(stringr)
data.frame( sheetName = titles['content',], sheetId = str_sub(titles['id',], -3, -1 ) )

data frame in r

In this example, we also pull out the sheet ID that is used by the Google spreadsheets API to access individual sheets, just in case. (Note that these IDs are not the same as the numeric gid values used in the chart API query language…)

PS Note: my version of R seemed to choke if I gave it https: headed URLs, but it was fine with http:


To leave a comment for the author, please follow the link and comment on his blog: OUseful.Info, the blog... » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , ,

Comments are closed.