Using Google Spreadsheets as a Database Source for R

September 2, 2011
By

(This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers)

I couldn’t contain myself (other more pressing things to do, but…), so I just took a quick time out and a coffee to put together a quick and dirty R function that will let me run queries over Google spreadsheet data sources and essentially treat them as database tables (e.g. Using Google Spreadsheets as a Database with the Google Visualisation API Query Language).

Here’s the function:

library(RCurl)
gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid) ) ) }

It requires the spreadsheet key value and a query; you can optionally provide a sheet number within the spreadsheet if the sheet you want to query is not the first one.

We can call the function as follows:

gsqAPI('tPfI0kerLllVLcQw7-P1FcQ','select * limit 3')

In that example, and by default, we run the query against the first sheet in the spreadsheet.

Alternatively, we can make a call like this, and run a query against sheet 3, for example:
tmpData=gsqAPI('0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc','select A,C where <= 10',3)
tmpData

My first R function

The real question is, of course, could it be useful.. (or even OUseful?!)?

Here’s another example: a way of querying the Guardian Datastore list of spreadsheets:

gsqAPI('0AonYZs4MzlZbdFdJWGRKYnhvWlB4S25OVmZhN0Y3WHc','select * where A contains "crime" and B contains "href" order by C desc limit 10')

What that call does is run a query against the Guardian Datastore spreadsheet that lists all the other Guardian Datastore spreadsheets, and pulls out references to spreadsheets relating to “crime”.

The returned data is a bit messy and requires parsing to be properly useful.. but I haven’t started looking at string manipulation in R yet…(So my question is: given a dataframe with a column containing things like <a href=”http://example.com/whatever”>Some Page</a>, how would I extract columns containing http://example.com/whatever or Some Page fields?)

[If you haven't yet discovered R, it's an environment that was developed for doing stats... I use the RStudio environment to play with it. The more I use it (and I've only just started exploring what it can do), the more I think it provides a very powerful environment for working with data in quite a tangible way, not least for reshaping it and visualising it, let alone doing stats with in. (In fact, don't use the stats bit if you don't want to; it provides more than enough data mechanic tools to be going on with;-)]

PS By the by, I’m syndicating my Rstats tagged posts through the R-Bloggers site. If you’re at all interested in seeing what’s possible with R, I recommend you subscribe to R-Bloggers, or at least have a quick skim through some of the posts on there…


To leave a comment for the author, please follow the link and comment on his blog: OUseful.Info, the blog... » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , ,

Comments are closed.