Using Google Spreadsheets as a Database Source for R

[This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I couldn’t contain myself (other more pressing things to do, but…), so I just took a quick time out and a coffee to put together a quick and dirty R function that will let me run queries over Google spreadsheet data sources and essentially treat them as database tables (e.g. Using Google Spreadsheets as a Database with the Google Visualisation API Query Language).

Here’s the function:

library(RCurl)
gsqAPI = function(key,query,gid=0){ return( read.csv( paste( sep="",'http://spreadsheets.google.com/tq?', 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid) ) ) }

It requires the spreadsheet key value and a query; you can optionally provide a sheet number within the spreadsheet if the sheet you want to query is not the first one.

We can call the function as follows:

gsqAPI('tPfI0kerLllVLcQw7-P1FcQ','select * limit 3')

In that example, and by default, we run the query against the first sheet in the spreadsheet.

Alternatively, we can make a call like this, and run a query against sheet 3, for example:
tmpData=gsqAPI('0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc','select A,C where <= 10',3)
tmpData

My first R function

The real question is, of course, could it be useful.. (or even OUseful?!)?

Here’s another example: a way of querying the Guardian Datastore list of spreadsheets:

gsqAPI('0AonYZs4MzlZbdFdJWGRKYnhvWlB4S25OVmZhN0Y3WHc','select * where A contains "crime" and B contains "href" order by C desc limit 10')

What that call does is run a query against the Guardian Datastore spreadsheet that lists all the other Guardian Datastore spreadsheets, and pulls out references to spreadsheets relating to “crime”.

The returned data is a bit messy and requires parsing to be properly useful.. but I haven’t started looking at string manipulation in R yet…(So my question is: given a dataframe with a column containing things like Some Page, how would I extract columns containing http://example.com/whatever or Some Page fields?)

[If you haven’t yet discovered R, it’s an environment that was developed for doing stats… I use the RStudio environment to play with it. The more I use it (and I’ve only just started exploring what it can do), the more I think it provides a very powerful environment for working with data in quite a tangible way, not least for reshaping it and visualising it, let alone doing stats with in. (In fact, don’t use the stats bit if you don’t want to; it provides more than enough data mechanic tools to be going on with;-)]

PS By the by, I’m syndicating my Rstats tagged posts through the R-Bloggers site. If you’re at all interested in seeing what’s possible with R, I recommend you subscribe to R-Bloggers, or at least have a quick skim through some of the posts on there…

To leave a comment for the author, please follow the link and comment on their blog: OUseful.Info, the blog... » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)