CouchDB and R

October 2, 2010

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers)

Here are some quick crib notes on getting R talking to CouchDB using Couch's ReSTful HTTP API. We'll do it in two different ways. First, we'll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher-level interface. I'll assume you've already gotten started with CouchDB and are familiar with the basic ReST actions: GET, PUT, POST, and DELETE.

First install RCurl and RJSONIO. You'll have to download the tar.gz's if you're on a Mac. For the second part, we'll need to install R4CouchDB, which depends on the previous two. I checked it out from GitHub and used R CMD INSTALL.
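
Here's a minimal sketch for checking which of those packages still need installing. The helper name missing_packages is made up for this example, and it assumes the packages can be found by requireNamespace once installed; how you actually install them (CRAN, tarball, or R CMD INSTALL from a GitHub checkout) is up to you.

```r
# Return the names of packages that cannot be loaded yet.
# missing_packages is a hypothetical helper, not part of any package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("RCurl", "RJSONIO", "R4CouchDB")
to_install <- missing_packages(needed)

# Only report what is missing; installation itself is left to the reader.
if (length(to_install) > 0) {
  message("Still missing: ", paste(to_install, collapse = ", "))
}
```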

ReST with RCurl

Ping server

library(RCurl)
library(RJSONIO)
getURL("http://localhost:5984/")
[1] "{\"couchdb\":\"Welcome\",\"version\":\"1.0.1\"}\n"

That's nice, but we want to get the result back as a real R data structure. Try this:

welcome <- fromJSON(getURL("http://localhost:5984/"))
welcome$version
[1] "1.0.1"

Sweet!

PUT

One way to add a new record is with an HTTP PUT to the URL of the new document.

bozo = list(name="Bozo", occupation="clown", shoe.size=100)
getURL("http://localhost:5984/testing123/bozo",
       customrequest="PUT",
       httpheader=c('Content-Type'='application/json'),
       postfields=toJSON(bozo))
[1] "{\"ok\":true,\"id\":\"bozo\",\"rev\":\"1-70f5f59bf227d2d715c214b82330c9e5\"}\n"

Notice that RCurl has no high-level PUT method, so you have to fake it using the customrequest parameter. I'd never have figured that out without an example from R4CouchDB's source. The API of libcurl is odd, I have to say, and RCurl mostly just reflects it right into R.

If you don't like the idea of sending a PUT request with a get function, you could use RCurl's curlPerform. Trouble is, curlPerform returns an integer status code rather than the response body. You're supposed to provide an R function to collect the response body text. It's not really worth the bother unless you're getting into some of the advanced tricks described in the paper, R as a Web Client - the RCurl package.

bim <-  list(
  name="Bim", 
  occupation="clown",
  tricks=c("juggling", "pratfalls", "mocking Bolsheviks"))
reader = basicTextGatherer()
curlPerform(
  url = "http://localhost:5984/testing123/bim",
  httpheader = c('Content-Type'='application/json'),
  customrequest = "PUT",
  postfields = toJSON(bim),
  writefunction = reader$update
)
reader$value()
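
Since every call so far repeats the same options, here's a rough sketch of wrapping them up. The names couch_url and couch_send are made up for this example, and the base URL is just the localhost address used above. The packages are referenced with :: so the helpers can be defined without attaching anything.

```r
# Build a CouchDB URL from path segments (database name, document id, ...).
# couch_url is a hypothetical helper; the default base matches the examples above.
couch_url <- function(..., base = "http://localhost:5984") {
  segments <- vapply(c(...), utils::URLencode, character(1), reserved = TRUE)
  paste(c(base, segments), collapse = "/")
}

# Send a request with an arbitrary HTTP verb and parse the JSON reply.
# couch_send is also hypothetical; it only bundles the getURL options shown above.
couch_send <- function(method, url, body = NULL) {
  opts <- list(customrequest = method,
               httpheader = c("Content-Type" = "application/json"))
  # Attach a JSON body only when one is supplied (PUT and POST need it).
  if (!is.null(body)) opts$postfields <- RJSONIO::toJSON(body)
  RJSONIO::fromJSON(RCurl::getURL(url, .opts = opts))
}
```

With those in place, the Bim example would read something like couch_send("PUT", couch_url("testing123", "bim"), bim).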

GET

Now that there's something in there, how do we get it back? That's super easy.

bozo2 <- fromJSON(getURL("http://localhost:5984/testing123/bozo"))
bozo2
$`_id`
[1] "bozo"

$`_rev`
[1] "1-646331b58ee010e8df39b5874b196c02"

$name
[1] "Bozo"

$occupation
[1] "clown"

$shoe.size
[1] 100

PUT again for updating

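Updating is done by using PUT on an existing document. The body has to carry the document's current _rev, which bozo2 already does because we just fetched it. For example, let's give Bozo some mad skillz: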

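bozo2$skills <- c("pranks", "honking nose", "kung fu")  ## sample skills, so the PUT changes something
getURL(
  "http://localhost:5984/testing123/bozo",
  customrequest="PUT",
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bozo2))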
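
CouchDB rejects an update whose _rev doesn't match the stored document, so it pays to keep _id and _rev untouched while editing. Here's a small sketch of that; revise_doc is a made-up helper name, and the document below is a stand-in for whatever you fetched with GET.

```r
# Merge new fields into a fetched document while keeping its _id and _rev.
# revise_doc is a hypothetical helper, not part of RCurl or RJSONIO.
revise_doc <- function(doc, changes) {
  # Drop identity fields from the changes so they can never overwrite the originals.
  changes[c("_id", "_rev")] <- NULL
  utils::modifyList(doc, changes)
}

# Stand-in for a document fetched from CouchDB (the rev value is illustrative).
fetched <- list(`_id` = "bozo", `_rev` = "1-abc", name = "Bozo", occupation = "clown")
revised <- revise_doc(fetched, list(occupation = "assassin", skills = c("pranks", "kung fu")))
```

The revised list can then be sent back with PUT just like bozo2 above.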

POST

If you POST to the database, you're adding a document and letting CouchDB assign its _id field.

bender = list(
  name='Bender',
  occupation='bending',
  species='robot')
response <- fromJSON(getURL(
  'http://localhost:5984/testing123/',
  customrequest='POST',
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bender)))
response
$ok
[1] TRUE

$id
[1] "2700b1428455d2d822f855e5fc0013fb"

$rev
[1] "1-d6ab7a690acd3204e0839e1aac01ec7a"
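
When a request fails, CouchDB answers with error and reason fields in place of ok. Here's a sketch of a guard you could wrap around any parsed response; check_couch is a made-up name, and it looks fields up by name so it works whether the parsed response comes back as a list or as a named vector.

```r
# Stop with a readable message if a parsed CouchDB response reports an error.
# check_couch is a hypothetical helper; res is the output of fromJSON.
check_couch <- function(res) {
  if ("error" %in% names(res)) {
    reason <- if ("reason" %in% names(res)) res[["reason"]] else "no reason given"
    stop(sprintf("CouchDB error '%s': %s", res[["error"]], reason), call. = FALSE)
  }
  # Pass the response through unchanged so the call can be chained.
  invisible(res)
}
```

Usage would be along the lines of response <- check_couch(fromJSON(getURL(...))).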

DELETE

For DELETE, you pass the doc's revision number in the query string. Sorry, Bender.

response <- fromJSON(getURL("http://localhost:5984/testing123/2700b1428455d2d822f855e5fc0013fb?rev=1-d6ab7a690acd3204e0839e1aac01ec7a",
  customrequest="DELETE"))
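
Pasting the id and rev into the URL by hand gets old fast. Here's a sketch that builds the DELETE URL from the values in a response like the one POST gave us; delete_url is a made-up helper name and the base URL is the same localhost address assumed throughout.

```r
# Build the URL for deleting a document: /db/id?rev=...
# delete_url is a hypothetical helper; base defaults to the server used above.
delete_url <- function(db, id, rev, base = "http://localhost:5984") {
  sprintf("%s/%s/%s?rev=%s",
          base,
          utils::URLencode(db, reserved = TRUE),
          utils::URLencode(id, reserved = TRUE),
          utils::URLencode(rev, reserved = TRUE))
}

# Illustrative id and rev; in practice take them from response$id and response$rev.
url <- delete_url("testing123", "bender", "1-abc")
```

The result can be handed to getURL with customrequest="DELETE" as shown above.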

CRUD with R4CouchDB

R4CouchDB provides a layer on top of the techniques we've just described.

R4CouchDB uses a slightly strange idiom. You pass a cdb object, really just a list of parameters, into every R4CouchDB call, and every call returns that object again, possibly modified. Results are returned in cdb$res. They may have done this because R passes arguments by value, so a function can't modify the caller's copy in place. Here's how you would initialize the object:

library(R4CouchDB)
cdb <- cdbIni()
cdb$serverName <- "localhost"
cdb$port <- 5984
cdb$DBName <- "testing123"
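
To see why the reassignment matters, here's a toy illustration in plain R. It doesn't touch R4CouchDB at all; fake_call is a made-up stand-in that only mimics the pattern of taking the parameter list and returning it with a result attached.

```r
# A toy stand-in for an R4CouchDB-style call (hypothetical, no network involved).
fake_call <- function(cdb) {
  cdb$res <- paste("would contact", cdb$serverName, "on port", cdb$port)
  cdb
}

cdb_demo <- list(serverName = "localhost", port = 5984)

# Calling without reassigning leaves the caller's list untouched,
# because the function worked on its own copy.
fake_call(cdb_demo)
unchanged <- is.null(cdb_demo$res)

# Reassigning captures the modified copy, including the result.
cdb_demo <- fake_call(cdb_demo)
```

That's why every call below is written as cdb <- someFunction(cdb).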

Create

fake.data <- list(
  state='WA',
  population=6664195,
  state.bird='Lady GaGa')
cdb$dataList <- fake.data
cdb$id <- 'fake.data'  ## optional, otherwise an ID is generated
cdb <- cdbAddDoc(cdb)

cdb$res
$ok
[1] TRUE

$id
[1] "fake.data"

$rev
[1] "1-14bc025a194e310e79ac20127507185f"

Read

cdb$id <- 'bozo'
cdb <- cdbGetDoc(cdb)

bozo <- cdb$res
bozo
$`_id`
[1] "bozo"
... etc.

Update

First we take the document id and rev from the existing document. Then, save our revised document back to the DB.

cdb$id <- bozo$`_id`
cdb$rev <- bozo$`_rev`
bozo = list(
  name="Bozo",
  occupation="assassin",
  shoe.size=100,
  skills=c(
    'pranks',
    'honking nose',
    'kung fu',
    'high explosives',
    'sniper',
    'lock picking',
    'safe cracking'))
cdb$dataList <- bozo
cdb <- cdbUpdateDoc(cdb)

Delete

Shortly thereafter, Bozo mysteriously disappeared.

cdb$id <- 'bozo'  ## the rebuilt bozo list above has no _id field
cdb <- cdbDeleteDoc(cdb)
