CouchDB and R

[This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here are some quick crib notes on getting R talking to CouchDB using Couch’s ReSTful HTTP API. We’ll do it in two different ways. First, we’ll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher level interface. I’ll assume you’ve already gotten started with CouchDB and are familiar with the basic ReST actions: GET PUT POST and DELETE.

First install RCurl and RJSONIO. You’ll have to download the tar.gz’s if you’re on a Mac. For the second part, we’ll need to install R4CouchDB, which depends on the previous two. I checked it out from GitHub and used R CMD INSTALL.

ReST with RCurl

Ping server

getURL("http://localhost:5984/")
[1] "{\"couchdb\":\"Welcome\",\"version\":\"1.0.1\"}\n"

That’s nice, but we want to get the result back as a real R data structure. Try this:

welcome <- fromJSON(getURL("http://localhost:5984/"))
welcome$version
[1] "1.0.1"

Sweet!

PUT

One way to add a new record is with http PUT.

bozo = list(name="Bozo", occupation="clown", shoe.size=100)
getURL("http://localhost:5984/testing123/bozo",
       customrequest="PUT",
       httpheader=c('Content-Type'='application/json'),
       postfields=toJSON(bozo))
[1] "{\"ok\":true,\"id\":\"bozo\",\"rev\":\"1-70f5f59bf227d2d715c214b82330c9e5\"}\n"

Notice that RJSONIO has no high level PUT method, so you have to fake it using the costumrequest parameter. I'd never have figured that out without an example from R4CouchDB's source. The API of libCurl is odd, I have to say, and RCurl mostly just reflects it right into R.

If you don't like the idea of sending a put request with a get function, you could use RCurl's curlPerform. Trouble is, curlPerform returns an integer status code rather than the response body. You're supposed to provide an R function to collect the response body text. Not really worth the bother, unless you're getting into some of the advanced tricks described in the paper, R as a Web Client - the RCurl package.

bim <-  list(
  name="Bim", 
  occupation="clown",
  tricks=c("juggling", "pratfalls", "mocking Bolsheviks"))
reader = basicTextGatherer()
curlPerform(
  url = "http://localhost:5984/testing123/bim",
  httpheader = c('Content-Type'='application/json'),
  customrequest = "PUT",
  postfields = toJSON(bim),
  writefunction = reader$update
)
reader$value()

GET

Now that there's something in there, how do we get it back? That's super easy.

bozo2 <- fromJSON(getURL("http://localhost:5984/testing123/bozo"))
bozo2
$`_id`
[1] "bozo"

$`_rev`
[1] "1-646331b58ee010e8df39b5874b196c02"

$name
[1] "Bozo"

$occupation
[1] "clown"

$shoe.size
[1] 100

PUT again for updating

Updating is done by using PUT on an existing document. For example, let's give Bozo, some mad skillz:

getURL(
  "http://localhost:5984/testing123/bozo",
  customrequest="PUT",
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bozo2))

POST

If you POST to the database, you're adding a document and letting CouchDB assign its _id field.

bender = list(
  name='Bender',
  occupation='bending',
  species='robot')
response <- fromJSON(getURL(
  'http://localhost:5984/testing123/',
  customrequest='POST',
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bender)))
response
$ok
[1] TRUE

$id
[1] "2700b1428455d2d822f855e5fc0013fb"

$rev
[1] "1-d6ab7a690acd3204e0839e1aac01ec7a"

DELETE

For DELETE, you pass the doc's revision number in the query string. Sorry, Bender.

response <- fromJSON(getURL("http://localhost:5984/testing123/2700b1428455d2d822f855e5fc0013fb?rev=1-d6ab7a690acd3204e0839e1aac01ec7a",
  customrequest="DELETE"))

CRUD with R4CouchDB

R4CouchDB provides a layer on top of the techniques we've just described.

R4CouchDB uses a slightly strange idiom. You pass a cdb object, really just a list of parameters, into every R4CouchDB call and every call returns that object again, maybe modified. Results are returned in cdb$res. Maybe, they did this because R uses pass by value. Here's how you would initialize the object.

cdb <- cdbIni()
cdb$serverName <- "localhost"
cdb$port <- 5984
cdb$DBName="testing123"

Create

fake.data <- list(
  state='WA',
  population=6664195,
  state.bird='Lady GaGa')
cdb$dataList <- fake.data
cdb$id <- 'fake.data'  ## optional, otherwise an ID is generated
cdb <- cdbAddDoc(cdb)

cdb$res
$ok
[1] TRUE

$id
[1] "fake.data"

$rev
[1] "1-14bc025a194e310e79ac20127507185f"

Read

cdb$id <- 'bozo'
cdb <- cdbGetDoc(cdb)

bozo <- cdb$res
bozo
$`_id`
[1] "bozo"
... etc.

Update

First we take the document id and rev from the existing document. Then, save our revised document back to the DB.

cdb$id <- bozo$`_id`
cdb$rev <- bozo$`_rev`
bozo = list(
  name="Bozo",
  occupation="assassin",
  shoe.size=100,
  skills=c(
    'pranks',
    'honking nose',
    'kung fu',
    'high explosives',
    'sniper',
    'lock picking',
    'safe cracking'))
cdb <- cdbUpdateDoc(bozo)

Delete

Shortly thereafter, Bozo mysteriously disappeared.

cdb$id = bozo$`_id`
cdb <- cdbDeleteDoc(cdb)

More on ReST and CouchDB

  • One issue you'll probably run into is that unfortunately JSON left out NaN and Infinity. And, of course only R knows about NAs.
  • One-off ReST calls are easy using curl from the command line, as described in REST-esting with cURL.
  • I flailed about quite a bit trying to figure out the best way to do HTTP with R.
  • I originally thought R4CouchDB was part of a Google summer of code project to support NoSQL DBs in R. Dirk Eddelbuettel clarified that R4CouchDB was developed independently. In any case, the schema-less approach fits nicely with R's philosophy of exploratory data analysis.

To leave a comment for the author, please follow the link and comment on their blog: Digithead's Lab Notebook.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)