Rmongodb 1.8.0

November 1, 2014

(This article was first published on Data Science Notes - R, and kindly contributed to R-bloggers)

Today I’m introducing a new version of rmongodb (which I have started to maintain): v1.8.0. Install it from GitHub:

install_github("mongosoup/[email protected]")

The release version will be uploaded to CRAN shortly.
This release brings many improvements to rmongodb:

  1. rmongodb now handles arrays correctly.
    • mongo.bson.to.list() has been rewritten from scratch. R’s unnamed lists are treated as arrays, named lists as objects. It also has an option that controls whether it tries to simplify plain lists into arrays.
    • mongo.bson.from.list() has been updated.
  2. mongo.cursor.to.list() has been rewritten and behaves slightly differently: it no longer performs any type coercion while fetching data from a cursor.
  3. mongo.aggregation() has new options to match MongoDB 2.6+ features. Its second argument is now called pipeline (as it is in the MongoDB command).
  4. New function mongo.index.TTLcreate() for creating indexes with the “time to live” property.
  5. R’s NA values are now converted into MongoDB null values.
  6. Many bug fixes (including installation problems on Windows); see the full list.
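As a rough illustration of the list conventions in item 1, here is a minimal sketch that builds a BSON document from an R list. It assumes rmongodb 1.8.0 is installed; the field names are invented for the example and no MongoDB server is needed.

```r
library(rmongodb)

# Hypothetical document: field names are made up for illustration.
doc <- list(
  name   = "sensor-1",
  tags   = list("indoor", "north"),   # unnamed list: should become a BSON array
  limits = list(min = 0, max = 100)   # named list: should become a sub-document
)

# Build the BSON object and print it to inspect the resulting structure.
b <- mongo.bson.from.list(doc)
print(b)
```

Printing the object is an easy way to check whether tags ended up as an array and limits as an embedded object.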

I want to highlight some of the changes.
The first and most important one is that rmongodb now handles arrays correctly. This issue was very annoying for many users (including me :-). Moreover, about half of the rmongodb-related questions on Stack Overflow were caused by it. In the new version of the package, mongo.bson.to.list() has been rewritten from scratch and mongo.bson.from.list() has been fixed. I tested the new behaviour heavily and everything works smoothly. Still, it is quite a big internal change, because these functions are the workhorses behind many other high-level rmongodb functions. Please test it; your feedback is very welcome. For example, here is the conversion of a complex JSON string into BSON using mongo.bson.from.JSON() (which internally calls mongo.bson.from.list()):

library(rmongodb)
# "arr" mixes scalars, nested arrays and sub-documents
json_string <- '{"_id": "dummyID", "arr":["string",3.14,[1,"2",[3],{"four":4}],{"mol":42}]}'
bson <- mongo.bson.from.JSON(json_string)

This produces the following MongoDB document:

{"_id": "dummyID", "arr":["string",3.14,[1,"2",[3],{"four":4}],{"mol":42}]}

The second one is that mongo.cursor.to.list() has new behaviour: it returns a plain list of objects without any coercion. Each element of the list corresponds to one document of the underlying query result. An additional improvement is that mongo.cursor.to.list() uses R’s environments to avoid extra copying, so it is now much more efficient than the previous version (especially when fetching many records from MongoDB).
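A minimal sketch of fetching query results as a plain list follows. It assumes a MongoDB server is running on localhost with default settings; the namespace "test.people" is hypothetical.

```r
library(rmongodb)

# Connect to a local server (assumption: default host and port).
mongo <- mongo.create()

if (mongo.is.connected(mongo)) {
  # "test.people" is a made-up database.collection name for illustration.
  cursor <- mongo.find(mongo, ns = "test.people", query = mongo.bson.empty())

  # One list element per document, with no type coercion applied.
  docs <- mongo.cursor.to.list(cursor)

  cat("Fetched", length(docs), "documents\n")
  if (length(docs) > 0) str(docs[[1]])

  mongo.destroy(mongo)
}
```

Because the result is a plain list, turning it into a data frame or another structure is left to the caller.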

In the next few releases I plan to upgrade the underlying mongo-c-driver-legacy to the latest version, 0.8.1.
