R and MongoDB

June 7, 2013

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

MongoDB is a document-oriented NoSQL database. Unlike a relational database, which stores data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas. In the demonstration below, I show how to extract data from MongoDB with R.
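
As a rough illustration of what "dynamic schema" means, here is a minimal sketch in plain Python that needs no database. The two documents are made up for this example; only the field names AGE, LIQ and IND5A come from the data used later, and NOTE is a hypothetical extra field.

```python
import json

# Two made-up documents that could sit in the same collection. The second
# one omits LIQ and adds a NOTE field, which a fixed-column table would
# not allow without changing the table definition.
docs = [
    {"AGE": 7, "LIQ": 0.25, "IND5A": 0},
    {"AGE": 12, "IND5A": 1, "NOTE": "no liquidity figure reported"},
]

# Each document carries its own field names, so the key sets may differ.
field_sets = [sorted(d.keys()) for d in docs]

for d in docs:
    print(json.dumps(d, sort_keys=True))
```

In a relational table both rows would need the same columns; here each document simply lists the fields it has.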

Before starting the R session, we need to install MongoDB on the local machine and then load the data into the database with the Python code below.

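import pandas
import pymongo

# Read the tab-delimited source file into a data frame.
df = pandas.read_table('../data/csdata.txt')
# Turn each row into a dict keyed by column name, i.e. one document per row.
lst = [dict([(colname, row[i]) for i, colname in enumerate(df.columns)]) for row in df.values]
# Print the first three documents as a sanity check.
for i in range(3):
  print(lst[i])

# Connect to the local server and insert every document into
# collection 'test' of database 'db'.
con = pymongo.MongoClient('localhost', port = 27017)
test = con.db.test
for i in lst:
  test.insert_one(i)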
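
The key step in the loader is turning each row of the file into one dictionary, since that dictionary is what ends up as a MongoDB document. The sketch below shows the same idea with only the standard library. The sample text is made up and merely stands in for csdata.txt, and treating every field as a float is an assumption about the data.

```python
import csv
import io

# Made-up tab-delimited sample standing in for csdata.txt (assumption).
sample = "AGE\tLIQ\tIND5A\n7\t0.25\t0\n12\t0.05\t1\n"

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")

# csv yields strings, so convert each field to a number explicitly.
# Every row becomes one dict keyed by column name, i.e. one document.
records = [{name: float(value) for name, value in row.items()} for row in reader]

for rec in records:
    print(rec)
```

Each element of `records` has the same shape as the dictionaries the loader sends to the collection.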

To the best of my knowledge, two R packages provide an interface to MongoDB: RMongo and rmongodb. While the RMongo package is straightforward and user-friendly, it took me a while to figure out how to specify a query with the rmongodb package.

RMongo Example

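library(RMongo)
mg1 <- mongoDbConnect('db')  # database 'db' on the local server
print(dbShowCollections(mg1))
query <- dbGetQuery(mg1, 'test', "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}")  # AGE < 10, LIQ >= 0.1, IND5A != 1
data1 <- query[c('AGE', 'LIQ', 'IND5A')]
summary(data1)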

RMongo Output

Loading required package: rJava
Loading required package: methods
Loading required package: RUnit
[1] "system.indexes" "test"          
      AGE             LIQ             IND5A  
 Min.   :6.000   Min.   :0.1000   Min.   :0  
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0  
 Median :8.000   Median :0.2970   Median :0  
 Mean   :7.963   Mean   :0.3745   Mean   :0  
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0  
 Max.   :9.000   Max.   :1.0000   Max.   :0  
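
The summary above is consistent with the filter: AGE stays below 10, LIQ is at least 0.1, and IND5A is never 1. To make the meaning of the three operators concrete, here is a minimal in-memory sketch in Python. The `matches` helper and the four sample documents are hypothetical, and the sketch assumes every document contains all three fields. It mimics the filter's logic and is not how MongoDB evaluates queries internally.

```python
import operator

# Map the three MongoDB comparison operators used in the query to Python.
OPS = {"$lt": operator.lt, "$gte": operator.ge, "$ne": operator.ne}

def matches(doc, query):
    """Return True if doc satisfies every condition (conditions are ANDed).

    Assumes every queried field is present in doc.
    """
    for field, conditions in query.items():
        for op, bound in conditions.items():
            if not OPS[op](doc[field], bound):
                return False
    return True

query = {"AGE": {"$lt": 10}, "LIQ": {"$gte": 0.1}, "IND5A": {"$ne": 1}}

# Made-up documents: only the first one passes all three conditions.
docs = [
    {"AGE": 7, "LIQ": 0.25, "IND5A": 0},
    {"AGE": 12, "LIQ": 0.30, "IND5A": 0},  # fails AGE < 10
    {"AGE": 8, "LIQ": 0.05, "IND5A": 0},   # fails LIQ >= 0.1
    {"AGE": 9, "LIQ": 0.40, "IND5A": 1},   # fails IND5A != 1
]

selected = [d for d in docs if matches(d, query)]
print(selected)
```

Listing several fields in one query document combines them with a logical AND, which is why a single failing condition is enough to drop a document.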

rmongodb Example

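library(rmongodb)
mg2 <- mongo.create()  # connects to the local server by default
print(mongo.get.databases(mg2))
print(mongo.get.database.collections(mg2, 'db'))
# Build the query field by field; each sub-object must be closed
# with finish.object before the next one is started.
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, 'AGE')
mongo.bson.buffer.append(buf, '$lt', 10)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'LIQ')
mongo.bson.buffer.append(buf, '$gte', 0.1)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'IND5A')
mongo.bson.buffer.append(buf, '$ne', 1)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
cur <- mongo.find(mg2, 'db.test', query = query)
# Walk the cursor and collect the three fields one document at a time.
age <- liq <- ind5a <- NULL
while (mongo.cursor.next(cur)) {
  value <- mongo.cursor.value(cur)
  age   <- rbind(age, mongo.bson.value(value, 'AGE'))
  liq   <- rbind(liq, mongo.bson.value(value, 'LIQ'))
  ind5a <- rbind(ind5a, mongo.bson.value(value, 'IND5A'))
}
data2 <- data.frame(AGE = age, LIQ = liq, IND5A = ind5a)
summary(data2)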

rmongodb Output

rmongodb package (mongo-r-driver) loaded
Use 'help("mongo")' to get started.

[1] "db"
[1] "db.test"
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
      AGE             LIQ             IND5A  
 Min.   :6.000   Min.   :0.1000   Min.   :0  
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0  
 Median :8.000   Median :0.2970   Median :0  
 Mean   :7.963   Mean   :0.3745   Mean   :0  
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0  
 Max.   :9.000   Max.   :1.0000   Max.   :0  
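
Both packages return the same summary because they send the same query document. RMongo takes it as a JSON-like string, while rmongodb assembles it one sub-object at a time (start an object, append a value, then close it with mongo.bson.buffer.finish.object). The sketch below mirrors that sequence with Python dictionaries to show that the two forms describe the same nested structure. The loop is only an analogy for the buffer calls, not the rmongodb API itself.

```python
import json

# The filter as passed to RMongo in the article: a JSON-like string.
rmongo_query = "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}"

# The same filter built one sub-document at a time, in the order the
# rmongodb buffer calls are issued.
conditions = [("AGE", "$lt", 10), ("LIQ", "$gte", 0.1), ("IND5A", "$ne", 1)]
built = {}
for field, op, bound in conditions:
    sub = {}            # analogous to start.object(buf, field)
    sub[op] = bound     # analogous to append(buf, op, bound)
    built[field] = sub  # analogous to finish.object(buf)

# Single quotes are not valid JSON, so swap them before parsing.
parsed = json.loads(rmongo_query.replace("'", '"'))

print(parsed == built)
```

Seen this way, the extra effort in rmongodb is purely in how the query is written down. The query itself is identical.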
