R and MongoDB
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
MongoDB is a document-based NoSQL database. Unlike a relational database, which stores data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas. In the demonstration below, I am going to show how to extract data from a MongoDB database with R.
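To make the difference concrete, here is a small sketch in Python, the language of the loading script used later. It models a collection as a plain list of dictionaries. The field names echo the data set, but the documents and values are invented. Two documents in the same collection need not share fields. Flattening them into a table forces every row onto the union of all columns, and the gaps are filled with None.

```python
# Invented documents: one MongoDB collection may hold documents with different fields.
docs = [
    {"AGE": 8, "LIQ": 0.25},
    {"AGE": 6, "IND5A": 1},  # no LIQ field, but an extra IND5A field
]

def to_table(documents):
    """Flatten documents into a rigid table: one column per field seen anywhere."""
    columns = sorted({key for doc in documents for key in doc})
    rows = [[doc.get(col) for col in columns] for doc in documents]
    return columns, rows

columns, rows = to_table(docs)
print(columns)  # ['AGE', 'IND5A', 'LIQ']
for row in rows:
    print(row)
```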
Before starting the R session, we need to install MongoDB on the local machine and then load the data into the database with the Python code below.
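import pandas as pd
import pymongo
# read the tab-delimited data file into a DataFrame
df = pd.read_table('../data/csdata.txt')
# turn each row into a dict keyed by column name, i.e. one MongoDB document per row
lst = [dict(zip(df.columns, row)) for row in df.values]
# preview the first three documents
for i in range(3):
    print(lst[i])
# pymongo.Connection was removed in PyMongo 3.0; MongoClient replaces it
con = pymongo.MongoClient('localhost', port=27017)
test = con.db.test  # database "db", collection "test"
test.drop()  # start from an empty collection
# Collection.save was removed in PyMongo 4.0; insert_many writes all documents in one call
test.insert_many(lst)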
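The list comprehension in the loading script turns each table row into a dictionary keyed by column name. That dictionary is the shape MongoDB stores as a document. The snippet below is a dependency-free sketch of the same idea. It uses the standard csv module on a small, made-up tab-delimited sample. The columns mimic AGE, LIQ and IND5A, but the values are invented. Unlike pandas.read_table, csv.DictReader does not infer types, so the sketch converts each field to a number explicitly.

```python
import csv
import io

# Made-up tab-delimited sample standing in for ../data/csdata.txt
sample = "AGE\tLIQ\tIND5A\n6\t0.1\t0\n9\t0.49\t1\n12\t0.3\t0\n"

def to_number(text):
    """Parse a field as int when possible, else as float (pandas infers dtypes itself)."""
    try:
        return int(text)
    except ValueError:
        return float(text)

# One dict per row, keyed by column name: the document shape MongoDB stores
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
documents = [{col: to_number(val) for col, val in row.items()} for row in reader]

for doc in documents:
    print(doc)
```

With a running server and a pymongo collection object, a list like this can be written in one call with `insert_many(documents)`.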
To the best of my knowledge, there are two R packages that provide an interface to MongoDB, namely RMongo and rmongodb. While the RMongo package is very straightforward and user-friendly, it took me a while to figure out how to specify a query with the rmongodb package.
RMongo Example
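library(RMongo)
# connect to the "db" database on the default host (127.0.0.1) and port (27017)
mg1 <- mongoDbConnect('db')
print(dbShowCollections(mg1))
# the query is a JSON string: AGE < 10 AND LIQ >= 0.1 AND IND5A != 1
# note: dbGetQuery returns at most 1000 documents unless its limit argument is raised
query <- dbGetQuery(mg1, 'test', "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}")
# keep the three queried fields
data1 <- query[c('AGE', 'LIQ', 'IND5A')]
summary(data1)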
RMongo Output
Loading required package: rJava
Loading required package: methods
Loading required package: RUnit
[1] "system.indexes" "test"
      AGE             LIQ             IND5A
 Min.   :6.000   Min.   :0.1000   Min.   :0
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0
 Median :8.000   Median :0.2970   Median :0
 Mean   :7.963   Mean   :0.3745   Mean   :0
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0
 Max.   :9.000   Max.   :1.0000   Max.   :0
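The query string passed to dbGetQuery is a MongoDB filter document. Every top-level field must satisfy its operator, so the three conditions are combined with an implicit AND. The sketch below is a hypothetical in-memory matcher. The function `matches` is not part of RMongo, pymongo or MongoDB. It supports only the three comparison operators used here and runs them on invented documents to show which records the filter keeps. It assumes every document carries all three fields. Real MongoDB semantics differ for missing fields; for example, `$ne` also matches documents that lack the field.

```python
import operator

# Only the three operators used in the query above; MongoDB supports many more.
OPERATORS = {"$lt": operator.lt, "$gte": operator.ge, "$ne": operator.ne}

# The same filter as the RMongo query string, written as a Python dict
query = {"AGE": {"$lt": 10}, "LIQ": {"$gte": 0.1}, "IND5A": {"$ne": 1}}

def matches(doc, flt):
    """Hypothetical matcher: True if every field condition in flt holds for doc (implicit AND)."""
    return all(
        OPERATORS[op](doc[field], bound)
        for field, conditions in flt.items()
        for op, bound in conditions.items()
    )

# Invented sample documents
docs = [
    {"AGE": 6, "LIQ": 0.1, "IND5A": 0},   # kept: 6 < 10, 0.1 >= 0.1, 0 != 1
    {"AGE": 9, "LIQ": 0.49, "IND5A": 1},  # dropped: IND5A == 1
    {"AGE": 12, "LIQ": 0.3, "IND5A": 0},  # dropped: AGE is not < 10
    {"AGE": 7, "LIQ": 0.05, "IND5A": 0},  # dropped: LIQ < 0.1
]

selected = [doc for doc in docs if matches(doc, query)]
print(selected)
```

The same filter explains the summary above. AGE never exceeds 9, LIQ never falls below 0.1, and IND5A is 0 in every returned row.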
rmongodb Example
library(rmongodb)
# connect to the local MongoDB server (default host 127.0.0.1)
mg2 <- mongo.create()
print(mongo.get.databases(mg2))
print(mongo.get.database.collections(mg2, 'db'))
# build the query {AGE: {$lt: 10}, LIQ: {$gte: 0.1}, IND5A: {$ne: 1}}
# each field gets its own nested sub-object holding the comparison operator
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, 'AGE')
mongo.bson.buffer.append(buf, '$lt', 10)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'LIQ')
mongo.bson.buffer.append(buf, '$gte', 0.1)
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.start.object(buf, 'IND5A')
mongo.bson.buffer.append(buf, '$ne', 1)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
# the namespace is "database.collection"
cur <- mongo.find(mg2, 'db.test', query = query)
# walk the cursor and append each field value to its own vector
age <- liq <- ind5a <- NULL
while (mongo.cursor.next(cur)) {
  value <- mongo.cursor.value(cur)
  age <- rbind(age, mongo.bson.value(value, 'AGE'))
  liq <- rbind(liq, mongo.bson.value(value, 'LIQ'))
  ind5a <- rbind(ind5a, mongo.bson.value(value, 'IND5A'))
}
mongo.destroy(mg2)
data2 <- data.frame(AGE = age, LIQ = liq, IND5A = ind5a)
summary(data2)
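The buffer calls in rmongodb build the query one level at a time. start.object opens a sub-document under a field name, append adds a key/value pair to the currently open document, and finish.object closes it again. The hypothetical Python class below mimics that stack discipline with plain dicts. NestedBuilder is illustrative only and is not part of rmongodb or pymongo. The sketch then checks that the result is the same filter as the JSON string used in the RMongo example.

```python
import json

class NestedBuilder:
    """Hypothetical stand-in for a BSON buffer: a stack of open documents."""

    def __init__(self):
        self.stack = [{}]  # the bottom entry is the top-level document

    def start_object(self, name):
        child = {}
        self.stack[-1][name] = child  # attach the sub-document to the open document
        self.stack.append(child)      # and make it the open document

    def append(self, name, value):
        self.stack[-1][name] = value

    def finish_object(self):
        if len(self.stack) == 1:
            raise ValueError("no open sub-object to finish")
        self.stack.pop()

    def build(self):
        if len(self.stack) != 1:
            raise ValueError("unfinished sub-object")
        return self.stack[0]

# Same sequence of calls as the rmongodb example
buf = NestedBuilder()
for field, op, value in [("AGE", "$lt", 10), ("LIQ", "$gte", 0.1), ("IND5A", "$ne", 1)]:
    buf.start_object(field)
    buf.append(op, value)
    buf.finish_object()
query = buf.build()

# The RMongo query string uses single quotes, which strict JSON does not allow
rmongo_query = "{'AGE': {'$lt': 10}, 'LIQ': {'$gte': 0.1}, 'IND5A': {'$ne': 1}}"
print(query == json.loads(rmongo_query.replace("'", '"')))  # True
```

In pymongo no buffer step is needed. The nested dict itself is the filter passed to a collection's `find` method.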
rmongodb Output
rmongodb package (mongo-r-driver) loaded
Use 'help("mongo")' to get started.
[1] "db"
[1] "db.test"
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
NULL
      AGE             LIQ             IND5A
 Min.   :6.000   Min.   :0.1000   Min.   :0
 1st Qu.:7.000   1st Qu.:0.1831   1st Qu.:0
 Median :8.000   Median :0.2970   Median :0
 Mean   :7.963   Mean   :0.3745   Mean   :0
 3rd Qu.:9.000   3rd Qu.:0.4900   3rd Qu.:0
 Max.   :9.000   Max.   :1.0000   Max.   :0
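In the rmongodb example, the cursor loop grows each column with rbind. That creates a new object on every iteration. For large result sets, it is usually cheaper to collect values in a single pass and build the columns once. The sketch below shows that pattern in Python. An invented list of documents stands in for the cursor; iterating over a pymongo `find` result yields dicts in the same way. The helper name `collect_columns` is hypothetical.

```python
# Invented documents standing in for the cursor returned by a query
cursor = iter([
    {"_id": 1, "AGE": 6, "LIQ": 0.1, "IND5A": 0},
    {"_id": 2, "AGE": 9, "LIQ": 0.49, "IND5A": 0},
    {"_id": 3, "AGE": 8, "LIQ": 0.3, "IND5A": 0},
])

def collect_columns(docs, fields):
    """Hypothetical helper: walk the documents once, appending each field to its own list."""
    columns = {field: [] for field in fields}
    for doc in docs:
        for field in fields:
            columns[field].append(doc.get(field))  # None if a document lacks the field
    return columns

data2 = collect_columns(cursor, ["AGE", "LIQ", "IND5A"])
print(data2)
```

Storing a None placeholder for a missing field keeps the columns the same length. This matters because documents with dynamic schemas need not all carry every field.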