read.odbc.ffdf & read.dbi.ffdf for fetching large corporate SQL data

[This article was first published on BNOSAC - Belgium Network of Open Source Analytical Consultants, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you are into large data but not enormeoulsy big data everyone is talking about and you are tired of finding a solution to get your data with several 10's of millions of records in R without having RAM issues, having a look at the packages ff, ffbase and ETLUtils might be the solution to your problem.

Following up on our post about the ETLUtils package which eases transferring large data from SQL databases to ffdf objects in R, the ETLUtils package has now been extended to include the function read.odbc.ffdf which can be used to fetch your SQL queries on corporate Oracle, MySQL, PostgreSQL & sqlite databases. 
Below we show an example where read.dbi.ffdf is used to fetch all rows of a table and we add data of the same structure with read.odbc.ffdf to the existing ffdf. This might be of interest to you if you work a lot with dayly incremental data updates.
The query below returned +/- 15Mio records using read.dbi.ffdf without any RAM issues (on this PC I have 4Gb of RAM) and added another 100000 records as an example using read.odbc.ffdf. And all of the data is completely in an ffdf in R.
 
require(ETLUtils)
 
login <- list()
login$user <- "bnosac"
login$password <- "YourPassword"
login$dbname <- "YourDB"
login$host <- "localhost/IPaddress"
 
require(RMySQL)
x <- read.dbi.ffdf(
query = "select * from semetis.keywords_performance_endofday", 
dbConnect.args = list(drv = dbDriver("MySQL"), 
dbname = login$dbname, user = login$user, password = login$password, host = login$host), 
VERBOSE=TRUE)
1> dim(x)
[1] 14969674       27
 
login <- list()
login$dsn <- "YourDSN"
login$uid <- "bnosac"
login$pwd <- "YourPassword"
require(RODBC)
x <- read.odbc.ffdf(
query = "select * from semetis.keywords_performance_endofday where date = CURRENT_DATE-1", 
odbcConnect.args = list(dsn = login$dsn, uid = login$uid, pwd = login$pwd),
x = x,
VERBOSE=TRUE)
1> dim(x)
[1] 15062904       27
 

To leave a comment for the author, please follow the link and comment on their blog: BNOSAC - Belgium Network of Open Source Analytical Consultants.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)