Get your large SQL data in ff swiftly


The ff package is great when you are working with large data in R. Data in corporate environments are usually not so large that a Hadoop system is needed to handle them, but they are often large enough to make R choke on its RAM.

The ff package is well suited for this type of data: it can handle up to 2.14 billion elements per atomic vector, so roughly 2.14 billion records, which is more than enough in most business settings nowadays.

For a while, getting such data into R was quite cumbersome. The ff package already provides nice interfaces to csv and flat files through the read.csv.ffdf and read.table.ffdf functions. To extend this towards standard SQL databases, the ETLUtils package has been published on CRAN. It contains a function called read.dbi.ffdf which can be used to easily extract data from any SQL database through DBI, directly into an ffdf in R.
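As a reminder of the flat-file side, here is a minimal sketch of read.csv.ffdf from the ff package; the file name and chunk sizes are placeholders you would adapt to your own data.

library(ff)

## read a csv file chunkwise into an ffdf stored on disk
flatdata <- read.csv.ffdf(
  file = "mydata.csv",   # hypothetical input file
  header = TRUE,
  first.rows = 10000,    # rows used to determine the column types
  next.rows = 50000)     # chunk size for the subsequent reads
dim(flatdata)            # an ffdf behaves much like a data.frame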

Below, we show an example of importing a table stored in a MySQL database with 11.8 million records directly into an ffdf, without running into RAM issues.

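The code from the original post is shown only as an image there, so the following is a minimal sketch of how read.dbi.ffdf is used, assuming a MySQL database reachable through the RMySQL driver; the table name and connection details (dbname, user, password) are hypothetical placeholders.

library(ETLUtils)
library(RMySQL)

## import the full table chunkwise into an ffdf - the chunks are fetched
## through DBI and appended to the on-disk ffdf, so RAM usage stays low
bigdata <- read.dbi.ffdf(
  query = "SELECT * FROM mybigtable",       # hypothetical table
  dbConnect.args = list(
    drv = dbDriver("MySQL"),
    dbname = "mydb",
    user = "myuser",
    password = "mypassword"),
  first.rows = 100000,   # rows fetched in the first chunk
  next.rows = 100000,    # chunk size for the subsequent fetches
  VERBOSE = TRUE)        # print progress while importing

class(bigdata)   # "ffdf": the data live on disk, not in RAM
dim(bigdata)

Because read.dbi.ffdf only builds the connection and fetches in batches, the same call works against any database for which a DBI driver is available.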
