Data Import Efficiency – A Case in R
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. As in the Python case posted yesterday, HDF5 shows the highest efficiency in terms of user CPU time.
> library(RSQLite)
Loading required package: DBI
> library(rhdf5)
> df <- read.csv('credit_count.csv')
> do.call(cat, list(nrow(df), ncol(df), '\n'))
13444 14 
>
> # WRITE DF INTO SQLITE
> if(file.exists('data.db')) file.remove('data.db')
[1] TRUE
> con <- dbConnect("SQLite", dbname = "data.db")
> dbWriteTable(con, "tbl", df)
[1] TRUE
>
> # WRITE DF INTO HDF5
> if(file.exists('data.h5')) file.remove('data.h5')
[1] TRUE
> h5createFile("data.h5")
[1] TRUE
> h5write(df, 'data.h5', 'tbl')
>
> # CALCULATE CPU TIMES
> system.time(for(i in 1:10) read.csv('credit_count.csv'))
   user  system elapsed 
  1.148   0.056   1.576 
> system.time(for(i in 1:10) dbReadTable(con, 'tbl'))
   user  system elapsed 
  0.492   0.024   0.649 
> system.time(for(i in 1:10) h5read('data.h5', 'tbl'))
   user  system elapsed 
  0.164   1.184   1.946 
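As a side note, newer releases of RSQLite expect a driver object rather than the string "SQLite" in dbConnect(). Below is a minimal, self-contained sketch of the same timing exercise written against the current RSQLite and rhdf5 interfaces; it assumes the same credit_count.csv file and is meant as an illustration of the approach, not a rerun of the benchmark above.

library(DBI)
library(RSQLite)
library(rhdf5)

df <- read.csv("credit_count.csv")

# write the data frame into SQLite and HDF5 once
con <- dbConnect(RSQLite::SQLite(), dbname = "data.db")
dbWriteTable(con, "tbl", df, overwrite = TRUE)
if (file.exists("data.h5")) file.remove("data.h5")
h5createFile("data.h5")
h5write(df, "data.h5", "tbl")

# time 10 repeated imports from each format
t_csv    <- system.time(for (i in 1:10) read.csv("credit_count.csv"))
t_sqlite <- system.time(for (i in 1:10) dbReadTable(con, "tbl"))
t_hdf5   <- system.time(for (i in 1:10) h5read("data.h5", "tbl"))
rbind(csv = t_csv, sqlite = t_sqlite, hdf5 = t_hdf5)

# release the database connection and any open HDF5 handles
dbDisconnect(con)
h5closeAll()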