(This article was first published on **Yet Another Blog in Statistical Computing » S+/R**, and kindly contributed to R-bloggers)

Below is an R snippet comparing data import efficiency among CSV, SQLite, and HDF5. Similar to the Python case posted yesterday, HDF5 shows the highest efficiency in terms of user CPU time, although its system and elapsed times are the highest of the three in this run.

    > library(RSQLite)
    Loading required package: DBI
    > library(rhdf5)
    > df <- read.csv('credit_count.csv')
    > do.call(cat, list(nrow(df), ncol(df), '\n'))
    13444 14
    >
    > # WRITE DF INTO SQLITE
    > if(file.exists('data.db')) file.remove('data.db')
    [1] TRUE
    > con <- dbConnect("SQLite", dbname = "data.db")
    > dbWriteTable(con, "tbl", df)
    [1] TRUE
    >
    > # WRITE DF INTO HDF5
    > if(file.exists('data.h5')) file.remove('data.h5')
    [1] TRUE
    > h5createFile("data.h5")
    [1] TRUE
    > h5write(df, 'data.h5', 'tbl')
    >
    > # CALCULATE CPU TIMES
    > system.time(for(i in 1:10) read.csv('credit_count.csv'))
       user  system elapsed
      1.148   0.056   1.576
    > system.time(for(i in 1:10) dbReadTable(con, 'tbl'))
       user  system elapsed
      0.492   0.024   0.649
    > system.time(for(i in 1:10) h5read('data.h5','tbl'))
       user  system elapsed
      0.164   1.184   1.946
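To make the three `system.time()` results easier to compare, the reported numbers can be turned into per-read averages. The following is a minimal sketch in base R that only restates the timings printed in the session above; the names `timings`, `n_reads`, `per_read_ms`, and `user_speedup_vs_csv` are introduced here for illustration and are not part of the original post.

```r
# Timings copied from the session above: 10 reads per format, in seconds.
timings <- data.frame(
  format  = c("CSV", "SQLite", "HDF5"),
  user    = c(1.148, 0.492, 0.164),
  system  = c(0.056, 0.024, 1.184),
  elapsed = c(1.576, 0.649, 1.946)
)

# Each system.time() call wrapped a loop of 10 reads.
n_reads <- 10

# Average cost of a single read, converted to milliseconds.
cols <- c("user", "system", "elapsed")
per_read_ms <- timings
per_read_ms[, cols] <- timings[, cols] / n_reads * 1000

# Ratio of CSV user time to each format's user time
# (values above 1 mean less user CPU time than read.csv).
csv_user <- timings$user[timings$format == "CSV"]
per_read_ms$user_speedup_vs_csv <- csv_user / timings$user

print(per_read_ms)
```

On these figures, HDF5 uses about 7 times less user CPU time than `read.csv` and SQLite about 2.3 times less, while the HDF5 elapsed time (1.946 s) is higher than that of CSV (1.576 s) because of its larger system time. A single run of 10 iterations is a small sample, so the ratios should be read as indicative only.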
