Speed up your R scripts. A cool optimized way to load, write and store big data frames with FST package!

Posted on March 29, 2020 by R | TypeThePipe in R bloggers

[This article was first published on R | TypeThePipe, and kindly contributed to R-bloggers].

Are you trying to save and load your DL model or a big dataset in R? Here we show you how the fst CRAN package can speed up your scripts and reduce the disk space your files take. We are going to benchmark it against the base R and utils functions (CSV and RDS formats) and against another great package, readr:

library(tidyverse)      # provides %>% (and also attaches readr)
library(microbenchmark)
library(readr)
library(fst)

# big_dataset (a data frame) and path (output directory ending in "/")
# are assumed to be defined already
big_dataset %>% nrow() # 700k rows, 15 cols (8 factor, 4 int, 3 logi)

microbenchmark(
  write.csv(big_dataset, paste0(path, "big_dataset.csv")),     # utils
  write_csv(big_dataset, paste0(path, "big_dataset.csv")),     # readr
  write_csv(big_dataset, paste0(path, "big_dataset.csv.gz")),  # readr GZ
  saveRDS(big_dataset, paste0(path, "big_dataset.RDS")),       # base
  write_rds(big_dataset, paste0(path, "big_dataset.RDS")),     # readr
  write_fst(big_dataset, paste0(path, "big_dataset.fst")),     # fst
  times = 10
)
## Unit: milliseconds
##          expr        min        mean      median        max neval file_size
##     utils csv 10943.1161 11232.20073 11098.66610 12011.1538    10    109 MB
##     readr csv  3140.4450  3442.92772  3388.14280  3768.4109    10    109 MB
##  readr csv.gz  6993.8850  7332.31976  7260.95040  7946.9233    10     23 MB
##      base rds  4800.3516  5122.22345  5024.69395  5833.9807    10     15 MB
##     readr rds   187.0765   210.74584   211.70760   246.6369    10     46 MB
##           fst    60.3065    87.30611    74.94375   154.7718    10     16 MB
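The post does not show how big_dataset and path were created, nor how the file_size column was measured. The following is only a sketch under stated assumptions: it builds a random stand-in data frame with the same column types (the column names, the random contents, the smaller row count and the tempdir() location are all made up for illustration) and measures file sizes with base file.size(), which may or may not be what the author did.

```r
# Hypothetical stand-in for big_dataset: same column types as in the post
# (8 factor, 4 integer, 3 logical), but random content and invented names.
set.seed(42)
n <- 100000L  # the post uses about 700k rows; fewer rows keep this quick

make_col <- function(kind) {
  switch(kind,
    factor  = factor(sample(letters, n, replace = TRUE)),
    integer = sample.int(1000L, n, replace = TRUE),
    logical = sample(c(TRUE, FALSE), n, replace = TRUE)
  )
}

kinds <- rep(c("factor", "integer", "logical"), times = c(8, 4, 3))
cols <- lapply(kinds, make_col)
names(cols) <- paste0(kinds, "_", seq_along(kinds))
big_dataset <- as.data.frame(cols)

# Output directory with a trailing separator, as paste0(path, ...) expects
path <- paste0(tempdir(), "/")

# Write once with base R and report the size on disk in MB
rds_file <- paste0(path, "big_dataset.RDS")
saveRDS(big_dataset, rds_file)
sizes_mb <- c(rds = file.size(rds_file) / 1024^2)

# Same for fst, only if the package is installed
if (requireNamespace("fst", quietly = TRUE)) {
  fst_file <- paste0(path, "big_dataset.fst")
  fst::write_fst(big_dataset, fst_file)
  sizes_mb["fst"] <- file.size(fst_file) / 1024^2
}

round(sizes_mb, 1)
```

With random data the sizes will not match the ones in the table, since compression depends heavily on the content of the columns.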

Wow! That was cool! With fst we get an amazing writing speed and a file almost as small as the compressed RDS one!
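The benchmark above only times writes. A read-side counterpart could look roughly like the sketch below. It assumes the fst package is installed and uses a small made-up data frame (df and its columns are hypothetical), so the timings it prints say nothing about a 700k-row dataset. It also shows that read_fst() can load just some columns and a range of rows.

```r
library(fst)

# Small hypothetical data frame standing in for big_dataset
df <- data.frame(
  id    = 1:1000,
  group = factor(sample(c("a", "b", "c"), 1000, replace = TRUE)),
  flag  = rep(c(TRUE, FALSE), 500)
)

fst_file <- tempfile(fileext = ".fst")
rds_file <- tempfile(fileext = ".rds")
write_fst(df, fst_file)
saveRDS(df, rds_file)

# Time a full read of each file with base system.time()
system.time(from_fst <- read_fst(fst_file))
system.time(from_rds <- readRDS(rds_file))

# Read only two columns and rows 101 to 200, without loading the rest
part <- read_fst(fst_file, columns = c("id", "flag"), from = 101, to = 200)
dim(part)
```

For a fairer comparison on real data, the two system.time() calls could be swapped for a microbenchmark() call like the one used for the writes.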

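By median time, write_fst() is roughly x3 faster than readr::write_rds() and more than x50 faster than base saveRDS()! Keep in mind that saveRDS() compresses its output by default while write_rds() does not, which explains much of the gap between those two, both in time and in file size (15 MB vs 46 MB).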

Against the csv writing functions the gap is even bigger: about x45 over readr::write_csv() and well above x100 over utils::write.csv(). The truth here is that they are not directly comparable, as csv is a plain-text format while fst is a binary one.
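The speed-up factors can be checked directly from the medians in the benchmark output. This is a small sketch of that arithmetic; the vector names are labels chosen here for illustration, not something microbenchmark prints.

```r
# Median write times in milliseconds, copied from the benchmark output
median_ms <- c(
  utils_csv    = 11098.66610,
  readr_csv    = 3388.14280,
  readr_csv_gz = 7260.95040,
  base_rds     = 5024.69395,
  readr_rds    = 211.70760,
  fst          = 74.94375
)

# How many times slower each writer is than write_fst()
speedup <- median_ms / median_ms[["fst"]]
round(sort(speedup, decreasing = TRUE), 1)
```

Medians are used rather than means because a single slow run (such as the 154 ms maximum for fst) shifts the mean noticeably with only 10 repetitions.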

Are you going to add FST to your R projects toolbox too?

See related useful tips on TypeThePipe
