Reading huge files into R

February 19, 2012

SAS is much touted for its ability to read in huge datasets, and rightly so. However, that ability comes at a cost: because SAS works with files on disk rather than loading them into memory (as Stata and R do), it can be slower for smaller datasets.

If you don't want to learn or buy SAS but you have some large files you need to cut down to size (rather like the gorilla in the corner), R has several packages that can help. In particular, sqldf and ff both have methods for reading in large CSV files, as sketched below. More advice is available here: http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r
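Here is a minimal sketch of both approaches. The file name ("big.csv"), the column used in the SQL filter, and the chunk sizes are made-up illustrations; adjust them for your own data.

```r
# install.packages(c("sqldf", "ff"))  # if you don't already have them
library(sqldf)
library(ff)

# sqldf: filter the CSV with SQL while reading, so only the rows you
# actually need ever become an R data frame ("file" refers to the CSV)
small <- read.csv.sql("big.csv",
                      sql = "select * from file where year = 2011")

# ff: read the whole CSV in chunks into an on-disk ffdf object,
# keeping memory use roughly constant regardless of file size
big <- read.csv.ffdf(file = "big.csv", header = TRUE,
                     first.rows = 10000, next.rows = 50000)
```

With sqldf you pay the cost of building a temporary SQLite database but get back only the subset you asked for; with ff the full table stays on disk and you work with it through the ffdf interface.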

If you're a Stata person, you can often get by reading a large .dta file in chunks within a loop, adding e.g. "in 1/1000" to the use command to read only the 1st through 1000th observations. The same chunking idea works in base R, as sketched below.
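A rough sketch of chunked reading in base R, pulling a big CSV a few thousand rows at a time from an open connection. The file name, chunk size, and per-chunk processing here are illustrative assumptions, and the header line is assumed to contain unquoted column names.

```r
con <- file("big.csv", open = "r")
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]  # header row

rows_seen <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, col.names = col_names, nrows = 5000),
    error = function(e) NULL        # read.csv errors once nothing is left
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  rows_seen <- rows_seen + nrow(chunk)  # replace with your own filtering/aggregation
  if (nrow(chunk) < 5000) break         # last, possibly partial, chunk
}
close(con)
rows_seen
```

Because each chunk is discarded after it is processed, only one chunk ever sits in memory at a time.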
