Opening Large CSV Files in R

December 26, 2012

(This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers)

Before heading home for the holidays, I had a large data set (1.6 GB with over 1.25 million rows) with columns of text and integers ripped out of the company (Kwelia) Database and put into a .csv file since I was going to be offline a lot over the break. I tried opening the csv file in the usual way:

all <- read.csv("file.csv")

However it never finished even after letting it go all night. I also tried reading it into a SQLlite database first and reading it out of that, but the file was so messy it kept coming back with errors. I finally got it read in to R by using the ff package and the following code:

x<- read.csv.ffdf(file="file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000, colClasses=NA)

Because the file was so messy, I had to turn off column classes (colClasses=NA) to have the read ignore giving each column a class on the first 10,000. After reading the first 10,000 rows, the script then reads in chunks of 50,000 so as to not completely overload the ram in my laptop. I also turned Verbose because it would drive me nuts to not be able to follow the progress.

To leave a comment for the author, please follow the link and comment on their blog: NERD PROJECT » R project posts. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)