How to import a subset of a (too huge) csv file

February 6, 2017

(This article was first published on R – Inside data, and kindly contributed to R-bloggers)


Sometimes people create csv files that are just too huge to upload them into your R session while most of the times you just need a subset of this data set. Recently I tapped into this problem and first I tried to import the whole file with functions like fread or the classic read.csv but this did not help much as the file was just too big and my computer failed to import it. With the awesome read.csv.sql function from the sqldf package I found a good way to solve my problem. This function enables you to use SQL statements within the import function which make it possible to select only a subset of the file to reduce the import size.

mydata = read.csv.sql("mydata.csv", sql = "select * from file where City = '\"Hamburg\"' ", sep = ";")

The code above loads only those lines of the file in where the city is Hamburg. I still had trouble to with the encoding that is why I used this ugly string with backlashes in the SQL statement. I will leave it like this as you might having the same problem.

Make sure you check out my Github for other data driven projects.

To leave a comment for the author, please follow the link and comment on their blog: R – Inside data. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)