R function for reading big tables

[This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers].

HugeFileLoader = function(path, sep = "\t", skip = 0, header = TRUE, nrows = 10){

### counts the number of lines using the shell wc command, and converts the output to numeric
### (shQuote protects paths that contain spaces)
line.count = paste("wc -l", shQuote(path))
row.count = as.numeric(strsplit(system(line.count, intern = TRUE), split = " ")[[1]][1]) - skip

### reads in the first nrows rows of the file and determines the type of each column
first.rows = read.table(path, header = header, nrows = nrows, skip = skip, sep = sep)
tab.classes = sapply(first.rows, class)

### reads in the data with the column classes fixed in advance, so read.table does not have to guess them
tab = read.table(path, header = header, colClasses = tab.classes, comment.char = "#", nrows = row.count, skip = skip, sep = sep)
return(tab)
}
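The two-pass idea behind the function (sample a few rows to learn the column classes, then read everything with those classes fixed) can be tried on a small file without calling wc at all. The following is only a sketch: the temporary file, the demo data frame, and the variable names are made up for illustration and are not part of the original function.

```r
# Write a small tab-separated file to a temporary location (illustrative data).
tmp <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:100,
                   score = seq(0.5, 50, by = 0.5),
                   name = rep(c("a", "b"), 50),
                   stringsAsFactors = FALSE)
write.table(demo, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Pass 1: read only the first 10 rows and record the class of each column.
head.rows <- read.table(tmp, header = TRUE, sep = "\t", nrows = 10,
                        stringsAsFactors = FALSE)
classes <- sapply(head.rows, class)

# Pass 2: read the whole file with the classes supplied up front.
# The row count is known here because we wrote the file ourselves.
full <- read.table(tmp, header = TRUE, sep = "\t",
                   colClasses = classes, nrows = 100)
str(full)
```

One thing to keep in mind with this approach: the classes come only from the sampled rows. If a column looks like integers in the first rows but contains decimals further down, the second read will stop with an error, so a larger nrows for the sample may be the safer choice on messy data.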

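If you are using R on a Mac, you have to change the index used when parsing the wc -l output ([[1]][1]). On a Mac the output starts with whitespace, so the first element after splitting on spaces is an empty string rather than the line count, whereas on a Linux machine the first element is the number of lines.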
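One way to avoid changing the index by hand is to strip the leading whitespace before splitting. The helper below is a minimal sketch of that idea; the name parse_wc is hypothetical and does not appear in the original post, and the two sample strings only imitate what wc -l typically prints on each platform.

```r
# Hypothetical helper: extract the line count from the text printed by `wc -l`,
# whether or not the output is padded with leading whitespace.
parse_wc <- function(wc.output) {
  # Drop leading/trailing whitespace, then split on runs of whitespace.
  fields <- strsplit(trimws(wc.output), split = "[[:space:]]+")[[1]]
  # The first remaining field is the line count.
  as.numeric(fields[1])
}

mac.style   <- parse_wc("      42 data.txt")  # padded output
linux.style <- parse_wc("42 data.txt")        # unpadded output
```

Inside the loader this would be used as parse_wc(system(line.count, intern = TRUE)) - skip, in place of the strsplit(...)[[1]][1] expression.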
