# Read Big Text Files Column by Column

[This article was first published on **Econometrics_Help**, and kindly contributed to R-bloggers.]

Dear R Programmers,

There is a new package, "colbycol", on CRAN that makes our job easier when we have to read large files (more than a GB, say) into R, especially when we do not need all of the columns/variables for our analysis. Kudos to the author, Carlos J. Gil Bellosta.

I tried it on a 1.72 GB file with more than 300 columns and 500,000 rows, of which only a few columns were of interest to me. Since it is easy to find out how many columns a file has by reading just a few lines (see my earlier post http://costaleconomist.blogspot.in/2010/02/easy-way-of-determining-number-of.html and ?readLines), the whole job took only a few lines of R, as below, and ran quickly:
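The column-counting trick mentioned above can be sketched as follows; this is a minimal illustration in which a small temporary CSV stands in for the large file:

```r
## Count columns by reading only the first line -- no need to load the file.
## A small temporary CSV stands in for the large file here.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "1,2.5,3.1"), tmp)

header <- readLines(tmp, n = 1)                 # read just the header line
n.cols <- length(strsplit(header, ",")[[1]])    # split on the separator
n.cols                                          # 3 columns in this toy file
```

Reading only `n = 1` lines keeps memory use trivial no matter how large the file is.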

```r
library(colbycol)

cbc.data.7.cols <- cbc.read.table("D:/XYZ/filename.csv",
                                  just.read = c(1, 3, 21, 34, 108, 205, 227),
                                  sep = ",")

nrow(cbc.data.7.cols)
colnames(cbc.data.7.cols)

# then one can simply convert to a data.frame as follows
train.data <- as.data.frame(cbc.data.7.cols, columns = 1:7, rows = 1:50000)
```
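For comparison, base R can also skip columns at read time by setting `colClasses` to `"NULL"` for the unwanted columns, with no extra package needed. This is a minimal sketch on a toy file (note that `read.table` still scans the whole file in a single pass, so on very large inputs a column-oriented approach like colbycol's may still be preferable):

```r
## Base-R alternative: drop columns while reading, via colClasses.
## A small temporary CSV stands in for the large file here.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d", "1,2,3,4", "5,6,7,8"), tmp)

keep <- c(1, 3)              # indices of the columns we want
classes <- rep("NULL", 4)    # "NULL" tells read.table to skip a column
classes[keep] <- NA          # NA lets R guess the type of kept columns

small <- read.table(tmp, header = TRUE, sep = ",", colClasses = classes)
ncol(small)                  # only the 2 requested columns are in memory
```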

Also, refer to http://colbycol.r-forge.r-project.org/ for a quick intro by the author.

Happy programming with R.
