Stacking files made easy

[This article was first published on » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Every campaign cycle I usually do similar things, go to a repository, download a bounce of data, merge and store them to an existing RData file for posterior analysis. I’ve already wrote about this topic some time ago, but this time I think my script became simpler.

Set the Directory

Let’s assume you’re not in the same directory of your files, so you’ll need to set R to where the population of files resides.

setwd("~/Downloads/consulta_cand_2014")

Getting a List of files

Next, it’s just a matter of getting to know your files. For this, the list.files() function is very handy, and you can see the file names right-way in your screen. Here I’m looking form those “txt” files, so I want my list of files exclude everything else, like pdf, jpg etc.

files <- list.files(pattern= '\.txt$')

Sometimes you may find empty objects that may prevent the script to run successfully against them. Thus, you may want to inspect the files beforehand.

info = file.info(files)
empty = rownames(info[info$size == 0, ])

Moreover, in case you have the same files in more than one format, you may want to filter them like in the following:

CSVs <-list.files(pattern='csv')
TXTs <- list.files(pattern='txt')
mylist <- CSVs[!CSVs %in% TXTs]

Stacking files into a dataframe

The last step is to iterate "rbind" through the list of files in the working directory putting all them together.
Notice that in the script below I've included some extra conditions to avoid problems reading the files I have. Also, this assumes all the files have the same number of columns, otherwise "rbind" won't work. In this case you may need to replace "rbind" by "smartbind" from gtools package.

cand_br <- do.call("rbind",lapply(files,
FUN=function(files){read.table(files,
header=FALSE, sep=";",stringsAsFactors=FALSE, 
fileEncoding="cp1252", fill=TRUE,blank.lines.skip=TRUE)
}))

To leave a comment for the author, please follow the link and comment on their blog: » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)