Looping Through Files

Posted on January 2, 2014 by Daniel Marcelino » R in R bloggers

[This article was first published on Daniel Marcelino » R, and kindly contributed to R-bloggers.]


Today, I finally got inspired to deal with tons of datasets from the Tribunal Superior Eleitoral on the Brazilian elections. I had been putting this off simply to avoid the trouble of handling large, messy text files. The data I collected amount to more than 40GB of plain text files, which report electoral results, candidates’ profiles, campaign revenues and expenditures, etc. If anything, it makes a good example of using R for data management, and it may be useful for students dealing with messy datasets from anywhere.

The task can be stated as follows. Suppose you have a set of data files (data1.txt, data2.txt, […], data27.txt), each holding a slice of the data by state or electoral district. What you want to do is stack all the data files into a single file for more aggregated analyses, or just to relieve the computer from storing too many sliced files. In sum, the task is to obtain one table from all the subsets; more complex cases will be addressed in later posts. This can be done by pointing R to the directory where the files are, then looping through them, importing and stacking each one. Finally, the aggregated file can be written back to disk.

The code works as follows. The first line sets the path where R must look for the files. The second line creates an empty data object to store the imported files, if any. The third line lists the files in that path, and then a loop reads each existing file of type “.txt” as a table. The last line in the loop builds the final table by appending each subset that was imported into memory. Finally, the last part of the program, which sits outside the loop for efficiency, simply writes the final table to disk as a text file, delimiting the columns by a semicolon ‘;’.
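The steps just described can be sketched in R roughly as follows. This is a minimal illustration rather than the original listing: the temporary directory, the three small demo files, the column names (state, votes) and the output file name are all made up so that the example runs on its own, and the input files are assumed to be semicolon-delimited with a header row.

```r
# Hypothetical setup: three small semicolon-delimited files in a temporary
# directory stand in for data1.txt, data2.txt, ...
path <- file.path(tempdir(), "stack_demo")
dir.create(path, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(state = paste0("S", i), votes = c(10, 20) * i)
  write.table(demo, file.path(path, paste0("data", i, ".txt")),
              sep = ";", row.names = FALSE)
}

# 1. The path where R must look for the files is `path`, set above.
# 2. Empty object that will hold the stacked rows.
out <- NULL

# 3. List the .txt files found in the path, then read each one as a table
#    and append it to the rows collected so far.
file_names <- list.files(path, pattern = "\\.txt$", full.names = TRUE)
for (f in file_names) {
  part <- read.table(f, header = TRUE, sep = ";", stringsAsFactors = FALSE)
  out <- rbind(out, part)
}

# 4. Outside the loop: write the stacked table once, semicolon-delimited.
#    The output goes to a different directory so a rerun does not read it back.
out_file <- file.path(tempdir(), "stacked_all.txt")
write.table(out, out_file, sep = ";", row.names = FALSE)
```

With many large files, growing the table with rbind() inside the loop copies the accumulated rows on every pass. One common alternative is to read all files into a list first and bind them in a single call, for example do.call(rbind, lapply(file_names, read.table, header = TRUE, sep = ";")).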
