Merge all files in a directory using R into a single dataframe

June 3, 2011

(This article was first published on Psychwire » R, and kindly contributed to R-bloggers)

In this post, I provide a simple script for merging a set of files in a directory into a single, large dataset. I recently needed to do this, and it’s very straightforward.

Set the Directory

Begin by setting the current working directory to the one containing all the files that need to be merged:

setwd("target_dir/")
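If you would rather not leave the session pointing at the data folder afterwards, setwd() invisibly returns the previous working directory, so it can be saved and restored. A minimal sketch, in which tempdir() merely stands in for "target_dir/" and old_wd is a name chosen for illustration:

```r
# tempdir() is a stand-in for "target_dir/" so the sketch runs anywhere
target_dir <- tempdir()

# setwd() returns the directory that was current before the change
old_wd <- setwd(target_dir)

# ... list and merge the files here ...

# go back to where the session started
setwd(old_wd)
```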

Getting a List of Files in a Directory

Next, it’s just a case of getting a list of the files in the directory. For this, the list.files() function can be used. As I haven’t specified any target directory to list.files(), it just lists the files in the current working directory.

file_list <- list.files()

If you want it to list the files in a different directory, just specify the path to list.files. For example, if you want the files in the folder C:/foo/, you could use the following code:

file_list <- list.files("C:/foo/")
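By default list.files() returns bare file names, so names listed from another folder cannot be passed straight to read.table() unless that folder is also the working directory. The sketch below shows the pattern and full.names arguments of list.files() on a throwaway folder; data_dir and the file names in it are made up for this illustration.

```r
# Hypothetical folder for illustration: a temporary directory with mixed files
data_dir <- file.path(tempdir(), "list_demo")
dir.create(data_dir, showWarnings = FALSE)
file.create(file.path(data_dir, c("a.txt", "b.txt", "notes.csv")))

# Bare file names, relative to data_dir
all_files <- list.files(data_dir)

# Only the .txt files, with the directory prepended so that they can be
# read regardless of the current working directory
txt_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
basename(txt_files)
```

The pattern argument is a regular expression, which is why the dot is escaped and the extension is anchored with $.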

Merging the Files into a Single Dataframe

The final step is to iterate through the list of files in the current working directory and put them together to form a dataframe.

On every pass, the loop checks whether the main dataframe (called dataset here) already exists. The check uses exists() in an if/else, so exactly one branch runs for each file:

  • If dataset doesn’t exist (!exists is true), we’re on the first file, so we create dataset from it.
  • If dataset already exists, then a temporary dataframe called temp_dataset is created and added to dataset. The temporary dataframe is removed when we’re done with it using the rm(temp_dataset) command.

The else matters: with two separate if statements, the second check would succeed as soon as dataset had been created, and the rows of the first file would be appended a second time. Note also that exists() will find a dataset left over from an earlier run, so remove it with rm(dataset) before re-running the script.

Here’s the remainder of the code:

for (file in file_list){

  if (!exists("dataset")){
    # first file: the merged dataset doesn't exist yet, so create it
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # later files: read into a temporary dataframe and append it
    temp_dataset <- read.table(file, header=TRUE, sep="\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}

The Full Code

Here’s the code in its entirety, put together for ease of pasting. I assume there are more efficient ways to do this, but it hasn’t taken long to merge 45 text files totalling about 400MB with some 300,000 rows and 300 columns.

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){

  if (!exists("dataset")){
    # first file: the merged dataset doesn't exist yet, so create it
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # later files: read into a temporary dataframe and append it
    temp_dataset <- read.table(file, header=TRUE, sep="\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}
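One commonly used alternative to growing a dataframe inside a loop is to read every file into a list and bind the list once at the end, which may save some copying on larger inputs. The sketch below is only an illustration, not part of the original script: it assumes tab-delimited files that all share the same columns, and merge_files, demo_dir and the toy files are names made up here.

```r
# Hypothetical helper: read every file in `dir` and stack the rows.
# Assumes all files are tab-delimited with a header and identical columns.
merge_files <- function(dir, sep = "\t") {
  paths  <- list.files(dir, full.names = TRUE)
  tables <- lapply(paths, read.table, header = TRUE, sep = sep)
  do.call(rbind, tables)
}

# Toy data: two small tab-delimited files in a temporary folder
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, score = c(10, 20)),
            file.path(demo_dir, "a.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(id = 3:5, score = c(30, 40, 50)),
            file.path(demo_dir, "b.txt"), sep = "\t", row.names = FALSE)

merged <- merge_files(demo_dir)
nrow(merged)
```

Because the result is built in one step, there is no dataset object to check for with exists(), so a leftover object from an earlier run cannot leak into the merge.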
