Merge all files in a directory using R into a single dataframe
In this post, I provide a simple script for merging a set of files in a directory into a single, large dataset. I recently needed to do this, and it’s very straightforward.
Set the Directory
Begin by setting the current working directory to the one containing all the files that need to be merged:
setwd("target_dir/")
Getting a List of Files in a Directory
Next, it’s just a case of getting a list of the files in the directory. For this, the list.files() function can be used. As I haven’t specified any target directory to list.files(), it just lists the files in the current working directory.
file_list <- list.files()
If you want to list the files in a different directory, just pass the path to list.files(). For example, to get the files in the folder C:/foo/, you could use the following code:
file_list <- list.files("C:/foo/")
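If the directory also contains files you don't want to merge, list.files() can filter on a regular expression and return full paths via its pattern and full.names arguments. A small sketch, assuming the files to merge all end in .txt:

# only pick up .txt files, and return their full paths so read.table() can find them
file_list <- list.files("C:/foo/", pattern = "\\.txt$", full.names = TRUE)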
Merging the Files into a Single Dataframe
The final step is to iterate through the list of files in the current working directory and put them together to form a dataframe.
When the script encounters the first file in file_list, it creates the main dataframe to merge everything into (called dataset here). This is handled by an if/else built around the !exists() check (a short illustration of exists() follows this list):
- If dataset doesn't exist yet (!exists("dataset") is TRUE), we create it by reading in the current file.
- If dataset already exists, the current file is read into a temporary dataframe called temp_dataset and appended to dataset with rbind(). The temporary dataframe is removed when we're done with it using the rm(temp_dataset) command.
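As a quick illustration of how exists() drives the logic (assuming a fresh R session in which no object called dataset has been created yet):

exists("dataset")   # FALSE, so the loop creates dataset from the first file
dataset <- read.table(file_list[1], header = TRUE, sep = "\t")
exists("dataset")   # TRUE, so subsequent files are appended instead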
Here’s the remainder of the code:
for (file in file_list){
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header = TRUE, sep = "\t")
  # if the merged dataset does exist, append to it
  } else {
    temp_dataset <- read.table(file, header = TRUE, sep = "\t")
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
The Full Code
Here’s the code in its entirety, put together for ease of pasting. I assume there are more efficient ways to do this, but it hasn’t taken long to merge 45 text files totalling about 400MB, with some 300,000 rows and 300 columns.
setwd("target_dir/") file_list <- list.files() for (file in file_list){ # if the merged dataset doesn't exist, create it if (!exists("dataset")){ dataset <- read.table(file, header=TRUE, sep="\t") } # if the merged dataset does exist, append to it if (exists("dataset")){ temp_dataset <-read.table(file, header=TRUE, sep="\t") dataset<-rbind(dataset, temp_dataset) rm(temp_dataset) } }