Perform a Function on Each File in R

Posted on September 26, 2013 by Mollie in R bloggers | 0 Comments

This article was first published on Mollie's Research Blog and contributed to R-bloggers.

Sometimes you might have several data files and want to use R to perform the same function across all of them. Or maybe you have multiple files and want to systematically combine them into one file without having to open each file and manually copy the data out.

Fortunately, it’s not complicated to use R to systematically iterate across files.

Finding or Choosing the Names of Data Files

There are multiple ways to find or choose the names of the files you want to analyze.

You can explicitly state the file names or you can get R to find any files with a particular extension.

Explicitly Stating File Names

fileNames <- c("sample1.csv", "sample2.csv")

Finding Files with a Specific Extension

In this case, we use Sys.glob from the base package to find all files in the working directory whose names match the wildcard pattern "*.csv".

fileNames <- Sys.glob("*.csv")
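Sys.glob matches shell-style wildcards. When a regular expression is a better fit, list.files from the base package is one alternative. The following is a minimal sketch, not part of the original scripts; the demoDir directory and the files in it are hypothetical and exist only so the example runs on its own:

```r
# create a throwaway directory with a few empty files (illustration only)
demoDir <- file.path(tempdir(), "glob-demo")
dir.create(demoDir, showWarnings = FALSE)
file.create(file.path(demoDir, c("sample1.csv", "sample2.csv", "notes.txt")))

# wildcard match, as above, but with an explicit directory
globNames <- Sys.glob(file.path(demoDir, "*.csv"))

# regular-expression match; full.names = TRUE keeps the directory in each path
listNames <- list.files(demoDir, pattern = "\\.csv$", full.names = TRUE)

print(basename(globNames))
print(basename(listNames))
```

Both calls should pick up the two .csv files and skip notes.txt.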

Iterating Across All Files

We'll start with a loop and then we can add whatever functions we want to the inside of the loop:

for (fileName in fileNames) {

  # read data:
  sample <- read.csv(fileName,
    header = TRUE,
    sep = ",")

  # add more stuff here

}

For example, we could add one to every value in the "Widgets" column of each file and overwrite the old data with the new data:

for (fileName in fileNames) {

  # read old data:
  sample <- read.csv(fileName,
    header = TRUE,
    sep = ",")

  # add one to every widget value in every file:
  sample$Widgets <- sample$Widgets + 1
  
  # overwrite old data with new data:
  write.table(sample, 
    fileName,
    append = FALSE,
    quote = FALSE,
    sep = ",",
    row.names = FALSE,
    col.names = TRUE)

}

Or we could do the same thing, but create a new copy of each file:

extension <- "csv"

fileNames <- Sys.glob(paste("*.", extension, sep = ""))

fileNumbers <- seq_along(fileNames)

for (fileNumber in fileNumbers) {

  # strip the extension (anchored with "$" so only the end of the name matches):
  newFileName <- paste("new-",
    sub(paste("\\.", extension, "$", sep = ""), "", fileNames[fileNumber]),
    ".", extension, sep = "")

  # read old data:
  sample <- read.csv(fileNames[fileNumber],
    header = TRUE,
    sep = ",")

  # add one to every widget value in every file:
  sample$Widgets <- sample$Widgets + 1
  
  # write new data to new files:
  write.table(sample, 
    newFileName,
    append = FALSE,
    quote = FALSE,
    sep = ",",
    row.names = FALSE,
    col.names = TRUE)

}

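In the above example, we used the paste and sub functions from the base package to automatically create new file names based on the old file names. Note that if the script is run a second time in the same directory, Sys.glob will also match the "new-" files created by the first run.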
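The tools package, which ships with R, has helpers that split a file name from its extension, so the regular expression does not have to be built by hand. This is a minimal sketch rather than the method used in the original scripts: makeNewName and its prefix argument are hypothetical names, and the helper assumes file names with no directory component, as in the examples here.

```r
# build "new-<name>.<extension>" from an existing file name (hypothetical helper)
# assumes fileName has no directory part, e.g. "sample1.csv"
makeNewName <- function(fileName, prefix = "new-") {
  stem <- tools::file_path_sans_ext(fileName)  # name without the extension
  ext  <- tools::file_ext(fileName)            # extension without the dot
  paste0(prefix, stem, ".", ext)
}

newFileNames <- makeNewName(c("sample1.csv", "sample2.csv"))
print(newFileNames)
```

Because both helpers are vectorized, the new names can be computed for all files at once before the loop starts.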

Or we could instead use each dataset to create an entirely new dataset, where each row is based on data from one file:

fileNames <- Sys.glob("*.csv")

for (fileName in fileNames) {

  # read original data:
  sample <- read.csv(fileName,
    header = TRUE,
    sep = ",")

  # create new data based on contents of original file:
  allWidgets <- data.frame(
    File = fileName,
    Widgets = sum(sample$Widgets))
  
  # write new data to separate file:
  write.table(allWidgets, 
    "Output/sample-allSamples.csv",
    append = TRUE,
    sep = ",",
    row.names = FALSE,
    col.names = FALSE)

}

In the above example, data.frame is used to create a new data row based on each data file. Then the append option of write.table is set to TRUE so that each row is added after the rows already written for the other data files. The Output directory must already exist, since write.table does not create it.
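With append = TRUE, running the loop a second time adds every row again, and the output file has no header row. One alternative is to collect one row per file and write the result once. The sketch below assumes every file has a numeric Widgets column; the demo directory, the sample values, and the summariseFile helper are hypothetical and only there to make the example self-contained.

```r
# set up two small sample files in a throwaway directory (illustration only)
demoDir <- file.path(tempdir(), "append-demo")
dir.create(demoDir, showWarnings = FALSE)
write.csv(data.frame(Widgets = c(1, 2, 3)),
  file.path(demoDir, "sample1.csv"), row.names = FALSE)
write.csv(data.frame(Widgets = c(10, 20)),
  file.path(demoDir, "sample2.csv"), row.names = FALSE)

fileNames <- Sys.glob(file.path(demoDir, "sample*.csv"))

# build one summary row per file (hypothetical helper)
summariseFile <- function(fileName) {
  sample <- read.csv(fileName)
  data.frame(File = basename(fileName), Widgets = sum(sample$Widgets))
}

# stack the per-file rows into a single data frame
allWidgets <- do.call(rbind, lapply(fileNames, summariseFile))

# write once, with a header; create the output directory if it is missing
outDir <- file.path(demoDir, "Output")
dir.create(outDir, showWarnings = FALSE)
write.csv(allWidgets, file.path(outDir, "sample-allSamples.csv"),
  row.names = FALSE)

print(allWidgets)
```

Since the file is rewritten from scratch on each run, rerunning the script gives the same output instead of a longer one.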

Those are just a few examples of how you can use R to perform the same function(s) on a large number of files without having to manually run each one. I'm sure you can think of more uses.
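The other use mentioned at the start, combining several files into one, follows the same pattern. This is a rough sketch that assumes all files share the same columns; the demo directory, the readWithSource helper, and the combined.csv output name are hypothetical.

```r
# set up two small sample files in a throwaway directory (illustration only)
demoDir <- file.path(tempdir(), "combine-demo")
dir.create(demoDir, showWarnings = FALSE)
write.csv(data.frame(Widgets = c(1, 2)),
  file.path(demoDir, "sample1.csv"), row.names = FALSE)
write.csv(data.frame(Widgets = 3),
  file.path(demoDir, "sample2.csv"), row.names = FALSE)

fileNames <- Sys.glob(file.path(demoDir, "sample*.csv"))

# read one file and record which file each row came from (hypothetical helper)
readWithSource <- function(fileName) {
  sample <- read.csv(fileName)
  sample$File <- basename(fileName)
  sample
}

# read every file, then stack the data frames row-wise
combined <- do.call(rbind, lapply(fileNames, readWithSource))

# the output name does not match "sample*.csv", so reruns will not re-read it
write.csv(combined, file.path(demoDir, "combined.csv"), row.names = FALSE)

print(combined)
```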

All the files are available on GitHub. You can see how eachFile.R, eachfile-newNames.R, and eachFile-append.R each do something different to the sample datasets.
