Reading multiple files.

[This article was first published on R – FordoX, and kindly contributed to R-bloggers.]

By now, we are all familiar with reading a csv file into R. But what if we need to perform the same block of operations on multiple files? Editing the script to point at each csv in turn and rerunning it every time quickly becomes a tiring job.

The best and easiest way is to automate the whole process, for which we need to write an R script.

Step 1:  We begin by listing all the files in the working directory. The pattern argument is a regular expression, so we use “\\.csv$” to keep only the file names that end in “.csv”.

file_list <- list.files(pattern = "\\.csv$")
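
Since list.files() treats its pattern as a regular expression rather than a shell-style wildcard, it can help to see the two side by side. The following is a minimal sketch: the temporary directory and the file names are made up purely for illustration, and glob2rx() (from base R's utils package) is used to translate a wildcard into a regular expression.

```r
## Hypothetical demo directory and file names, created only for this example
demo_dir <- file.path(tempdir(), "csv_pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("A.csv", "B.csv", "notes.txt", "csv_readme.md"))))

## A regular expression anchored at the end of the file name
by_regex <- list.files(demo_dir, pattern = "\\.csv$")

## glob2rx() converts a shell-style wildcard into a regular expression
by_glob <- list.files(demo_dir, pattern = glob2rx("*.csv"))

print(by_regex)
print(identical(by_regex, by_glob))
```

Either form skips “csv_readme.md”, which merely contains the letters “csv” without ending in “.csv”.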

Step 2:  After listing the files, we count how many csv files the directory contains.

l <- length(file_list)

Step 3: Now, by running a loop, we can access the content of each csv file.

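## seq_len(l) also behaves correctly when no csv files were found (l == 0)
for (i in seq_len(l)) {
  ## read the i-th file listed in file_list
  x <- read.csv(file_list[i])
  ## ... the operations on x go here ...
}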
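
The loop overwrites x on every pass, which is fine when all the processing happens inside the loop. If the data frames are needed afterwards, one possible variation is to keep them in a named list. The sketch below writes two tiny made-up csv files to a temporary directory so that it runs on its own; the directory, file names, and values are illustrative assumptions.

```r
## Hypothetical demo files, created only so the example is self-contained
demo_dir <- file.path(tempdir(), "csv_read_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, Entropy = c("a", "b", "a")),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, score = c(0.5, 0.7)),
          file.path(demo_dir, "B.csv"), row.names = FALSE)

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

## lapply() returns one data frame per file
tables <- lapply(file_list, read.csv)
names(tables) <- basename(file_list)

## seq_along() is safe even when the list is empty
for (i in seq_along(tables)) {
  cat(names(tables)[i], ":", nrow(tables[[i]]), "rows,",
      ncol(tables[[i]]), "columns\n")
}
```

Each table can then be retrieved by file name, for example tables[["A.csv"]].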

Yeah! By now we can read the contents of all the files automatically by running the script.

Now, suppose your csv files have different numbers of columns and you want to work with one specific column in each of them, but that column sits at a different position in each file. This is a more difficult situation to handle.

Say, for example, I have three files named “A.csv”, “B.csv” and “C.csv”, and I want to work with the “Entropy” column of each of them, but it occurs as the 3rd column in “A.csv”, the 5th column in “B.csv” and the 9th column in “C.csv”. As there is no uniformity in the column number, the column cannot be accessed by a fixed position. This would be a major setback in automating the process. So, what I would do is:

## checking if the name of the jth column is "Entropy"
if (colnames(x)[j] == "Entropy") {

  ## saving the original column name for future use
  y[j] <- colnames(x)[j]

  ## changing the name of the jth column
  colnames(x)[j] <- 'test'

  ## accessing the column by its name
  ## (entropy() is assumed to come from the "entropy" package)
  ent[q] <- entropy(table(x$test))

  ## again assigning the original column name to the jth column
  colnames(x)[j] <- y[j]
}
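
As a possible simplification, R can also look a column up by its name directly, so the position never has to be known in advance and the temporary rename can be skipped. The sketch below is only an illustration of that idea, not the script from this post: the two data frames are made up, and it simply tabulates the column instead of calling entropy().

```r
## Two made-up tables with "Entropy" in different positions
a <- data.frame(id = 1:4, grp = "g", Entropy = c("x", "x", "y", "y"))
b <- data.frame(Entropy = c("x", "x", "x", "x"), id = 1:4)

positions <- integer()
for (x in list(a, b)) {
  ## %in% checks whether the column exists at all
  if ("Entropy" %in% names(x)) {
    ## match() reports where the column sits in this particular table
    positions <- c(positions, match("Entropy", names(x)))
    ## [[ ]] extracts the column by name, whatever its position
    values <- x[["Entropy"]]
    print(table(values))
  }
}
print(positions)
```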

So, finally, my R script looks like this:

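## assumption: entropy() comes from the "entropy" package
library(entropy)

file_list <- list.files(pattern = "\\.csv$")
l <- length(file_list)

## results are collected here, one entry per file that has an "Entropy" column
csv <- character()
ent <- numeric()
q <- 1

for (i in seq_len(l)) {

  x <- read.csv(file_list[i])
  y <- names(x)

  for (j in seq_len(ncol(x))) {

    if (colnames(x)[j] == "Entropy") {
      y[j] <- colnames(x)[j]
      csv <- c(csv, file_list[i])
      colnames(x)[j] <- 'test'
      ent[q] <- entropy(table(x$test))
      colnames(x)[j] <- y[j]
      ## move to the next result slot only after storing a value
      q <- q + 1
    }
  }
}

## one row per file: the file name and the entropy of its "Entropy" column
df <- data.frame(csv = csv, entropy = ent, stringsAsFactors = FALSE)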
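
One way to package the whole workflow, sketched here under stated assumptions, is a function that returns one row per file containing the column of interest. The function name column_entropy_by_file(), the helper shannon_entropy() (a hand-written plug-in estimate used in place of entropy() so the example has no package dependency), and the demo files are all hypothetical.

```r
## Hypothetical helper: plug-in Shannon entropy (in nats) of a vector's values
shannon_entropy <- function(values) {
  p <- table(values) / length(values)
  -sum(p * log(p))
}

## Hypothetical wrapper: one result row per csv file that has the named column
column_entropy_by_file <- function(dir, column = "Entropy") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  rows <- lapply(files, function(f) {
    x <- read.csv(f)
    if (!column %in% names(x)) return(NULL)   # skip files without the column
    data.frame(csv = basename(f), entropy = shannon_entropy(x[[column]]),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(csv = character(), entropy = numeric(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

## Made-up demo files: "Entropy" is the 3rd column in A.csv,
## the 1st column in B.csv, and absent from C.csv
demo_dir <- file.path(tempdir(), "csv_entropy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:4, grp = "g", Entropy = c("x", "x", "y", "y")),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(Entropy = rep("x", 4), id = 1:4),
          file.path(demo_dir, "B.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:4), file.path(demo_dir, "C.csv"), row.names = FALSE)

result <- column_entropy_by_file(demo_dir)
print(result)
```

Files without the column are skipped rather than reported as missing values, which keeps the file names and the entropy values aligned row by row.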

Hope this helps and saves lots of time and effort. Happy Mining!

