Reading multiple files.
By now, we are all familiar with reading a csv file into R. But what if there is a block of operations that we need to perform on multiple files? Loading each csv by hand and rerunning the script every time would be quite tiring.
The best and easiest way is to automate the whole process, for which we need to write an R script.
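Step 1: We begin by listing all the files in the working directory. The pattern argument of list.files() is a regular expression, not a shell wildcard, so we select the csv files with “\\.csv$”, which matches names ending in “.csv”.
file_list <- list.files(pattern = "\\.csv$")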
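To see why the pattern is written as a regular expression, here is a small sketch. The file names in `candidates` are made up for illustration; `glob2rx()` from the utils package converts a shell-style wildcard into the equivalent regular expression, in case you prefer to write “*.csv”.

```r
# Made-up file names, for illustration only.
candidates <- c("A.csv", "A.csv.bak", "mycsv.txt")

# Regular expression: a literal dot, then "csv", at the end of the name.
is_csv <- grepl("\\.csv$", candidates)

# glob2rx() turns the wildcard "*.csv" into the regex "^.*\\.csv$".
glob_regex <- utils::glob2rx("*.csv")
is_csv_glob <- grepl(glob_regex, candidates)

print(is_csv)       # TRUE FALSE FALSE
print(glob_regex)
```

Either form can be passed as the pattern argument of list.files().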
Step 2: After listing, it’s time to find the number of csv files in the directory.
l <- length(file_list)
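Step 3: Now, by running a loop over the file list, we can access the content of each csv file in turn. Inside the loop, x holds the file currently being processed.
for (i in seq_len(l)) {
  x <- read.csv(file_list[i])
}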
Yeah! By now, the script reads the contents of every file automatically when we run it.
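The loop above keeps only the most recently read file in x. If you want every table available afterwards, one option is to collect them in a named list. The following is a minimal, self-contained sketch: the folder name `read_multi_demo` and the two small demo files are assumptions made up so that the example runs anywhere.

```r
# Create a scratch folder with two small demo files (hypothetical data).
demo_dir <- file.path(tempdir(), "read_multi_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:3, b = 4:6),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(a = 7:8, b = 9:10),
          file.path(demo_dir, "B.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open directly,
# even when demo_dir is not the working directory.
file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and label each table with its file name.
tables <- lapply(file_list, read.csv)
names(tables) <- basename(file_list)

print(names(tables))
print(nrow(tables[["A.csv"]]))
```

Each table can then be pulled out by file name, for example tables[["B.csv"]].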
Now suppose the csv files have different numbers of columns and you want to work with one specific column in all of them, but that column sits at a different position in each file. That is a fairly difficult situation to handle.
Say, for example, I have three files named “A.csv”, “B.csv” and “C.csv”, and I want to work with the “Entropy” column of each, but it occurs as the 3rd column in “A.csv”, the 5th column in “B.csv” and the 9th column in “C.csv”. As the column number is not uniform, the column cannot be accessed through a fixed index. This would be a great setback in automating the process. So, what I would do is:
## checking if the name of the jth column is "Entropy"
if (colnames(x)[j] == "Entropy") {
  ## saving the original column name for future use
  y[j] <- colnames(x)[j]
  ## changing the name of the jth column
  colnames(x)[j] <- "test"
  ## accessing the column by its name
  ent[q] <- entropy(table(x$test))
  ## again assigning the original column name to the jth column
  colnames(x)[j] <- y[j]
}
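Since the column is already identified by its name, a simpler route may be to skip the renaming and index by name directly. Here is a minimal sketch under a few assumptions: the data frames `a` and `b` are made up, and `shannon_entropy()` is a hypothetical stand-in for entropy() so that the example runs without extra packages.

```r
# Hypothetical helper: Shannon entropy (in nats) of the value counts.
shannon_entropy <- function(values) {
  p <- table(values) / length(values)
  -sum(p * log(p))
}

# Made-up tables with "Entropy" at different positions.
a <- data.frame(id = 1:4, score = c(2, 3, 5, 7),
                Entropy = c("x", "x", "y", "y"))
b <- data.frame(id = 1:4, u = 0, v = 0, w = 0,
                Entropy = c("x", "x", "x", "x"))

# match() reports where the column sits in each table.
pos_a <- match("Entropy", names(a))  # 3
pos_b <- match("Entropy", names(b))  # 5

# Indexing by name works regardless of the position.
ent_a <- shannon_entropy(a[["Entropy"]])
ent_b <- shannon_entropy(b[["Entropy"]])
```

With this approach the inner loop over j is no longer needed; a check such as "Entropy" %in% names(x) tells you whether a file has the column at all.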
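So, finally, my R script looks like this:
library(entropy)  # assumption: entropy() is taken from the CRAN package 'entropy'

file_list <- list.files(pattern = "\\.csv$")
l <- length(file_list)
csv <- character()  # names of the files that contain an "Entropy" column
ent <- numeric()    # entropy value computed for each of those files
q <- 1              # next free position in ent

for (i in seq_len(l)) {
  x <- read.csv(file_list[i])
  y <- names(x)
  for (j in seq_len(ncol(x))) {
    if (colnames(x)[j] == "Entropy") {
      y[j] <- colnames(x)[j]
      csv <- c(csv, file_list[i])
      colnames(x)[j] <- "test"
      ent[q] <- entropy(table(x$test))
      colnames(x)[j] <- y[j]
      q <- q + 1
    }
  }
}

## one row per file: the file name and its entropy value
df <- data.frame(csv = csv, entropy = ent, stringsAsFactors = FALSE)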
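For comparison, the same workflow can be sketched more compactly with lapply(). This is only one possible variant, not a replacement for the script above. It assumes made-up demo files in a scratch folder named `entropy_demo`, and it uses a hypothetical helper `shannon_entropy()` in place of entropy() so that it runs without extra packages. Files without an “Entropy” column are skipped.

```r
# Hypothetical helper: Shannon entropy (in nats) of the value counts.
shannon_entropy <- function(values) {
  p <- table(values) / length(values)
  -sum(p * log(p))
}

# Scratch folder with three made-up files; C.csv has no "Entropy" column.
demo_dir <- file.path(tempdir(), "entropy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:4, score = 1:4, Entropy = c("x", "x", "y", "y")),
          file.path(demo_dir, "A.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:4, u = 0, v = 0, w = 0, Entropy = rep("x", 4)),
          file.path(demo_dir, "B.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, other = c(1, 2)),
          file.path(demo_dir, "C.csv"), row.names = FALSE)

file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Build one single-row data frame per file that has the column.
rows <- lapply(file_list, function(path) {
  x <- read.csv(path)
  if (!"Entropy" %in% names(x)) {
    return(NULL)  # skip files without the column
  }
  data.frame(csv = basename(path),
             entropy = shannon_entropy(x[["Entropy"]]),
             stringsAsFactors = FALSE)
})

# Drop the skipped files and stack the remaining rows.
rows <- Filter(Negate(is.null), rows)
result <- do.call(rbind, rows)
print(result)
```

The result has one row per file that contains the column, which mirrors the df built by the script above.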
Hope this helps and saves lots of time and effort. Happy Mining!