
Operating on files with R: copy and rename

[This article was first published on Milano R net, and kindly contributed to R-bloggers].

Nowadays, routine file operations such as renaming or copying are performed with a few mouse clicks. Sometimes, though, it is useful to perform these operations in batch. Linux users typically do this through the shell. Windows users can also use the shell, but there are many utilities that simplify these operations.

Why should someone use R to copy or rename a (lot of) file(s)?

For an R user, R can be more intuitive than the operating system shell.

I found another good reason to use R for these operations: I needed to operate on files as a preliminary step to my statistical analyses.

I received a lot of files (about 20000). The files were spread across many directories structured as follows. Each directory refers to a day and contains some useless files, which I ignored, and a subdirectory with the txt files I need. The main directory has a name like “2012_09_21_Fri”, while the subdirectory has a name like “Fri 21 sep 2012”. So, I need to copy the relevant files into a directory named like “2012-09-21”.

The first step is to list all the directories I have. I saved the full path and the bare name of each directory in two separate R vectors. Here dirIn holds the path of the folder that contains the day directories.

fl = list.files(dirIn, full.names = TRUE)
dn = list.files(dirIn, full.names = FALSE)
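
The difference between the two calls is easiest to see on a small example. The sketch below is only an illustration: it builds a throwaway tree under a temporary folder (the two day directories are made-up names in the style described above) and shows that `dn` is simply `fl` without the leading path.

```r
# Build a throwaway input folder so the example is self-contained.
dirIn = tempfile("dirIn_")   # hypothetical input folder
dir.create(file.path(dirIn, "2012_09_21_Fri"), recursive = TRUE)
dir.create(file.path(dirIn, "2012_09_22_Sat"), recursive = TRUE)

fl = list.files(dirIn, full.names = TRUE)    # paths that include dirIn
dn = list.files(dirIn, full.names = FALSE)   # directory names only

print(dn)
# basename() strips the leading path, so it recovers dn from fl
print(identical(basename(fl), dn))
```

Keeping both vectors is a convenience: `fl` is what the file functions need, while `dn` is what gets parsed to build the new names.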

At this point, for every directory (so the code below goes inside a for loop), I look for the subdirectory, which is the first element of the directory, and I list all the files it contains. Here cfl holds one element of fl, the directory currently being processed.

dir = list.files(cfl, full.names = TRUE)[[1]]
flTxt = list.files(dir, full.names = TRUE)
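
The loop itself is not shown in the post. A minimal sketch of how it might look, assuming `cfl` and `cdn` are the current elements of `fl` and `dn`, is given below. The temporary tree, the file names and the `nTxt` counter are hypothetical and only serve to make the example runnable.

```r
# Throwaway tree: one day directory with a useless file and a subdirectory of txt files.
dirIn = tempfile("dirIn_")
sub = file.path(dirIn, "2012_09_21_Fri", "Fri 21 sep 2012")
dir.create(sub, recursive = TRUE)
file.create(file.path(sub, c("a.txt", "b.txt")))
file.create(file.path(dirIn, "2012_09_21_Fri", "zz_notes.log"))

fl = list.files(dirIn, full.names = TRUE)
dn = list.files(dirIn, full.names = FALSE)
nTxt = integer(length(fl))   # number of txt files found for each day

for (index in seq_along(fl)) {
  cfl = fl[index]   # full path of the current day directory
  cdn = dn[index]   # its name only
  # the subdirectory is assumed to sort first among the directory's entries
  dir = list.files(cfl, full.names = TRUE)[[1]]
  flTxt = list.files(dir, full.names = TRUE)
  nTxt[index] = length(flTxt)
}
names(nTxt) = dn
print(nTxt)
```

Taking the first element only works because the subdirectory happens to sort before the other entries. If that is not guaranteed, `list.dirs(cfl, recursive = FALSE)` would select directories explicitly.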

Now, I need to create a new directory with a name like “2012-09-21”. As seen above, the day, month and year are available in the directory name, but not in the format I want. So, I can use the paste0() and substr() functions to build the name. Please note that cdn contains only one element of dn: for example, cdn = dn[index], where index is the loop counter.

# cdn looks like "2012_09_21_Fri": year in characters 1-4, month in 6-7, day in 9-10
subdirName = paste0(substr(cdn, 1, 4), "-", substr(cdn, 6, 7), "-", substr(cdn, 9, 10))
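
As an alternative to fixed character positions, the name can be parsed as a date and then reformatted. This is only a sketch, assuming the directory name starts with the date in the form “2012_09_21” as in the example above; it has the side benefit that a malformed name yields `NA` instead of a silently wrong string.

```r
cdn = "2012_09_21_Fri"   # example name from the text

# Parse the leading ten characters as a date, then print it in the target format.
day = as.Date(substr(cdn, 1, 10), format = "%Y_%m_%d")
if (is.na(day)) stop("unexpected directory name: ", cdn)
subdirName = format(day, "%Y-%m-%d")
print(subdirName)
```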

Now, I can create my directory, using the dir.create() function:

dir.create(subdirName)
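
Note that `dir.create(subdirName)` creates the directory relative to the current working directory and only warns if it already exists. A slightly more defensive sketch is shown below; `dirOut` is a hypothetical output folder, not something defined in the post.

```r
dirOut = tempfile("dirOut_")   # hypothetical output root
subdirName = "2012-09-21"
target = file.path(dirOut, subdirName)

# Create the directory (and any missing parents) only if it is not there yet.
if (!dir.exists(target)) {
  created = dir.create(target, recursive = TRUE)
  if (!created) stop("could not create ", target)
}
print(dir.exists(target))
```

With `recursive = TRUE` the parent folder does not need to exist beforehand, which helps when the script is rerun on a clean machine.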

Now, I need to copy all the txt files from their old subdirectories to the new directories I created above.

file.copy(from = flTxt, to = subdirName)
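
`file.copy()` returns one logical value per source file, so with thousands of files it can be worth checking the result. The sketch below uses two temporary folders and made-up file names; the `pattern` argument is an optional addition that restricts the listing to txt files.

```r
# Throwaway source and destination folders.
src = tempfile("src_")
dst = tempfile("dst_")
dir.create(src)
dir.create(dst)
writeLines("first", file.path(src, "a.txt"))
writeLines("second", file.path(src, "b.txt"))

flTxt = list.files(src, pattern = "\\.txt$", full.names = TRUE)
ok = file.copy(from = flTxt, to = dst)   # one TRUE/FALSE per source file
if (!all(ok)) {
  warning("not copied: ", paste(basename(flTxt[!ok]), collapse = ", "))
}
print(list.files(dst))
```

By default `overwrite = FALSE`, so a file that already exists in the destination is left alone and reported as `FALSE`.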

Finally, the names of the txt files are also difficult to interpret, so I need to rename these files. I list the files in the following way, removing the full path:

oldNames = list.files(subdirName, full.names = TRUE)
# basename() drops the directory part, whatever the depth of the path
oldNames = basename(oldNames)

And now I can rename my files. newNames is a character vector containing the new file names; it is built similarly to subdirName.

file.rename(from = file.path(subdirName, oldNames), to = file.path(subdirName, newNames))
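
The post does not show how newNames is built. As a purely hypothetical illustration, the sketch below names each file after the date of its directory plus a running number, then renames everything in one vectorised call. The folder, the original file names and the naming scheme are all assumptions.

```r
# Throwaway directory named after a day, holding two poorly named files.
outDir = tempfile("out_")
subdirName = file.path(outDir, "2012-09-21")
dir.create(subdirName, recursive = TRUE)
file.create(file.path(subdirName, c("x1.txt", "x2.txt")))

oldNames = list.files(subdirName)   # names only, no path
# Hypothetical scheme: <date>_<three-digit counter>.txt
newNames = paste0(basename(subdirName), "_",
                  sprintf("%03d", seq_along(oldNames)), ".txt")

ok = file.rename(from = file.path(subdirName, oldNames),
                 to = file.path(subdirName, newNames))
print(list.files(subdirName))
```

Like `file.copy()`, `file.rename()` returns one logical value per file, so `all(ok)` is a quick way to confirm that every rename succeeded.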
