Remotely deleting files from R

[This article was first published on Amy Whitehead's Research » R, and kindly contributed to R-bloggers.]

Sometimes programs generate a LOT of files while running scripts. Usually these are important (why else would you be running the script?). However, sometimes scripts generate mountains of temporary files on the way to creating summary outputs, and those temporary files aren’t really useful in their own right. Manually deleting them can be a very time-consuming and tedious process, particularly if they are mixed in with the important ones. Not to mention the risk of accidentally deleting things you need because you’ve got bored and your mind has wandered off to more exciting things…

…like watching orca swim past the hut window!

I had exactly this problem a few months ago when I had ~65,000 temp files from a modelling process that were no longer needed, inconveniently mixed in with the things I needed to keep. Clearly, deleting these files manually wasn’t going to be an option. There are a number of ways to tackle this, but R provided a simple two-line solution.

The first step is to identify any patterns in the file names that will help you remove only the files that you want to delete (and not the really important ones!). Then construct a regular expression that matches the pattern. A handy reference guide to regular expressions can be found here. In my case, all the file names to delete contained a text string followed by a ".xxxx" pattern, where each x is a digit (e.g. Iamafiletodelete.1234.csv). Therefore, my regex pattern looked like this: ".[0-9]" (but see note below)*.
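Before pointing a pattern at a real folder, it can help to try it on a handful of names in memory. The snippet below is a minimal sketch of that check using grepl(); the names in example_names are made up for illustration, and the test vector deliberately includes names that should be kept.

```r
# Hypothetical file names: one to delete, two to keep
example_names <- c("Iamafiletodelete.1234.csv", "summary.csv", "README.txt")

# The pattern from the post (see the note at the end of the post)
pattern <- ".[0-9]"

# grepl() returns TRUE for every name the pattern matches
matches <- grepl(pattern, example_names)
print(data.frame(file = example_names, matched = matches))
```

Nothing on disk is touched here, so it is a cheap way to catch a pattern that matches more than intended.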

Then we can simply point R at the appropriate folder using dir(), identify the list of offending files, and delete them using file.remove(). Note that this has the potential to go horribly wrong if you aren’t careful! Check very carefully that the pattern selects only those files that you want to delete before you delete anything! The deletion is permanent (i.e. no rescuing things back from the recycle bin) and cannot be undone!

# identify the files (files_to_delete is a placeholder name)
files_to_delete <- dir("C:/the folder I want to delete from/", pattern = ".[0-9]", recursive = TRUE, full.names = TRUE)
# delete them
file.remove(files_to_delete)
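One way to rehearse the two steps safely is to run them on a throwaway folder first. The following is a sketch under that assumption: the sandbox folder and every file name in it are invented for the demonstration, and only dir(), file.create() and file.remove() from base R are used.

```r
# Build a throwaway folder so nothing real is at risk (all names are made up)
sandbox <- file.path(tempdir(), "delete_demo")
dir.create(sandbox, showWarnings = FALSE)
keep_names <- c("summary.csv", "notes.txt")
temp_names <- c("run.0001.csv", "run.0002.csv")
file.create(file.path(sandbox, c(keep_names, temp_names)))

# Step 1: list the candidates and look at them before deleting anything
candidates <- dir(sandbox, pattern = ".[0-9]", recursive = TRUE, full.names = TRUE)
print(basename(candidates))

# Step 2: only once the list looks right, delete.
# file.remove() returns one TRUE/FALSE per file.
removed <- file.remove(candidates)

# What is left in the folder afterwards
remaining <- dir(sandbox)
print(remaining)

# Tidy up the sandbox itself
unlink(sandbox, recursive = TRUE)
```

Keeping the listing and the deletion as separate statements means the list can be inspected (or saved) before anything irreversible happens.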

You can also see if the files exist either before or after you delete them as a useful check to make sure it worked.
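A minimal sketch of that check, assuming base R's file.exists() is the tool used: the two temporary files below are stand-ins created with tempfile(), not files from the original modelling run.

```r
# Two throwaway files standing in for the files to delete
junk <- c(tempfile(fileext = ".0001.csv"), tempfile(fileext = ".0002.csv"))
file.create(junk)

# Before deleting: every entry should be TRUE
before <- file.exists(junk)
print(before)

file.remove(junk)

# After deleting: every entry should be FALSE
after <- file.exists(junk)
print(after)
```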

Go forth and delete things but use at your own peril!

*As Patrick pointed out in the comments, the way that I have written the regex pattern technically isn’t correct. While it worked, it could also have gone horribly wrong! This is a good example of why checking the selected file names before you actually delete the files is a very good idea. Patrick’s suggestion for the correct pattern is ".\d{4}.". He also points out that you can test regex code at Rubular, which seems like a very good idea!
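To see why the looser pattern is risky: in a regular expression an unescaped "." matches any character, so ".[0-9]" matches any name in which a digit follows some other character. The sketch below compares it with a stricter pattern on made-up file names. Writing the stricter pattern as "\\.\\d{4}\\." is an assumption about how the suggestion would be typed as an R string, where each backslash has to be doubled and the dots are escaped so that they match literal dots.

```r
# Hypothetical names: only the first one is a temp file we want to delete
example_names <- c("Iamafiletodelete.1234.csv",
                   "results_v2.csv",
                   "site10_summary.csv")

loose  <- ".[0-9]"          # any character followed by a digit
strict <- "\\.\\d{4}\\."    # assumed R spelling: dot, four digits, dot

loose_hits  <- grepl(loose, example_names)
strict_hits <- grepl(strict, example_names)

print(data.frame(file = example_names, loose = loose_hits, strict = strict_hits))
```

With these names the loose pattern flags all three files, while the strict one flags only the first.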
