A Python-Like walk() Function for R

April 8, 2017
By

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

A really nice function available in Python is walk(), which recursively descends a directory tree, calling a user-supplied function in each directory within the tree. It might be used, say, to count the number of files, or maybe to remove all small files and so on. I had students in my undergraduate class write such a function in R for homework, and thought it may be interesting to present it here.

Among other things, readers who are not familiar with recursive function calls will learn about those here. I must add that all readers, even those with such background, will find this post to be rather subtle and requiring extra patience, but I believe you will find it worthwhile.

Let’s start with an example, in which we count the number of files with a given name:


countinst <- function(startdir,flname) {
   walk(startdir,checkname,
      list(flname=flname,tot=0))
}

checkname <- function(drname,filelist,arg) {
   if (arg$flname %in% filelist) 
      arg$tot <- arg$tot + 1
   arg
}

Say we try all this in a directory containing a subdirectory mydir that consists of a file x, and two subdirectories, d1 and d2. The latter in turn consists of another file named x. We then make the callcountinst(‘mydir’,’x’). As can be seen above, that call will in turn make the call


walk('mydir',checkname,list(flname='x',tot=0)

The walk() function, which I will present below, will start in mydir, and will call the user-specified function checkname() at every directory it encounters in the tree rooted at mydir, in this case mydir, d1 and d2.

At each such directory, walk() will pass to the user-specified function, in this casecheckname(), first the current directory name, thena character vectorreporting the names of all files (including directories) in that directory. In mydir, for instance, this vector will be c(‘d1′,’d2′,’x’). In addition,walk() will pass tocheckname() an argument, formally named arg above, which will serve as a vehicle for accumulating running totals, in this case the total number of files of the given name.

So, let’s look at walk() itself:


walk <- function(currdir,f,arg) {
   # "leave trail of bread crumbs"
   savetop <- getwd()
   setwd(currdir)
   fls <- list.files()
   arg <- f(currdir,fls,arg)
   # subdirectories of this directory
   dirs <- list.dirs(recursive=FALSE)
   for (d in dirs) arg <- walk(d,f,arg)
   setwd(savetop) # go back to calling directory
   arg  
}    

The comments are hopefully self-explanatory, but the key point is that within walk(), there is another call to walk()! This is recursion.

Note how arg accumulates, as needed in this application. If on the other hand we wanted, say, simply to remove all files named ‘x’, we would just put in a dummy variable for arg. And though we didn’t need drname here, in some applications it would be useful.

For compute-intensive tasks, recursion is not very efficient in R, but it can be quite handy in certain settings.

If you would like to conveniently try the above example, here is some test code:


test <- function() {
   unlink('mydir',recursive=TRUE)
   dir.create('mydir')
   file.create('mydir/x')
   dir.create('mydir/d1')
   dir.create('mydir/d2')
   file.create('mydir/d2/x')
   print(countinst('mydir','x'))
}

 

To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)