Debugging Parallel Code with dbs()

Posted on January 4, 2015 by matloff in R bloggers | 0 Comments

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers].


I mentioned yesterday that my partools package is now on CRAN. A number of people have expressed interest in the Snowdoop section, but in this post I want to call attention to the dbs() debugging tool in the package, useful for debugging code written for the portion of R’s parallel library that came from the old snow package.

I like to continue to call that portion “Snow,” using a capital and non-bold font to distinguish from the old CRAN package snow. Also, I refer to the entity that makes calls to, e.g., clusterApply() as the manager, and refer to the entities that respond as the workers.

You can use dbs() on any Unix-family platform, such as Macs, Linux or Cygwin. (This is the only part of the partools package with this restriction.)

Here is the problem that dbs() solves: When you run a Snow cluster-creation function, say,

> makeCluster(2)

this launches two new invocations of R. The problem is that they are not associated with terminals, and thus you can’t directly use R’s debug(), browser() and so on, let alone more sophisticated debugging tools. However, there is an indirect way: You can specify manual=TRUE in your call to makeCluster(), and then start your worker R processes yourself, by hand, from within terminal windows. You can insert a call to browser() inside the function you wish to debug, source() it, then run the function in each of these worker windows, single-stepping once the browser() call is hit.
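To see the no-terminal problem concretely, here is a minimal sketch (my own illustration, not part of partools) that asks each worker whether it is running interactively. It assumes the default socket cluster on the local machine; I would expect each worker to answer FALSE.

```r
library(parallel)

# Launch two ordinary (non-manual) Snow workers on the local machine.
cls <- makeCluster(2)

# Ask each worker whether its R session is interactive.
# Workers started this way have no terminal attached.
flags <- clusterEvalQ(cls, interactive())
print(flags)

stopCluster(cls)
```

The manual=TRUE route described above gets around this by giving each worker a real terminal of its own.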

This is fine in principle, but it’s a pain to actually do. So, I wrote dbs() and included it in my partools package. It automates the above process, which is very convenient. I’ll present a quick, simple example here.

Say I have a file x.R consisting of

f <- function(x) {
   x <- x + 1
   x^2
}

Then I would call

> dbs(2,src="x.R",ftn="f",xterm="xterm")

Then dbs() would do all of the following for me, automatically, WITHOUT MY DOING ANY TYPING AT ALL:

Create 3 new terminal windows on my screen, two for my Snow workers and one for my Snow manager.
In the manager window, call makeCluster(2,manual=TRUE,port=ranport) (where ranport is a randomly chosen port number).
In the worker windows, invoke R with a connection to the manager, and have them listen for commands the manager will send them to run.
Have each of the worker processes execute source("x.R") and then debug(f).
Have the manager execute .libPaths() to acquire the same library search path that I’d been using at the time I ran dbs(). Then have each worker do so too.
Have the manager and workers load partools, in case it may be needed.
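The worker-side part of the steps above can be sketched by hand as follows. This is only an illustration under stated assumptions, not the actual dbs() code: it uses an ordinary non-manual cluster (so no terminal windows are created), writes the example function to a temporary file standing in for x.R, and the names src, lp and flagged are mine. It does not call f(), since the workers here have no terminal in which to single-step.

```r
library(parallel)

# Stand-in for x.R: write the example function to a temporary file.
src <- file.path(tempdir(), "x.R")
writeLines(c("f <- function(x) {", "   x <- x + 1", "   x^2", "}"), src)

# Library search path in effect in the calling session.
lp <- .libPaths()

# Ordinary cluster; dbs() itself would use manual=TRUE and a random port.
cls <- makeCluster(2)

# Give the workers the file name and the library path, then set them up.
clusterExport(cls, c("src", "lp"))
invisible(clusterEvalQ(cls, .libPaths(lp)))
invisible(clusterEvalQ(cls, source(src)))

# Flag f for debugging on each worker and confirm the flag is set.
flagged <- clusterEvalQ(cls, { debug(f); isdebugged(f) })
print(flagged)

# Clean up: remove the debug flag and shut the workers down.
invisible(clusterEvalQ(cls, undebug(f)))
stopCluster(cls)
```

What dbs() adds on top of this is the part that matters for debugging: each worker lives in its own terminal window, so the browser has somewhere to read commands from.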

I can now go to the manager window, and run my Snow app as usual, say by typing (now I am typing again)

> clusterEvalQ(cls,f(5))

This gets the workers going, running f(), but since they had previously executed debug(f), they now enter the browser. My screen now shows four windows, arranged as follows.

My original window, the one from which I had invoked dbs(), is seen in the lower-right. That invocation had created the three new windows, two for the workers at the top left and right, and one for the manager, at the middle bottom. I had then given an ordinary Snow cluster call in the latter, so f() started running in the top two windows, and they entered the browser. Those worker windows are now waiting for my ordinary R debug commands, such as n for single-stepping.
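For reference, here is what the same cluster call returns when no debugging is in effect, which is also what the manager should receive once stepping through f() has finished in each worker. This is a small sketch of my own using an ordinary local cluster; the name res is just for illustration.

```r
library(parallel)

# Same function as in x.R.
f <- function(x) {
   x <- x + 1
   x^2
}

cls <- makeCluster(2)
clusterExport(cls, "f")          # ship f to the workers

# Each worker evaluates f(5): (5 + 1)^2 = 36.
res <- clusterEvalQ(cls, f(5))
print(res)

stopCluster(cls)
```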

My argument xterm="xterm" in my dbs() call needs comment. It can specify any kind of terminal window that supports the -e option (which states what program to run in a newly-created window). So, for instance, gnome-terminal in Ubuntu Linux would be fine.

If your system doesn’t have an xterm-family terminal window (I downloaded xterm to my Mac), you can still run dbs(), except that the function will require a little bit of typing by you in the procedure I listed above. It’s still a huge work saver even in that case.

(Maybe some of you Mac aficionados out there will see how to eliminate that little bit of typing for xterm-less Macs.)

So, that’s it. Happy debugging!
