Progress bars in R (part II) – a wrapper for apply functions

[This article was first published on "R" you ready?, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often though I have situations where I would like have a progress bar when using apply(). The plyr package provides several apply-like functions also including progress bars, so one could have a look here and use a plyr function instead of apply if possible. Anyway, here comes a wrapper for apply, lapply and sapply that has a progressbar. It seems to work although one known issue is the use of vectors (like c(1,2)with the MARGIN argument in apply_pb(). Also you can see in the performance comparison below that the wrapper causes overhead to a considerable extent, which is the main drawback of this approach.

###############################################################

# STATUS: WORKING, but only tested once or twice,
# tested with most ?apply examples
# ISSUES/TODO: MARGIN argument cannot take a
# vector like 1:2 that is more than one numeric

apply_pb <- function(X, MARGIN, FUN, ...)
{
  env <- environment()
  pb_Total <- sum(dim(X)[MARGIN])
  counter <- 0
  pb <- txtProgressBar(min = 0, max = pb_Total,
                       style = 3)

  wrapper <- function(...)
  {
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir= env)
    setTxtProgressBar(get("pb", envir= env),
                           curVal +1)
    FUN(...)
  }
  res <- apply(X, MARGIN, wrapper, ...)
  close(pb)
  res
}

## NOT RUN:
# apply_pb(anscombe, 2, sd, na.rm=TRUE)

## large dataset
# df <- data.frame(rnorm(30000), rnorm(30000))
# apply_pb(df, 1, sd)

###############################################################

lapply_pb <- function(X, FUN, ...)
{
 env <- environment()
 pb_Total <- length(X)
 counter <- 0
 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)   

 # wrapper around FUN
 wrapper <- function(...){
   curVal <- get("counter", envir = env)
   assign("counter", curVal +1 ,envir=env)
   setTxtProgressBar(get("pb", envir=env), curVal +1)
   FUN(...)
 }
 res <- lapply(X, wrapper, ...)
 close(pb)
 res
}

## NOT RUN:
# l <- sapply(1:20000, function(x) list(rnorm(1000)))
# lapply_pb(l, mean)

###############################################################

sapply_pb <- function(X, FUN, ...)
{
  env <- environment()
  pb_Total <- length(X)
  counter <- 0
  pb <- txtProgressBar(min = 0, max = pb_Total, style = 3)

  wrapper <- function(...){
    curVal <- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    setTxtProgressBar(get("pb", envir=env), curVal +1)
    FUN(...)
  }
  res <- sapply(X, wrapper, ...)
  close(pb)
  res
}

## NOT RUN:
# l <- sapply(1:20000, function(x) list(rnorm(1000))
# sapply_pb(l, mean)

###############################################################

Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.

###############################################################

> l <- sapply(1:20000, function(x) list(rnorm(1000)))
> system.time(sapply(l, mean))
User      System    verstrichen
0.474       0.003       0.475
> system.time(sapply_pb(l, mean))
|======================================================| 100%
User      System    verstrichen
1.863       0.025       1.885

> df <- data.frame(rnorm(90000), rnorm(90000))
> system.time(apply(df, 1, sd))
User      System verstrichen
7.152       0.062       7.260
> system.time(apply_pb(df, 1, sd))
|======================================================| 100%
User      System     verstrichen
13.112       0.099      13.192

###############################################################

So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.

Latest version with more comments on github.


To leave a comment for the author, please follow the link and comment on their blog: "R" you ready?.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)