**"R" you ready?**, and kindly contributed to R-bloggers)

In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often though I have situations where I would like have a progress bar when using `apply()`. The `plyr` package provides several `apply`-like functions also including progress bars, so one could have a look here and use a `plyr` function instead of apply if possible. Anyway, here comes a wrapper for `apply`, `lapply` and `sapply` that has a progressbar. It seems to work although one known issue is the use of vectors (like `c(1,2)`with the `MARGIN` argument in `apply_pb()`. Also you can see in the performance comparison below that the wrapper causes overhead to a considerable extent, which is the main drawback of this approach.

############################################################### # STATUS: WORKING, but only tested once or twice, # tested with most ?apply examples # ISSUES/TODO: MARGIN argument cannot take a # vector like 1:2 that is more than one numeric apply_pb <- function(X, MARGIN, FUN, ...) { env <- environment() pb_Total <- sum(dim(X)[MARGIN]) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...) { curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir= env) setTxtProgressBar(get("pb", envir= env), curVal +1) FUN(...) } res <- apply(X, MARGIN, wrapper, ...) close(pb) res } ## NOT RUN: # apply_pb(anscombe, 2, sd, na.rm=TRUE) ## large dataset # df <- data.frame(rnorm(30000), rnorm(30000)) # apply_pb(df, 1, sd) ############################################################### lapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) # wrapper around FUN wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- lapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000))) # lapply_pb(l, mean) ############################################################### sapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- sapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000)) # sapply_pb(l, mean) ###############################################################

Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.

############################################################### > l <- sapply(1:20000, function(x) list(rnorm(1000))) > system.time(sapply(l, mean)) User System verstrichen 0.474 0.003 0.475 > system.time(sapply_pb(l, mean)) |======================================================| 100% User System verstrichen 1.863 0.025 1.885 > df <- data.frame(rnorm(90000), rnorm(90000)) > system.time(apply(df, 1, sd)) User System verstrichen 7.152 0.062 7.260 > system.time(apply_pb(df, 1, sd)) |======================================================| 100% User System verstrichen 13.112 0.099 13.192 ###############################################################

So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.

Latest version with more comments on github.

**leave a comment**for the author, please follow the link and comment on his blog:

**"R" you ready?**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...