Go vector or go home

September 21, 2011

(This article was first published on uu kk, and kindly contributed to R-bloggers)

My programming experience has progressed roughly as follows: C, C++, shell, Java, Java, Ruby, Python, Java, Java. Only recently have I started exploring the likes of Haskell, Erlang and R. That history bit me a little while back, when I reached for an iterative approach in R where a vectorized solution would have been better.

I was dealing with a vector of timestamps that were formatted as 'seconds since the epoch' and what I wanted was to limit that vector to weekend timestamps only.

My naive approach was a simple loop over the values that applied a function to each element. I was only dealing with about 20,000 elements, but the loop was painfully slow, taking roughly 20 seconds, so I started investigating an apply-like approach. R provides several ways to do this depending on the input/output requirements: lapply, sapply, and vapply. All three performed about the same as the simple loop.

The function to test for weekend-ness is as follows:
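is.weekend <- function(x) {
  # x: seconds since the epoch; converted in the session's time zone
  tm <- as.POSIXlt(x, origin="1970/01/01")
  # "%a" is the abbreviated weekday name, which depends on the locale
  format(tm, "%a") %in% c("Sat", "Sun")
}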
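
Because `format(tm, "%a")` returns locale-dependent day names, the comparison against "Sat" and "Sun" only works in an English locale. A minimal sketch of a locale-independent variant follows. The name `is.weekend.wday` and its `tz` argument are assumptions made for illustration, not part of the original code; it relies on the `wday` field of `POSIXlt`, which is 0 for Sunday through 6 for Saturday.

```r
# Hypothetical variant of is.weekend (name and tz argument are assumptions,
# not the article's code): test the numeric weekday instead of its label.
is.weekend.wday <- function(x, tz = "") {
  # tz = "" means the session's time zone, matching the original function
  tm <- as.POSIXlt(x, origin = "1970-01-01", tz = tz)
  # wday is 0 for Sunday through 6 for Saturday
  tm$wday %in% c(0, 6)
}

# Sample timestamps: noon UTC on a Wednesday, a Saturday and a Sunday
stamps <- as.numeric(as.POSIXct(
  c("2011-09-21 12:00:00", "2011-09-24 12:00:00", "2011-09-25 12:00:00"),
  tz = "UTC"
))

is.weekend.wday(stamps, tz = "UTC")
# FALSE TRUE TRUE
```

Passing `tz = "UTC"` in the call keeps the result independent of the machine's time zone; with the default, a timestamp near midnight can land on a different day depending on where the code runs.
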
I don't know the specific details of date/time conversion in R, but I was pretty sure that the conversion itself was not the bottleneck. After a little searching I came upon a different approach: instead of looping over each element, I should have been passing the entire vector to the function. As far as I can tell, the apply functions take the vector as an argument but still loop internally, calling the function once per element. R, however, supports a more native way of handling vectors: vectorized operations, where a single call processes the whole vector.
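
The difference between the three styles can be sketched on a small input. The `times` vector below is made-up sample data, and `is.weekend` is repeated from the article only so that the block runs on its own. All three approaches should return the same logical vector; what differs is how many times the conversion function is called.

```r
# is.weekend as defined in the article, repeated so this block is self-contained
is.weekend <- function(x) {
  tm <- as.POSIXlt(x, origin = "1970/01/01")
  format(tm, "%a") %in% c("Sat", "Sun")
}

# Hypothetical sample: one timestamp per day for two weeks,
# starting at noon UTC on 2011-09-21
times <- as.numeric(as.POSIXct("2011-09-21 12:00:00", tz = "UTC")) +
  86400 * 0:13

# 1. Explicit loop: is.weekend runs once per element
by.loop <- logical(length(times))
for (i in seq_along(times)) {
  by.loop[i] <- is.weekend(times[i])
}

# 2. vapply: the loop is hidden, but is.weekend still runs once per element
by.vapply <- vapply(times, is.weekend, logical(1))

# 3. Vectorized: a single call converts and tests the whole vector
by.vector <- is.weekend(times)

# All three give the same answer
identical(by.loop, by.vector) && identical(by.vapply, by.vector)
```

With 14 elements the cost is invisible; with tens of thousands, the fixed overhead of each `as.POSIXlt` and `format` call is paid once per element in the first two styles and only once in the third.
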

Looping:
use.sapply <- function(data) {
  data[sapply(data$TIME, FUN=is.weekend), ]
}

system.time(v <- use.sapply(csv.data))
   user  system elapsed
 19.456   6.492  25.951

Vectorized:
use.vector <- function(data) {
  data[is.weekend(data$TIME), ]
}

system.time(v <- use.vector(csv.data))
   user  system elapsed
  0.032   0.020   0.052

Duly noted.
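
Since `csv.data` is not available here, the comparison can be approximated with synthetic data. The sketch below assumes a data frame with a numeric `TIME` column of epoch seconds; `fake.data`, its `VALUE` column, the seed and the timestamp range are all invented for illustration. Timings will differ from the ones above depending on the machine and R version, so none are quoted.

```r
# is.weekend as defined in the article, repeated so this block is self-contained
is.weekend <- function(x) {
  tm <- as.POSIXlt(x, origin = "1970/01/01")
  format(tm, "%a") %in% c("Sat", "Sun")
}

set.seed(42)  # arbitrary seed, only to make the fake data repeatable
n <- 20000    # roughly the size mentioned in the article

# Hypothetical stand-in for csv.data: random epoch seconds plus a dummy column
fake.data <- data.frame(
  TIME  = sort(round(runif(n, min = 1.300e9, max = 1.316e9))),
  VALUE = rnorm(n)
)

# Element-wise: is.weekend is called n times
t.loop <- system.time(
  a <- fake.data[sapply(fake.data$TIME, FUN = is.weekend), ]
)

# Vectorized: is.weekend is called once on the whole column
t.vector <- system.time(
  b <- fake.data[is.weekend(fake.data$TIME), ]
)

# Both approaches should select exactly the same rows
identical(a, b)

# Elapsed seconds for each approach
c(sapply = t.loop[["elapsed"]], vector = t.vector[["elapsed"]])
```
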
