Speed trick: Assigning NULL to a large object is much faster than using rm()!

May 25, 2013

(This article was first published on jottR, and kindly contributed to R-bloggers)

When processing large data sets in R, you often end up creating large temporary objects as well. To keep the memory footprint small, it is good practice to remove those temporary objects as soon as possible. Removed objects are then deallocated from memory (RAM) the next time the garbage collector runs.
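A minimal sketch of that workflow, assuming a plain numeric vector stands in for the large temporary object (the names `tmp` and `total` and the vector size are illustrative only): `object.size()` reports the footprint, `rm()` drops the object, and `gc()` asks R to collect garbage right away instead of waiting until it decides to.

```r
# Create a "large" temporary object (1e7 doubles, roughly 80 million bytes)
tmp <- numeric(1e7)
print(object.size(tmp), units = "Mb")

# ... use tmp for some intermediate computation ...
total <- sum(tmp)

# Remove the temporary object as soon as it is no longer needed
rm(tmp)

# Optionally trigger garbage collection now; otherwise R runs it when needed
invisible(gc())
```

Calling `gc()` explicitly is optional; R collects garbage on its own, so the important step is dropping the reference to the large object.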

Better: Use rm(list="x") instead of rm(x), if using rm()

To remove an object in R, one can use the rm() function (alias remove()). However, it turns out that this function has quite a bit of internal overhead (look at its R code), particularly if you call it as rm(x), which makes rm() capture and validate its unevaluated arguments, rather than as rm(list="x"). In the benchmark below, the former takes about 3.5 times as long to complete:

> t1 <- system.time(for (k in 1:1e5) { a <- 1; rm(a); })
> t2 <- system.time(for (k in 1:1e5) { a <- 1; rm(list="a"); })
> t1
   user  system elapsed
  10.45    0.00   10.50
> t2
   user  system elapsed
   2.93    0.00    2.94
> t1/t2
    user   system  elapsed
3.566553      NaN 3.571429

Note: In order to minimize the impact of the memory allocation on the benchmark, I use 'a <- 1' to represent the “large” object.
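Because `list` takes a character vector, the rm(list = ...) form also makes it easy to remove several temporaries at once, or to pick them programmatically with ls(). A small sketch, assuming the temporaries share a hypothetical `tmp_` prefix (a naming convention chosen only for this illustration):

```r
# Hypothetical temporaries that share a common "tmp_" prefix
tmp_a <- runif(10)
tmp_b <- runif(10)
keep  <- 42

# Remove a single object by name (same effect as rm(tmp_a))
rm(list = "tmp_a")

# Remove every remaining object whose name starts with "tmp_"
rm(list = ls(pattern = "^tmp_"))
```

Only `keep` survives; both temporaries are gone from the environment.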

Best: Use x <- NULL instead of rm()

Instead of using rm(list="x"), which still has a fair amount of overhead, one can release a large object by assigning the corresponding variable a new, small value, e.g. x <- NULL. The variable itself remains, but the previously assigned value (the large object) becomes available for garbage collection, provided nothing else refers to it. Example:

> t3 <- system.time(for (k in 1:1e5) { a <- 1; a <- NULL; })
> t3
   user  system elapsed
   0.05    0.00    0.05
> t1/t3
   user  system elapsed
    209     NaN     210

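That's roughly a 200-fold speedup over rm(a)! (With t3 this close to the timer's resolution, the exact factor is only approximate.)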
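One difference between the two approaches is worth keeping in mind: x <- NULL does not remove the variable, it rebinds the name to the small NULL object, so exists("x") stays TRUE. What is released is the large value previously bound to the name, and only once no other variable refers to it. A minimal sketch contrasting the cases (the variable names are illustrative only):

```r
# Assigning NULL: the name 'x' still exists, but now holds NULL,
# so the large vector it used to refer to can be garbage collected
x <- numeric(1e6)
x <- NULL
stopifnot(exists("x"), is.null(x))

# rm(): the name 'y' itself is removed from the environment
y <- numeric(1e6)
rm(list = "y")
stopifnot(!exists("y"))

# If another variable still refers to the same value, nothing is freed yet
z <- numeric(1e6)
other <- z
z <- NULL
stopifnot(length(other) == 1e6)  # the vector lives on through 'other'
```

Inside a function, local variables are released anyway when the function returns, so assigning NULL early mainly pays off when the function keeps working (and allocating) after it is done with the temporary.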

Background

I “accidentally” discovered this when profiling readMat() in my R.matlab package. In particular, there was one rm(x) call inside a local function that was called thousands of times when parsing modestly large MAT files. Together with some additional optimizations, R.matlab v2.0.0 (to appear) is now 10-20 times faster. Now I'm going to review all my other packages for expensive rm() calls.
