Any frequent reader of R-bloggers will have come across several posts concerning the optimization of code – in particular, the avoidance of loops.
Here's another aspect of the same issue. If you have experience programming in other languages besides R, this is probably a no-brainer, but for laymen, like myself, the following example was a total surprise. Basically, every time you redefine the size of an object in R, you are also redefining the allotted memory – and this takes some time. It's not necessarily a lot of time, but if you are having to do it during every iteration of a loop, it can really slow things down.
The following example shows three versions of a loop that creates random numbers and stores those numbers in a results object. The first example (Ex. 1) demonstrates the wrong approach, which is to concatenate the results onto the results object ("x") , thereby continually changing the size of x after each loop. The second approach (Ex. 2) is about 150x faster – x is defined as an empty matrix containing NAs, which is gradually filled (by row) during each loop. The third example (Ex. 3) shows another possibility if one does not know what the size of the results from each loop will be. An empty list is created of length equaling the number of loops. The elements of the list are then gradually filled with the loop results. Again, this is at least 150x faster than Ex. 1 (and I'm actually surprised to see that it may even be faster than Ex.2).
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...