PITFALL: Did you really mean to use matrix(nrow, ncol)?

[This article was first published on jottR, and kindly contributed to R-bloggers.]

Are you a good R citizen who preallocates your matrices? If you allocate a numeric matrix in one of the following two ways, you are doing it the wrong way!
x <- matrix(nrow=500, ncol=100)
or
x <- matrix(NA, nrow=500, ncol=100)
Why? Because it is counterproductive. In the above, 'x' becomes a logical matrix, not a numeric matrix as intended. This is because the default value of the 'data' argument of matrix() is NA, which is a logical value, i.e.
> x <- matrix(nrow = 500, ncol = 100)
> mode(x)
[1] "logical"
> str(x)
 logi [1:500, 1:100] NA NA NA NA NA NA ...
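To see where the logical type comes from, you can inspect the defaults directly. The following is a small sketch using only base R (formals() and typeof()); it shows that the 'data' default of both matrix() and array() is the plain NA constant, and that each atomic type has its own typed missing value:

```r
# The default 'data' argument of matrix() and array() is the constant NA
formals(matrix)$data   # NA
formals(array)$data    # NA

# Plain NA is logical; each atomic type has its own typed missing value
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```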
Why is that bad? Because as soon as you assign a numeric value to any cell of 'x', the whole matrix first has to be coerced to numeric. The originally allocated logical matrix was allocated in vain; it only adds an unnecessary memory footprint and extra work for the garbage collector.
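A quick way to observe the coercion is to check typeof() before and after a single assignment. This minimal sketch uses only base R; note that every other cell stays missing, but now as a double NA:

```r
# Allocate with the default 'data' argument (NA, a logical value)
x <- matrix(nrow = 500, ncol = 100)
typeof(x)        # "logical"

# Assigning one double value coerces the entire matrix to double
x[1, 1] <- 3.14
typeof(x)        # "double"

# The remaining cells are still missing, now stored as NA_real_
is.na(x[2, 1])   # TRUE
```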
Instead allocate it using NA_real_ (or NA_integer_ for integers):
x <- matrix(NA_real_, nrow=500, ncol=100)
Of course, if you wish to allocate a matrix with all zeros, use 0 instead of NA_real_ (or 0L for integers).
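For reference, here is a sketch of the four typed allocations side by side, again with base R only. One caveat worth keeping in mind: the same pitfall applies to integer matrices, since assigning a double such as 3.14 into an integer matrix coerces it to double, so the placeholder type should match the data you intend to store.

```r
# Missing-value placeholders of the intended storage type
xd <- matrix(NA_real_, nrow = 500, ncol = 100)
xi <- matrix(NA_integer_, nrow = 500, ncol = 100)

# All-zero matrices: 0 is a double, 0L is an integer
zd <- matrix(0, nrow = 500, ncol = 100)
zi <- matrix(0L, nrow = 500, ncol = 100)

types <- vapply(list(xd = xd, xi = xi, zd = zd, zi = zi), typeof, character(1))
types

# Assigning a double into a double matrix keeps its type, so no coercion
xd[1, 1] <- 3.14
typeof(xd)  # "double"
```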
The exact same thing happens with array(), again because the default value of its 'data' argument is NA, e.g.
> x <- array(dim=c(500,100))
> mode(x)
[1] "logical"
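The fix for array() is the same as for matrix(): pass a typed value as the first argument. A minimal base R sketch:

```r
# array() has the same default (data = NA), so pass a typed value explicitly
a <- array(NA_real_, dim = c(500, 100))
typeof(a)   # "double"
dim(a)      # 500 100

# A zero-filled integer array with three dimensions
b <- array(0L, dim = c(500, 100, 2))
typeof(b)   # "integer"
```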
Similarly, be careful when you set up vectors using rep(), e.g. compare
x <- rep(NA, times=500)
to
x <- rep(NA_real_, times=500)
Note, if all you want is a vector of all zeros, you may as well use
x <- double(500)
for doubles and
x <- integer(500)
for integers.
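The vector variants can be compared the same way. This sketch also checks, using base R only, that numeric() and vector("double", n) produce exactly the same result as double():

```r
v_lgl <- rep(NA, times = 500)        # logical, like the matrix() default
v_dbl <- rep(NA_real_, times = 500)  # double
z_dbl <- double(500)                 # double, all 0
z_int <- integer(500)                # integer, all 0L

vapply(list(v_lgl, v_dbl, z_dbl, z_int), typeof, character(1))

# numeric() and vector("double", n) give the same result as double()
identical(numeric(500), double(500))           # TRUE
identical(vector("double", 500), double(500))  # TRUE
```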

Details

In the 'base' package there is a neat little function called tracemem() that can be used to trace the internal copying of objects (it requires R to be built with memory profiling enabled). We can use it to show how the two cases differ. Let's start by doing it the wrong way:
> x <- matrix(nrow=500, ncol=100)
> tracemem(x)
[1] "<0x00000000100a0040>"
> x[1,1] <- 3.14
tracemem[0x00000000100a0040 -> 0x000007ffffba0010]:
> x[1,2] <- 2.71
>
That 'tracemem' output message tells us that 'x' is copied, or more precisely, that a new internal object (0x000007ffffba0010) is allocated and that 'x' now refers to it instead of the original one (0x00000000100a0040). This happens because 'x' has to be coerced from logical to numeric before cell (1,1) can be assigned the (numeric) value 3.14. Note that R does not need to create a copy in the second assignment to 'x', because at that point 'x' is already numeric (and no other variables refer to it).
To avoid the extra copy, let's allocate a numeric matrix from the start; then no extra copies are created:
> x <- matrix(NA_real_, nrow=500, ncol=100)
> tracemem(x)
[1] "<0x000007ffffd70010>"
> x[1,1] <- 3.14
> x[1,2] <- 2.71
>
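The avoided copy is not free in terms of memory either. Internally, logical values are stored as 4-byte integers while doubles take 8 bytes, so the discarded logical matrix in the wrong-way example is roughly half the size of the numeric one. A rough sketch using base R's object.size() (exact byte counts include object headers and attributes and may vary with R version and platform, so no output is shown):

```r
# Logical cells are stored as 4-byte ints; double cells take 8 bytes
lg <- matrix(nrow = 500, ncol = 100)            # the "wrong way" (logical)
db <- matrix(NA_real_, nrow = 500, ncol = 100)  # the "right way" (double)

print(object.size(lg), units = "Kb")
print(object.size(db), units = "Kb")

# Approximate number of bytes allocated in vain by the "wrong way"
as.numeric(object.size(lg))
```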

Appendix

Session information

R version 3.1.0 Patched (2014-06-11 r65921)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] markdown_0.7      R.cache_0.10.0    knitr_1.6         ggplot2_1.0.0    
 [5] R.devices_2.9.2   lineprof_0.1      pryr_0.1          devtools_1.5     
 [9] R.utils_1.32.5    R.oo_1.18.2       R.methodsS3_1.6.2

loaded via a namespace (and not attached):
 [1] base64enc_0.1-1  codetools_0.2-8  colorspace_1.2-4 digest_0.6.4    
 [5] evaluate_0.5.5   formatR_0.10     grid_3.1.0       gtable_0.1.2    
 [9] httr_0.3         MASS_7.3-33      memoise_0.2.1    mime_0.1.1      
[13] munsell_0.4.2    parallel_3.1.0   plyr_1.8.1       proto_0.3-10    
[17] R.rsp_0.19.0     Rcpp_0.11.2      RCurl_1.95-4.1   reshape2_1.4    
[21] scales_0.2.4     stringr_0.6.2    tools_3.1.0      whisker_0.3-2   

Reproducibility

This report was generated from an RSP-embedded Markdown document using R.rsp v0.19.0. Image courtesy of freefoto.com.

To leave a comment for the author, please follow the link and comment on their blog: jottR.
