Making matrices with zeros and ones

[This article was first published on Recology - R, and kindly contributed to R-bloggers].


So I was trying to figure out a fast way to make matrices with a randomly allocated 0 or 1 in each cell. I reached out on Twitter and got many responses (thanks tweeps!).


Here is the solution I came up with. See if you can tell why it would be slow.

mm <- matrix(0, 10, 5)
apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    0    1
 [2,]    0    0    1    1    1
 [3,]    0    0    0    0    1
 [4,]    0    1    1    0    1
 [5,]    0    1    1    1    1
 [6,]    1    0    1    1    1
 [7,]    0    1    0    1    0
 [8,]    0    0    1    0    1
 [9,]    1    0    1    1    1
[10,]    1    0    0    1    1
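
One way to see why this version is slow, as a minimal sketch: `apply` over both margins calls the anonymous function once per cell, so `sample` runs separately for every one of the 10 x 5 cells instead of once for the whole matrix. The `calls` counter below is a hypothetical helper added only for illustration; it is not part of the original solution.

```r
mm <- matrix(0, 10, 5)
calls <- 0  # hypothetical counter, added only to illustrate the point

res <- apply(mm, c(1, 2), function(x) {
  calls <<- calls + 1   # record each invocation of the function
  sample(c(0, 1), 1)    # draw a single 0 or 1 for this cell
})

calls      # number of times the function was invoked
dim(res)   # same shape as mm
```

The per-call overhead is small, but it is paid once per cell, which is what the vectorized replies avoid.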

Ted Hart (@distribecology) replied first with:

matrix(rbinom(10 * 5, 1, 0.5), ncol = 5, nrow = 10)
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    1    0    1    1
 [2,]    1    0    0    1    0
 [3,]    0    1    0    0    0
 [4,]    0    0    1    0    0
 [5,]    1    0    1    0    0
 [6,]    0    0    0    0    1
 [7,]    1    0    0    0    0
 [8,]    0    1    0    1    0
 [9,]    1    1    1    1    0
[10,]    0    1    1    0    0

Next, David Smith (@revodavid) and Rafael Maia (@hylospar) came up with about the same solution.

m <- 10
n <- 5
matrix(sample(0:1, m * n, replace = TRUE), m, n)
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    0    0    0    1
 [2,]    0    0    0    0    0
 [3,]    0    1    1    0    1
 [4,]    1    0    0    1    0
 [5,]    0    0    0    0    1
 [6,]    1    0    1    1    1
 [7,]    1    1    1    1    0
 [8,]    0    0    0    1    1
 [9,]    1    0    0    0    1
[10,]    0    1    0    1    1

Then there was the solution by Luis Apiolaza (@zentree).

m <- 10
n <- 5
round(matrix(runif(m * n), m, n))
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    1    1    0    0
 [2,]    1    0    1    1    0
 [3,]    1    0    1    0    0
 [4,]    1    0    0    0    1
 [5,]    1    0    1    1    0
 [6,]    1    0    0    0    0
 [7,]    1    0    0    0    0
 [8,]    1    1    1    0    0
 [9,]    0    0    0    0    1
[10,]    1    0    0    1    1
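
The three replies above share one pattern: draw all `m * n` values in a single vectorized call, then reshape the result into a matrix. As a rough sketch, they can be wrapped in small functions and checked for shape and content. The function names (`rand01_rbinom`, `rand01_sample`, `rand01_runif`) are hypothetical, and `set.seed` is there only to make the run repeatable.

```r
# Hypothetical wrappers around the three vectorized replies
rand01_rbinom <- function(m, n) matrix(rbinom(m * n, 1, 0.5), nrow = m, ncol = n)
rand01_sample <- function(m, n) matrix(sample(0:1, m * n, replace = TRUE), m, n)
rand01_runif  <- function(m, n) round(matrix(runif(m * n), m, n))

set.seed(1)  # only so the run is repeatable
mats <- list(
  rbinom = rand01_rbinom(10, 5),
  sample = rand01_sample(10, 5),
  runif  = rand01_runif(10, 5)
)

# Each result should be a 10 x 5 matrix containing only 0s and 1s
ok <- sapply(mats, function(x) all(dim(x) == c(10, 5)) && all(x %in% c(0, 1)))
ok
```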

Last, a solution was proposed using RcppArmadillo. I couldn’t get it to work on my machine, but here is the function anyway in case someone else can.

library(inline)
library(RcppArmadillo)
# Note: arma::randu(5, 10) fills a 5 x 10 matrix with uniform draws on [0, 1], not 0s and 1s
f <- cxxfunction(body = "return wrap(arma::randu(5,10));", plugin = "RcppArmadillo")

And here is the comparison of system.time for each solution.

mm <- matrix(0, 10, 5)
m <- 10
n <- 5

system.time(replicate(1000, apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))))  # @recology_
   user  system elapsed 
  0.470   0.002   0.471
system.time(replicate(1000, matrix(rbinom(10 * 5, 1, 0.5), ncol = 5, nrow = 10)))  # @distribecology
   user  system elapsed 
  0.014   0.000   0.015
system.time(replicate(1000, matrix(sample(0:1, m * n, replace = TRUE), m, n)))  # @revodavid & @hylospar
   user  system elapsed 
  0.015   0.000   0.014
system.time(replicate(1000, round(matrix(runif(m * n), m, n))))  # @zentree
   user  system elapsed 
  0.014   0.000   0.014
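
At 10 x 5 the three vectorized versions are nearly indistinguishable. A rough sketch of how the comparison could be repeated on a larger matrix follows. The size (500 x 200), the repetition count, and the names `candidates` and `elapsed` are arbitrary assumptions for illustration, and the timings will differ from machine to machine.

```r
m <- 500     # assumed larger size, chosen arbitrarily
n <- 200
reps <- 20   # assumed number of repetitions

# The three vectorized replies, wrapped as functions so they can be timed in a loop
candidates <- list(
  rbinom = function() matrix(rbinom(m * n, 1, 0.5), m, n),
  sample = function() matrix(sample(0:1, m * n, replace = TRUE), m, n),
  runif  = function() round(matrix(runif(m * n), m, n))
)

# Elapsed seconds for `reps` runs of each candidate
elapsed <- sapply(candidates, function(f) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
})
elapsed
```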

If you already know C++ or want to take the time to learn it, the RcppArmadillo option would likely be the fastest. But many scientists, especially ecologists, probably don’t already know C++, so IMO we will stick to the next fastest options.


Get the .Rmd file used to create this post at my GitHub account.


Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.
