A Julia version of the multinomial sampler

[This article was first published on A second megabyte of memory, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the previous post on RcppEigen I described an example of sampling from collection of multinomial distributions represented by a matrix of probabilities.  In the timing example the matrix was 100000 by 5 with each of the 100000 rows summing to 1.  The objective is to create a vector of 100000 1-based indices representing a sample from the probabilities in each row.

For each row we take the cumulative sums and, for safety, normalize by dividing by the last element then compare these values to a random draw from a standard uniform distribution.  The number of elements in the cumulative sums that are less than the uniform draw is the 0-based index of the result.  We add 1 to convert to a 1-based index.

I have been experimenting a bit with a very interesting new language called Julia and decided to write a similar function in it. The version shown here has been updated according to suggestions from Jeff Bezanson, Stefan Karpinski and Viral Shah on the julia-dev  list at Google Groups
function samp1(x::Array{Float64, 2},)
    cs = cumsum(reshape(x, length(x)))
    sum(cs/cs[end] < rand()) + 1
end

function samp(X::Array{Float64, 2},)
    if any(X < 0)
        error("Negative probabilities not allowed")
    end
    [samp1(X[i,:]) | i = 1:size(X,1)]
end
This version is about 10 times as fast as the pure R version but about 9 times slower than the RcppEigenversion.

Update:
In the thread on the julia-dev list about this example Stefan Karpinski showed that in Julia you can enhance performance by de-vectorizing your code and came up with the following version which is much faster than the RcppEigenversion
  
function SKsamp(X::Matrix{Float64})
  if any(X < 0)
    error("Negative probabilities not allowed")
  end
  s = Array(Int, size(X,1))
  for i = 1:size(X,1)
    r = rand()
    for j = 1:size(X,2)
      r -= X[i,j]
      if r <= 0.0
        s[i] = j
        break
      end
    end
  end
  return s
end       
  
To a longtime R/S programmer the concept of de-vectorizing your code seems heretical but I can understand that code created by a JITwill be happier with the looping and break style.

In any case, I think this example shows that R programmers should take a look at Julia. Two immediate applications I can imagine are McMC methods and large scale iterative fits such as Generalized linear models

To leave a comment for the author, please follow the link and comment on their blog: A second megabyte of memory.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)