A Julia version of the multinomial sampler

March 12, 2012

(This article was first published on A second megabyte of memory, and kindly contributed to R-bloggers)

In the previous post on RcppEigen I described an example of sampling from collection of multinomial distributions represented by a matrix of probabilities.  In the timing example the matrix was 100000 by 5 with each of the 100000 rows summing to 1.  The objective is to create a vector of 100000 1-based indices representing a sample from the probabilities in each row.

For each row we take the cumulative sums and, for safety, normalize by dividing by the last element then compare these values to a random draw from a standard uniform distribution.  The number of elements in the cumulative sums that are less than the uniform draw is the 0-based index of the result.  We add 1 to convert to a 1-based index.

I have been experimenting a bit with a very interesting new language called Julia and decided to write a similar function in it. The version shown here has been updated according to suggestions from Jeff Bezanson, Stefan Karpinski and Viral Shah on the julia-dev  list at Google Groups

function samp1(x::Array{Float64, 2},)
cs = cumsum(reshape(x, length(x)))
sum(cs/cs[end] < rand()) + 1

function samp(X::Array{Float64, 2},)
if any(X < 0)
error("Negative probabilities not allowed")
[samp1(X[i,:]) | i = 1:size(X,1)]

This version is about 10 times as fast as the pure R version but about 9 times slower than the RcppEigenversion.

In the thread on the julia-dev list about this example Stefan Karpinski showed that in Julia you can enhance performance by de-vectorizing your code and came up with the following version which is much faster than the RcppEigenversion

function SKsamp(X::Matrix{Float64})
if any(X < 0)
error("Negative probabilities not allowed")
s = Array(Int, size(X,1))
for i = 1:size(X,1)
r = rand()
for j = 1:size(X,2)
r -= X[i,j]
if r <= 0.0
s[i] = j
return s

To a longtime R/S programmer the concept of de-vectorizing your code seems heretical but I can understand that code created by a JITwill be happier with the looping and break style.

In any case, I think this example shows that R programmers should take a look at Julia. Two immediate applications I can imagine are McMC methods and large scale iterative fits such as Generalized linear models

To leave a comment for the author, please follow the link and comment on their blog: A second megabyte of memory.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)