simulation under zero measure constraints

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A theme that comes up fairly regularly on X validated is the production of a sample with given moments, either for calibration motives or from a misunderstanding of the difference between a distribution mean and a sample average. Here are some entries on that topic:

In most of those questions, the constraint in on the sum or mean of the sample, which allows for an easy resolution by a change of variables. It however gets somewhat harder when the constraint involves more moments or, worse, an implicit solution to an equation. A good example of the later is the quest for a sample with a given maximum likelihood estimate in the case this MLE cannot be derived analytically. As for instance with a location-scale t sample…

Actually, even when the constraint is solely on the sum, a relevant question is the production of an efficient simulation mechanism. Using a Gibbs sampler that changes one component of the sample at each iteration does not qualify, even though it eventually produces the proper sample. Except for small samples. As in this example

<span class="pln">n</span><span class="pun">=</span><span class="lit">3</span><span class="pun">;</span><span class="pln">T</span><span class="pun">=</span><span class="lit">1e4</span><span class="pln">
s0</span><span class="pun">=.</span><span class="lit">5</span> <span class="com">#fixed average</span><span class="pln">
sampl</span><span class="pun">=</span><span class="pln">matrix</span><span class="pun">(</span><span class="pln">s0</span><span class="pun">,</span><span class="pln">T</span><span class="pun">,</span><span class="pln">n</span><span class="pun">)</span>
<span class="kwd">for</span> <span class="pun">(</span><span class="pln">t </span><span class="kwd">in</span> <span class="lit">2</span><span class="pun">:</span><span class="pln">T</span><span class="pun">){</span><span class="pln">
 sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,]=</span><span class="pln">sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">-</span><span class="lit">1</span><span class="pun">,]</span>
 <span class="kwd">for</span> <span class="pun">(</span><span class="pln">i </span><span class="kwd">in</span> <span class="lit">1</span><span class="pun">:(</span><span class="pln">n</span><span class="pun">-</span><span class="lit">1</span><span class="pun">)){</span><span class="pln">
  sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,</span><span class="pln">i</span><span class="pun">]=</span><span class="pln">runif</span><span class="pun">(</span><span class="lit">1</span><span class="pun">,</span><span class="pln">
  min</span><span class="pun">=</span><span class="pln">max</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span><span class="pln">n</span><span class="pun">*</span><span class="pln">s0</span><span class="pun">-</span><span class="pln">sum</span><span class="pun">(</span><span class="pln">sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,</span><span class="pln">c</span><span class="pun">(-</span><span class="pln">i</span><span class="pun">,-</span><span class="pln">n</span><span class="pun">)])-</span><span class="lit">1</span><span class="pun">),</span><span class="pln">
  max</span><span class="pun">=</span><span class="pln">min</span><span class="pun">(</span><span class="lit">1</span><span class="pun">,</span><span class="pln">n</span><span class="pun">*</span><span class="pln">s0</span><span class="pun">-</span><span class="pln">sum</span><span class="pun">(</span><span class="pln">sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,</span><span class="pln">c</span><span class="pun">(-</span><span class="pln">i</span><span class="pun">,-</span><span class="pln">n</span><span class="pun">)])))</span><span class="pln">
 sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,</span><span class="pln">n</span><span class="pun">]=</span><span class="pln">n</span><span class="pun">*</span><span class="pln">s0</span><span class="pun">-</span><span class="pln">sum</span><span class="pun">(</span><span class="pln">sampl</span><span class="pun">[</span><span class="pln">t</span><span class="pun">,-</span><span class="pln">n</span><span class="pun">])}}</span>

For very large samples, I figure that proposing from the unconstrained density can achieve a sufficient efficiency, but the in-between setting remains an interesting problem.

Filed under: Books, Kids, R, Statistics, University life Tagged: cross validated, Gibbs sampler, maximum likelihood estimation, Monte Carlo algorithm, Monte Carlo Statistical Methods, simulation, Stack Exchange, Statistics Forum

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)