# unbalanced sampling

[This article was first published on **R – Xi'an's Og**, and kindly contributed to R-bloggers].

**A** question from X validated on sampling from an unknown density *f* when given both a sample from *f* restricted to a (known) interval *A*, *f¹* say, and a sample from *f* restricted to the complement of *A*, *f²* say. Or at least on producing an estimate of the mass of *A* under *f*, *p(A)*…

The problem sounds impossible to solve without the ability to compute the density at a given point, since any convex combination *αf¹+(1-α)f²* would return the same two samples. Assuming continuity of the density *f* at the boundary point *a* between *A* and its complement, a desperate solution for the odds *p(A)/(1-p(A))* is to take the ratio *f²(a)/f¹(a)* of the density estimates at *a*, since *f¹(a)=f(a)/p(A)* and *f²(a)=f(a)/(1-p(A))*. This turns out to be not so poor an approximation, if seemingly biased. This was surprising to me, as kernel density estimates are notoriously bad at boundary points.
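
As a rough illustration of this boundary-ratio idea, here is a minimal sketch in plain Python. Everything specific in it is an assumption made for the example and not part of the original question: *f* is taken to be a standard Normal, *A* is the half-line below *a*=0.5, the kernel is Gaussian with a normal-reference bandwidth, and the helper names (`kde_at`, `estimate_mass`, `restricted_normal`) are hypothetical. The two sample sizes are deliberately unrelated to *p(A)*.

```python
import math
import random
import statistics

def kde_at(x, sample, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at the point x."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

def reference_bandwidth(sample):
    """Normal-reference rule-of-thumb bandwidth (an arbitrary default choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)

def estimate_mass(sample_in, sample_out, a):
    """Estimate p(A) from the ratio of the two density estimates at the boundary a."""
    f1 = kde_at(a, sample_in, reference_bandwidth(sample_in))    # estimates f(a)/p(A)
    f2 = kde_at(a, sample_out, reference_bandwidth(sample_out))  # estimates f(a)/(1-p(A))
    odds = f2 / f1                                               # estimates p(A)/(1-p(A))
    return odds / (1 + odds)

def restricted_normal(n, keep, rng):
    """Draw n standard Normal variates satisfying keep(x), by plain rejection."""
    out = []
    while len(out) < n:
        x = rng.gauss(0, 1)
        if keep(x):
            out.append(x)
    return out

rng = random.Random(42)
a = 0.5  # assumed boundary: A = (-inf, a]
sample_in = restricted_normal(5000, lambda x: x <= a, rng)   # sample from f¹
sample_out = restricted_normal(3000, lambda x: x > a, rng)   # sample from f²

p_hat = estimate_mass(sample_in, sample_out, a)
p_true = 0.5 * (1 + math.erf(a / math.sqrt(2)))  # Normal cdf at a, for comparison only
print(f"estimated p(A) = {p_hat:.3f}, true p(A) = {p_true:.3f}")
```

Both kernel estimates are evaluated at an edge of their respective supports, where each loses roughly half of its kernel mass, so part of the boundary bias may cancel in the ratio. This is only a heuristic reading of why the approximation is not worse.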

If *f(x)* can be computed [up to a constant] at an arbitrary *x*, it is obviously feasible to simulate from *f* and approximate *p(A)*. But the problem is then moot, as a resolution would not even need the initial samples. If those samples are nonetheless exploited to construct a single kernel density estimate, this estimate can be used as a proposal in an MCMC algorithm. Surprisingly (?), using the empirical cdf as proposal instead does not work.
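
The kernel-proposal idea can be sketched as an independent Metropolis-Hastings sampler, again in plain Python. This is one possible reading of the paragraph above and not necessarily the exact algorithm used in the post. The target is assumed to be a standard Normal known only up to a constant, *A* is assumed to be the half-line below *a*=0.5, the two restricted samples are pooled with arbitrary sizes, and all function names are hypothetical.

```python
import math
import random
import statistics

def log_target(x):
    """Log of the unnormalised target (assumption: standard Normal)."""
    return -0.5 * x * x

def kde_density(x, centres, h):
    """Gaussian kernel density estimate built on `centres`, evaluated at x."""
    norm = len(centres) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / h) ** 2) for c in centres) / norm

def independent_mh(centres, h, n_iter, x0, rng):
    """Independent Metropolis-Hastings with the kernel estimate as proposal."""
    x, qx = x0, kde_density(x0, centres, h)
    chain = []
    for _ in range(n_iter):
        # Simulating from the kernel estimate: pick a centre, add kernel noise.
        y = rng.choice(centres) + rng.gauss(0, h)
        qy = kde_density(y, centres, h)
        # Acceptance ratio f(y) q(x) / (f(x) q(y)), computed on the log scale.
        log_ratio = log_target(y) - log_target(x) + math.log(qx) - math.log(qy)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, qx = y, qy
        chain.append(x)
    return chain

def restricted_normal(n, keep, rng):
    """Draw n standard Normal variates satisfying keep(x), by plain rejection."""
    out = []
    while len(out) < n:
        x = rng.gauss(0, 1)
        if keep(x):
            out.append(x)
    return out

rng = random.Random(1)
a = 0.5  # assumed boundary: A = (-inf, a]
# Pool the two restricted samples; their proportions carry no information on p(A).
pooled = (restricted_normal(200, lambda x: x <= a, rng)
          + restricted_normal(200, lambda x: x > a, rng))
h = 1.06 * statistics.stdev(pooled) * len(pooled) ** (-0.2)  # rule-of-thumb bandwidth

chain = independent_mh(pooled, h, n_iter=5000, x0=a, rng=rng)
p_mcmc = sum(x <= a for x in chain) / len(chain)  # share of the chain falling in A
p_true = 0.5 * (1 + math.erf(a / math.sqrt(2)))   # Normal cdf at a, for comparison only
print(f"MCMC estimate of p(A) = {p_mcmc:.3f}, true p(A) = {p_true:.3f}")
```

The pooled sample follows a mixture of *f¹* and *f²* with the wrong weights, which the Metropolis-Hastings acceptance step corrects for. Since the kernel estimate has lighter tails than most targets far from the data, the chain could in principle stall in the tails, so this should be read as a toy illustration only.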
