conditional sampling

September 4, 2016
By

(This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers)

An interesting question about stratified sampling came up on X validated last week, namely how to optimise a Monte Carlo estimate based on two subsequent simulations, one, X, from a marginal and one or several Y from the corresponding conditional given X, when the costs of producing those two simulations significantly differ. When looking at the variance of the Monte Carlo estimate, this variance can be minimised in the number of simulations from the marginal under a computing budget. However, when the costs of producing x, y given or (x,y) are about the same, it does not pay to replicate simulations from y given x or x given y, because this increases the randomness of the estimator and induces positive correlation between some terms in the sum. Assuming that the conditional variances are always computable, we could envision an extension to the question where for each new value of a simulated x (or y), a variable number of conditionally simulated y (or x) could be produced. Or even when those conditional variances are unknown but can be replaced with empirical versions.

The illustration comes from a bivariate normal model with correlation, for a rather arbitrary function , but the pattern remains the same, namely that iid simulations of the pair (X,Y invariably leads to a smaller variance of the estimator compared with simulation with a 1:10 (10 Y’s for one X) or 10:1 ratio between x’s and y’s. Depending on the function and the relative variances, the 1:10 or 10:1 schemes may have a similar variability.

zigma=c(9,1,-.9*sqrt(1*9))

geney=function(x,n=1){
  return(rnorm(n,mean=zigma[3]*x/zigma[1],sd=sqrt(zigma[2]-
    zigma[3]^2/zigma[1])))}
genex=function(y,n=1){
  return(rnorm(n,mean=zigma[3]*y/zigma[2],sd=sqrt(zigma[1]-
     zigma[3]*zigma[3]/zigma[2])))}
targ=function(x,y){ log(x^2*y^4)+x^2*cos(x^2)/y*sin(y^2)}

T=1e4;N=1e3
vales=matrix(0,N,3)
for (i in 1:N){
   xx=rnorm(T,sd=sqrt(zigma[1]))
   vales[i,1]=mean(targ(xx,geney(xx,n=T)))
   xx=rep(rnorm(T/10,sd=sqrt(zigma[1])),10)
   vales[i,2]=mean(targ(xx,geney(xx,n=T)))
   yy=rep(rnorm(T/10,sd=sqrt(zigma[2])),10)
   vales[i,3]=mean(targ(enex(yy,n=T),yy))}

Filed under: R, Statistics Tagged: conditional density, Cross Validation, Monte Carlo Statistical Methods, simulation, stratified sampling

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)