**R – Xi'an's Og**, and kindly contributed to R-bloggers)

**A**n interesting question about stratified sampling came up on X validated last week, namely how to optimise a Monte Carlo estimate based on two subsequent simulations, one, X, from a marginal and one or several Y from the corresponding conditional given X, when the costs of producing those two simulations significantly differ. When looking at the variance of the Monte Carlo estimate, this variance can be minimised in the number of simulations from the marginal under a computing budget. However, when the costs of producing x, y given or (x,y) are about the same, it does not pay to replicate simulations from y given x or x given y, because this increases the randomness of the estimator and induces positive correlation between some terms in the sum. Assuming that the conditional variances are always computable, we could envision an extension to the question where for each new value of a simulated x (or y), a variable number of conditionally simulated y (or x) could be produced. Or even when those conditional variances are unknown but can be replaced with empirical versions.

The illustration comes from a bivariate normal model with correlation, for a rather arbitrary function , but the pattern remains the same, namely that iid simulations of the pair (X,Y invariably leads to a smaller variance of the estimator compared with simulation with a 1:10 (10 Y’s for one X) or 10:1 ratio between x’s and y’s. Depending on the function and the relative variances, the 1:10 or 10:1 schemes may have a similar variability.

zigma=c(9,1,-.9*sqrt(1*9)) geney=function(x,n=1){ return(rnorm(n,mean=zigma[3]*x/zigma[1],sd=sqrt(zigma[2]- zigma[3]^2/zigma[1])))} genex=function(y,n=1){ return(rnorm(n,mean=zigma[3]*y/zigma[2],sd=sqrt(zigma[1]- zigma[3]*zigma[3]/zigma[2])))} targ=function(x,y){ log(x^2*y^4)+x^2*cos(x^2)/y*sin(y^2)} T=1e4;N=1e3 vales=matrix(0,N,3) for (i in 1:N){ xx=rnorm(T,sd=sqrt(zigma[1])) vales[i,1]=mean(targ(xx,geney(xx,n=T))) xx=rep(rnorm(T/10,sd=sqrt(zigma[1])),10) vales[i,2]=mean(targ(xx,geney(xx,n=T))) yy=rep(rnorm(T/10,sd=sqrt(zigma[2])),10) vales[i,3]=mean(targ(enex(yy,n=T),yy))}

Filed under: R, Statistics Tagged: conditional density, Cross Validation, Monte Carlo Statistical Methods, simulation, stratified sampling

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Xi'an's Og**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...