Simulation metamodeling with constraints

[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week I have posted about using simulation metamodeling to verify results of analytical solution of the model. After posting it I realized that the solution presented there can be improved by using knowledge of simulation model structure.

The story is about calculation of the variance of the random variable given by formula:

q(p-2+q/N)

where p and N are model parameters and q is a random variable that has binomial distribution with N Bernoulli trials and probability of success 2-p. As I have written the properly specified formula for the metamodel has the following form:

~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 /N))

where v is an estimate of the variance. Such a model was discussed in my last post (it is described in more detail there so I omit the repetition in this post).

However one can easily notice that for p=1 and p=2 the variance of the random variable is equal to 0. This is because in such case q has no variance (it is equal to 1 and 0 respectively).

In estimation of the metamodel one can take this into account in two ways:
  1. the weight of cases where p equals 1 or 2 might be increased (so we use Weighted Least Squares)
  2. a restriction on values of prediction in points where p equals 1 or 2 can be made to ensure that it is exactly 0 (so we use Constrained Least Squares)
The code given below estimates: original model and two versions of corrected models:

library(plyr)
library(limSolve)

sim <- function(param) {
  q <- rbinom(1000, param$N, 2 param$p)
  c(v = var((param$p 2 + q / param$N) * q))
}

set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 101), N = 1:10)
data.set <- ddply(data.set, .(p,N), sim,.progress = “text”)

lm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
              * (N + I(1 / N)), data = data.set))

weights <- ifelse(data.set$v==0, 100000, 1)
wlm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
                 * (N + I(1 / N)), data = data.set,
                 weights = weights))

A <- with(data.set, cbind(1, p, p^2, p^3, p^4, N, 1 / N,
  p * N, p / N, p ^ 2 * N, p ^ 2 / N,
  p ^ 3 * N, p ^ 3 / N, p ^ 4 * N, p ^ 4 / N))
colnames(A) <- names(lm.coef)
B <- data.set$v
E <- A[B == 0,]
F <- B[B == 0]
lsei.coef <- lsei(A, B, E, F)$X

Now we can compare the coefficients from these models to their analytical values given in the last post:

true.coef <- c(32, 88, 88, 38, 6, 8, 26, 20,
               75, 18, 79, 7, 36, 1, 6)
coef.dev <- cbind(lm.coef, wlm.coef, lsei.coef) true.coef
print(coef.dev, digits = 2)
dotchart(coef.dev[,1], pch = 19)
points(coef.dev[,2], 1:nrow(coef.dev), col = “red”, pch = 19)
abline(v = 0, col = “gray”, lty = 2)

The code first tabulates the deviation of estimates from true values:


              lm.coef wlm.coef lsei.coef
(Intercept)   -0.6056    0.060     0.060
p              1.5061   -0.235    -0.235
I(p^2)        -1.3576    0.323     0.323
I(p^3)         0.5243   -0.185    -0.185
I(p^4)        -0.0731    0.038     0.038
N              0.0088   -0.060    -0.060
I(1/N)         1.1510    0.354     0.354
p:N           -0.0065    0.173     0.173
p:I(1/N)      -3.0646   -0.965    -0.965
I(p^2):N      -0.0102   -0.182    -0.182
I(p^2):I(1/N)  2.9942    0.953     0.953
I(p^3):N       0.0115    0.084     0.084
I(p^3):I(1/N) -1.2730   -0.404    -0.404
I(p^4):N      -0.0029   -0.014    -0.014
I(p^4):I(1/N)  0.1991    0.062     0.062

and next plots the visualization of the differences (on the plot black points represent OLS estimates and red – WLS):


It can be seen that:
  • the precision of estimation is improved by using additional information on problem structure;
  • the estimates from WLS and CLS are identical to the third decimal place.

To leave a comment for the author, please follow the link and comment on their blog: R snippets.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)