(This article was first published on

**R snippets**, and kindly contributed to R-bloggers)Last week I have posted about using simulation metamodeling to verify results of analytical solution of the model. After posting it I realized that the solution presented there can be improved by using knowledge of simulation model structure.

The story is about calculation of the variance of the random variable given by formula:

*q*(

*p*-2+

*q*/

*N*)

where *p* and *N* are model parameters and *q* is a random variable that has binomial distribution with *N* Bernoulli trials and probability of success 2-*p*. As I have written the properly specified formula for the metamodel has the following form:

v

**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4))*** (**N**+**I**(**1**/**N**))**where v is an estimate of the variance. Such a model was discussed in my last post (it is described in more detail there so I omit the repetition in this post).

However one can easily notice that for

*p*=1 and*p*=2 the variance of the random variable is equal to 0. This is because in such case*q*has no variance (it is equal to 1 and 0 respectively).In estimation of the metamodel one can take this into account in two ways:

- the weight of cases where
*p*equals 1 or 2 might be increased (so we use Weighted Least Squares) - a restriction on values of prediction in points where
*p*equals 1 or 2 can be made to ensure that it is exactly 0 (so we use Constrained Least Squares)

The code given below estimates: original model and two versions of corrected models:

library

**(**plyr**)**library

**(**limSolve**)**sim

**<-****function****(**param**)****{** q

**<-**rbinom**(**1000, param**$**N, 2**–**param**$**p**)** c

**(**v**=**var**((**param**$**p**–**2**+**q**/**param**$**N**)*******q**))****}**

set.seed

**(**1**)**data.set

**<-**expand.grid**(**p**=**seq**(**1, 2, len**=**101**)**, N**=**1**:**10**)**data.set

**<-**ddply**(**data.set, .**(**p,N**)**, sim,.progress**=**“text”**)**lm.coef

**<-**coef**(**lm**(**v**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4**))***** (**N

**+**I

**(**1

**/**N

**))**, data

**=**data.set

**))**

weights

**<-**ifelse**(**data.set**$**v**==**0, 100000, 1**)**wlm.coef

**<-**coef**(**lm**(**v**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4**))***** (**N

**+**I

**(**1

**/**N

**))**, data

**=**data.set,

weights

**=**weights**))**A

**<-**with**(**data.set, cbind**(**1, p, p**^**2, p**^**3, p**^**4, N, 1**/**N, p

*****N, p**/**N, p**^**2*****N, p**^**2**/**N, p

**^**3*****N, p**^**3**/**N, p**^**4*****N, p**^**4**/**N**))**colnames

**(**A**)****<-**names**(**lm.coef**)**B

**<-**data.set**$**vE

**<-**A**[**B**==**0,**]**F

**<-**B**[**B**==**0**]**lsei.coef

**<-**lsei**(**A, B, E, F**)$**XNow we can compare the coefficients from these models to their analytical values given in the last post:

true.coef

**<-**c**(**32,**–**88, 88,**–**38, 6,**–**8,**–**26, 20, 75,

**–**18,**–**79, 7, 36,**–**1,**–**6**)**coef.dev

**<-**cbind**(**lm.coef, wlm.coef, lsei.coef**)****–**true.coefprint

**(**coef.dev, digits**=**2**)**dotchart

**(**coef.dev**[**,1**]**, pch**=**19**)**points

**(**coef.dev**[**,2**]**, 1**:**nrow**(**coef.dev**)**, col**=**“red”, pch**=**19**)**abline

**(**v**=**0, col**=**“gray”, lty**=**2**)**The code first tabulates the deviation of estimates from true values:

lm.coef wlm.coef lsei.coef

(Intercept) -0.6056 0.060 0.060

p 1.5061 -0.235 -0.235

I(p^2) -1.3576 0.323 0.323

I(p^3) 0.5243 -0.185 -0.185

I(p^4) -0.0731 0.038 0.038

N 0.0088 -0.060 -0.060

I(1/N) 1.1510 0.354 0.354

p:N -0.0065 0.173 0.173

p:I(1/N) -3.0646 -0.965 -0.965

I(p^2):N -0.0102 -0.182 -0.182

I(p^2):I(1/N) 2.9942 0.953 0.953

I(p^3):N 0.0115 0.084 0.084

I(p^3):I(1/N) -1.2730 -0.404 -0.404

I(p^4):N -0.0029 -0.014 -0.014

I(p^4):I(1/N) 0.1991 0.062 0.062

and next plots the visualization of the differences (on the plot black points represent OLS estimates and red – WLS):

It can be seen that:

- the precision of estimation is improved by using additional information on problem structure;
- the estimates from WLS and CLS are identical to the third decimal place.

To

**leave a comment**for the author, please follow the link and comment on their blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...