(This article was first published on

Last week I have posted about using simulation metamodeling to verify results of analytical solution of the model. After posting it I realized that the solution presented there can be improved by using knowledge of simulation model structure.**R snippets**, and kindly contributed to R-bloggers)The story is about calculation of the variance of the random variable given by formula:

*q*(

*p*-2+

*q*/

*N*)

where

*p*and

*N*are model parameters and

*q*is a random variable that has binomial distribution with

*N*Bernoulli trials and probability of success 2-

*p*. As I have written the properly specified formula for the metamodel has the following form:

v

**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4))*** (**N**+**I**(**1**/**N**))**where v is an estimate of the variance. Such a model was discussed in my last post (it is described in more detail there so I omit the repetition in this post).

However one can easily notice that for

*p*=1 and*p*=2 the variance of the random variable is equal to 0. This is because in such case*q*has no variance (it is equal to 1 and 0 respectively).In estimation of the metamodel one can take this into account in two ways:

- the weight of cases where
*p*equals 1 or 2 might be increased (so we use Weighted Least Squares) - a restriction on values of prediction in points where
*p*equals 1 or 2 can be made to ensure that it is exactly 0 (so we use Constrained Least Squares)

The code given below estimates: original model and two versions of corrected models:

library

**(**plyr**)**library

**(**limSolve**)**sim

**<-****function****(**param**)****{** q

**<-**rbinom**(**1000, param**$**N, 2**-**param**$**p**)** c

**(**v**=**var**((**param**$**p**-**2**+**q**/**param**$**N**)*******q**))****}**

set.seed

**(**1**)**data.set

**<-**expand.grid**(**p**=**seq**(**1, 2, len**=**101**)**, N**=**1**:**10**)**data.set

**<-**ddply**(**data.set, .**(**p,N**)**, sim,.progress**=**"text"**)**lm.coef

**<-**coef**(**lm**(**v**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4**))***** (**N

**+**I

**(**1

**/**N

**))**, data

**=**data.set

**))**

weights

**<-**ifelse**(**data.set**$**v**==**0, 100000, 1**)**wlm.coef

**<-**coef**(**lm**(**v**~ (**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4**))***** (**N

**+**I

**(**1

**/**N

**))**, data

**=**data.set,

weights

**=**weights**))**A

**<-**with**(**data.set, cbind**(**1, p, p**^**2, p**^**3, p**^**4, N, 1**/**N, p

*****N, p**/**N, p**^**2*****N, p**^**2**/**N, p

**^**3*****N, p**^**3**/**N, p**^**4*****N, p**^**4**/**N**))**colnames

**(**A**)****<-**names**(**lm.coef**)**B

**<-**data.set**$**vE

**<-**A**[**B**==**0,**]**F

**<-**B**[**B**==**0**]**lsei.coef

**<-**lsei**(**A, B, E, F**)$**XNow we can compare the coefficients from these models to their analytical values given in the last post:

true.coef

**<-**c**(**32,**-**88, 88,**-**38, 6,**-**8,**-**26, 20, 75,

**-**18,**-**79, 7, 36,**-**1,**-**6**)**coef.dev

**<-**cbind**(**lm.coef, wlm.coef, lsei.coef**)****-**true.coefprint

**(**coef.dev, digits**=**2**)**dotchart

**(**coef.dev**[**,1**]**, pch**=**19**)**points

**(**coef.dev**[**,2**]**, 1**:**nrow**(**coef.dev**)**, col**=**"red", pch**=**19**)**abline

**(**v**=**0, col**=**"gray", lty**=**2**)**The code first tabulates the deviation of estimates from true values:

lm.coef wlm.coef lsei.coef

(Intercept) -0.6056 0.060 0.060

p 1.5061 -0.235 -0.235

I(p^2) -1.3576 0.323 0.323

I(p^3) 0.5243 -0.185 -0.185

I(p^4) -0.0731 0.038 0.038

N 0.0088 -0.060 -0.060

I(1/N) 1.1510 0.354 0.354

p:N -0.0065 0.173 0.173

p:I(1/N) -3.0646 -0.965 -0.965

I(p^2):N -0.0102 -0.182 -0.182

I(p^2):I(1/N) 2.9942 0.953 0.953

I(p^3):N 0.0115 0.084 0.084

I(p^3):I(1/N) -1.2730 -0.404 -0.404

I(p^4):N -0.0029 -0.014 -0.014

I(p^4):I(1/N) 0.1991 0.062 0.062

and next plots the visualization of the differences (on the plot black points represent OLS estimates and red - WLS):

It can be seen that:

- the precision of estimation is improved by using additional information on problem structure;
- the estimates from WLS and CLS are identical to the third decimal place.

To

**leave a comment**for the author, please follow the link and comment on his blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...