(This article was first published on

I am one of the organizers of ESSA2013 conference that will take place in September 2013 in Warsaw, Poland. The conference scope is social simulation and in particular methods of statistical analysis of simulation output (metamodeling). As we have just issued Call for Papers for the conference so I decided to post a simple example of a metamodel.**R snippets**, and kindly contributed to R-bloggers)Recently I had to calculate variance of a random variable given as:

*q*(

*p*-2+

*q*/

*N*),

*p*and

*N*are model parameters and

*q*is a random variable that has binomial distribution with

*N*Bernoulli trials and probability of success 2-

*p*. After some lengthily calculations I came up with the following formula for the variance:

However the result is so long that I was not sure that it was correct, so I decided to verify it using metamodeling.

First I have generated the data for the simulation with the following code and visualized it. The data covers a grid where

*p*is from interval [1;2] and

*N*spans from 1 to 10. In each grid point 100000 simulations are performed and variance of

*q*(

*p*-2+

*q*/

*N*) is calculated:

library

**(**plyr**)**library

**(**lattice**)**sim

**<-****function****(**param**)****{** q

**<-**rbinom**(**100000, param**$**N, 2**-**param**$**p**)** c

**(**v**=**var**((**param**$**p**-**2**+**q**/**param**$**N**)*******q**))****}**

set.seed

**(**1**)**data.set

**<-**expand.grid**(**p**=**seq**(**1, 2, len**=**201**)**, N**=**1**:**10**)**data.set

**<-**ddply**(**data.set, .**(**p,N**)**, sim, .progress**=**"text"**)**levelplot

**(**v**~**p**+**N, data**=**data.set, col.regions

**=**terrain.colors, xlab**=**"p", ylab**=**"N"**)**Here is the resulting plot:

Next I have estimated the linear regression model with parameters reflecting analytical specification given above:

summary

**(**lm**(**v**~(**p**+**I**(**p**^**2**)****+**I**(**p**^**3**)****+**I**(**p**^**4**))***** (**N

**+**I

**(**1

**/**N

**))**, data

**=**data.set

**))**

The output looks as follows:

Call:

lm(formula = v ~ (p + I(p^2) + I(p^3) + I(p^4)) * (N + I(1/N)),

data = data.set)

Residuals:

Min 1Q Median 3Q Max

-0.0116801 -0.0007195 0.0000063 0.0007608 0.0098659

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 32.089136 0.235380 136.33 <2e-16 ***

p -88.232888 0.655760 -134.55 <2e-16 ***

I(p^2) 88.225183 0.674590 130.78 <2e-16 ***

I(p^3) -38.095417 0.303854 -125.37 <2e-16 ***

I(p^4) 6.014949 0.050597 118.88 <2e-16 ***

N -8.012673 0.027781 -288.42 <2e-16 ***

I(1/N) -26.082527 0.303365 -85.98 <2e-16 ***

p:N 20.034053 0.077398 258.84 <2e-16 ***

p:I(1/N) 75.213337 0.845166 88.99 <2e-16 ***

I(p^2):N -18.033947 0.079621 -226.50 <2e-16 ***

I(p^2):I(1/N) -79.203708 0.869434 -91.10 <2e-16 ***

I(p^3):N 7.014852 0.035863 195.60 <2e-16 ***

I(p^3):I(1/N) 36.085047 0.391618 92.14 <2e-16 ***

I(p^4):N -1.002405 0.005972 -167.85 <2e-16 ***

I(p^4):I(1/N) -6.013089 0.065212 -92.21 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.002197 on 1995 degrees of freedom

Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999

F-statistic: 1.976e+06 on 14 and 1995 DF, p-value: < 2.2e-16

As we can see the model fit is almost perfect and the estimates are very close to analytical results. This reassured me that my calculation was correct.

In this way you could use simulation to verify analytical results. Of course simulation metamodeling has much broader applications - especially when we

**do not**have analytical results.

I hope to organize a special session during ESSA2013 dedicated to simulation metamodeling using GNU R. So if you are interested in such topics and willing to promote GNU R in simulation society you are welcome to come to Warsaw.

To

**leave a comment**for the author, please follow the link and comment on his blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...