[This article was first published on R snippets
, and kindly contributed to R-bloggers
]. (You can report issue about the content on this page here
Want to share your content on R-bloggers? click here
if you have a blog, or here
if you don't.
I am one of the organizers of ESSA2013 conference that will take place in September 2013 in Warsaw, Poland. The conference scope is social simulation and in particular methods of statistical analysis of simulation output (metamodeling). As we have just issued Call for Papers for the conference so I decided to post a simple example of a metamodel.
Recently I had to calculate variance of a random variable given as:
where p and N are model parameters and q is a random variable that has binomial distribution with N Bernoulli trials and probability of success 2-p. After some lengthily calculations I came up with the following formula for the variance:
However the result is so long that I was not sure that it was correct, so I decided to verify it using metamodeling.
First I have generated the data for the simulation with the following code and visualized it. The data covers a grid where p is from interval [1;2] and N spans from 1 to 10. In each grid point 100000 simulations are performed and variance of q(p-2+q/N) is calculated:
q <- rbinom(100000, param$N, 2 – param$p)
c(v = var((param$p – 2 + q / param$N) * q))
data.set <- expand.grid(p = seq(1, 2, len = 201), N = 1:10)
data.set <- ddply(data.set, .(p,N), sim, .progress = “text”)
levelplot(v ~ p + N, data = data.set,
col.regions = terrain.colors, xlab = “p”, ylab = “N”)
Here is the resulting plot:
Next I have estimated the linear regression model with parameters reflecting analytical specification given above:
summary(lm(v~(p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
* (N + I(1 / N)), data = data.set))
The output looks as follows:
lm(formula = v ~ (p + I(p^2) + I(p^3) + I(p^4)) * (N + I(1/N)),
-0.0116801 -0.0007195 0.0000063 0.0007608 0.0098659
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.089136 0.235380 136.33 <2e-16 ***
p -88.232888 0.655760 -134.55 <2e-16 ***
I(p^2) 88.225183 0.674590 130.78 <2e-16 ***
I(p^3) -38.095417 0.303854 -125.37 <2e-16 ***
I(p^4) 6.014949 0.050597 118.88 <2e-16 ***
N -8.012673 0.027781 -288.42 <2e-16 ***
I(1/N) -26.082527 0.303365 -85.98 <2e-16 ***
p:N 20.034053 0.077398 258.84 <2e-16 ***
p:I(1/N) 75.213337 0.845166 88.99 <2e-16 ***
I(p^2):N -18.033947 0.079621 -226.50 <2e-16 ***
I(p^2):I(1/N) -79.203708 0.869434 -91.10 <2e-16 ***
I(p^3):N 7.014852 0.035863 195.60 <2e-16 ***
I(p^3):I(1/N) 36.085047 0.391618 92.14 <2e-16 ***
I(p^4):N -1.002405 0.005972 -167.85 <2e-16 ***
I(p^4):I(1/N) -6.013089 0.065212 -92.21 <2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002197 on 1995 degrees of freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 1.976e+06 on 14 and 1995 DF, p-value: < 2.2e-16
As we can see the model fit is almost perfect and the estimates are very close to analytical results. This reassured me that my calculation was correct.
In this way you could use simulation to verify analytical results. Of course simulation metamodeling has much broader applications – especially when we do not have analytical results.
I hope to organize a special session during ESSA2013 dedicated to simulation metamodeling using GNU R. So if you are interested in such topics and willing to promote GNU R in simulation society you are welcome to come to Warsaw.