**R Blogs by Pedro N. de Lima**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Simulation Metamodeling – building and using surrogate models that can approximate results from more complicated simulation models – is an interesting approach to analyze results from complicated, computationally expensive simulation models. Metamodels are useful because they can yield good approximations of the original simulation model response variables using less computational resources. For an introduction to Metamodeling, refer to (Barton 2015).

To my knowledge, no Discrete-Event Simulation (DES) software provides metamodeling capabilities, and guidance on how to actually execute metamodeling is scarce. In this post, I’ll build a Spline-based simulation metamodel. This tutorial should be useful to advanced users of Arena Simulation who would be willing to give metamodeling a try.

### Why Splines?

In my previous post, I briefly described the motivation for using metamodels to approximate simulation models results. Splines are among the useful techniques for metamodeling because: (i) they are relatively simple (they are piecewise-defined polynomials), and (ii) Unlike low-order polynomials, you can generally use them with a global sampling strategy (Barton and Meckesheimer 2006), meaning you can just sample a wide range of input values of your control variable and your model will still have a decent fit.

### Data Wrangling

Before developing our metamodel, let’s first load the simulation data and do some data wrangling. For the details on this step, please refer to my previous post.

```
library(arena2r)
library(dplyr)
library(ggplot2)
library(readr)
sim_results = arena2r::get_simulation_results(source = "2019-03-metamodeling/")
sim_results$BatchSize = readr::parse_number(as.character(sim_results$Scenario))
sim_results = subset(sim_results, Statistic == "Entity 1.NumberOut")
head(sim_results)
```

```
## Scenario Statistic Replication Value BatchSize
## 51 BatchSize200 Entity 1.NumberOut 1 9200 200
## 52 BatchSize200 Entity 1.NumberOut 2 9368 200
## 53 BatchSize200 Entity 1.NumberOut 3 9322 200
## 54 BatchSize200 Entity 1.NumberOut 4 9039 200
## 55 BatchSize200 Entity 1.NumberOut 5 9255 200
## 56 BatchSize200 Entity 1.NumberOut 6 9400 200
```

### Trying Splines

You can build a spline model with the R’s standard linear model `lm`

function. Instead of using the standard `Y ~ X`

formula, we just have to use the `bs()`

function from the `splines`

package. Thus, our formula for our spline metamodel will be `Y ~ bs(X)`

.

```
## Now using Splines:
library(splines)
# Building a Spline Model:
spline_model <-lm(Value ~ bs(BatchSize),data = sim_results)
summary(spline_model)
```

```
##
## Call:
## lm(formula = Value ~ bs(BatchSize), data = sim_results)
##
## Residuals:
## Min 1Q Median 3Q Max
## -221.646 -30.896 1.011 46.969 268.354
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9256.49 30.15 306.99 < 2e-16 ***
## bs(BatchSize)1 1312.17 102.27 12.83 < 2e-16 ***
## bs(BatchSize)2 1042.28 88.29 11.80 1.61e-15 ***
## bs(BatchSize)3 961.00 42.95 22.38 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.04 on 46 degrees of freedom
## Multiple R-squared: 0.9465, Adjusted R-squared: 0.943
## F-statistic: 271.1 on 3 and 46 DF, p-value: < 2.2e-16
```

Once you have your `spline_model`

, you can use the `predict`

function to estimate the expected value of the response variable. Estimating what will be the Expected value of the Output variable with a Batch Size of 200 units is easy as:

```
predict(spline_model,
newdata = data.frame(BatchSize = 200))
```

```
## 1
## 9256.489
```

### “Optimizing” with Splines

Now that we have a spline model that can approximate our model results, we will use this model to find an “optimal” Batch Size which maximizes our Output Variable.

```
## Defining limits:
batchlims <- range(sim_results$BatchSize)
# Generating Test Data
batch.grid<-seq(from=batchlims[1], to = batchlims[2])
# Using the metamodel:
spline_data = data.frame(BatchSize = batch.grid,
Value = predict(spline_model,
newdata = list(BatchSize=batch.grid))
)
# What is the Batch Size which "optimizes" the Output?
Optimum_BatchSize <- spline_data$BatchSize[which.max(spline_data$Value)]
Output_Value <- spline_data$Value[which.max(spline_data$Value)]
Optimum_BatchSize
```

`## [1] 331`

The suggested batch size is 331. Is this a reasonable guess, based our simulation runs? Let’s figure this out by plotting the simulation data, the spline function and the optimum value found.

```
# Let's plot again with the optimum batch size:
ggplot(sim_results, mapping = aes(x = BatchSize, y = Value)) +
geom_point() +
stat_smooth(method = lm, formula = y ~ splines::bs(x)) +
geom_vline(xintercept = Optimum_BatchSize) +
geom_text(aes(x=Optimum_BatchSize,
label="\nOptimum Batch Size", y=9700),
angle=90,
text=element_text(size=11)
) +
labs(y = "Output")
```

Yes, definitely this is a good estimate! This plot encourages one to avoid going below 300 units, and suggests that going 350 and above is not a good idea either. The interesting pattern that the spline curve suggests is that increasing Batchsize not always increases Output, and that the output loss is not symetric.

Acknowledging these non-linear relationships is one of the outcomes I value at the end of a simulation project, and I hope that metamodeling will be an useful tool to you as well. Splines are a straightforward option to interpolate results from a simulation model, but there are other options out there. Future posts might explore other alternatives such as kriging metamodels, neural nets, and other techniques.

# References

Barton, Russell R, and Martin Meckesheimer. 2006. “Metamodel-Based Simulation Optimization” 13 (06). https://doi.org/10.1016/S0927-0507(06)13018-2.

Barton, Russel R. 2015. “Tutorial: Simulation Metamodeling.” In *Proceedings of the 2015 Winter Simulation Conference*, 1765–79.

**leave a comment**for the author, please follow the link and comment on their blog:

**R Blogs by Pedro N. de Lima**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.