[This article was first published on Posts on Tychobra, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When I was fresh out of college, my boss asked me to run some simulations in R. This was my first exposure to R, and I was initially skeptical of its practical usefulness. But with some Googling and a little perseverence, I overcame the initial frustrations and am now about the biggest R fanboy around. It turns out that random simulations served as a nice introdution to R because they allowed me to easily produce useful output and they allowed me to work with R’s data structures without ever even having to load data into R.

In this tutorial we will run simulations very similar to the simulations I ran when first learning R (except the code will be a lot nicer as I was a complete noob when I wrote the original simulations). Our simulations will model insurance claims.

### Step 1: Structure the Simulation to the Problem

Before we execute any R code, we need to describe the problem a little more fully. Let’s imagine that we own an insurance company and we are writing 100 identical insurance policies next year. We want to measure the losses we expect to pay on claims resulting from these policies. Our actuary tells us the probability of a single policy having a claim is 10%. Therefore we expect an average of 10 claims (10 = 100 * 0.1). We can use a discrete probability distribution to simulate the frequency (number of claims) based on this 10 claim average.

We then need to simulate a loss amount for each claim. We can use a continuous probability distribution for this simulation. Luckily our actuary already did her own historical claims analysis and informs us that claim severities follow a lognormal distribution with a log mean of 9 and a log standard deviation of 1.75 (In a real world simulation we could measure how well various distributions fit historical claims experience using several options available in R, but fitting distributions is beyond the scope of this tutorial).

### Step 2: Run a Single Frequency/Severity Observation

We will refer to the combination of a frequency simulation and the accompanying severity simulations as an observation. Here we run one observation:

# set the seed so we can reproduce the simulation
set.seed(1234)

expected_freq <- 10

# generate a single frequency from the poisson distribution
freq <- rpois(n = 1, lambda = expected_freq)

# generate freq severities.  Each severity represents the ultimate value of 1 claim
# we will use the lognormal distribution here
sev <- rlnorm(n = freq, meanlog = 9, sdlog = 1.75)

# view the results
sev
## [1] 14046.732 15195.521  2256.730  8625.924  9874.455 98712.420

In the above example our frequency was 6, and our severities for each claim were 14,047, 15,196, 2,257, 8,626, 9,874, 98,712. By summing the severities of these 6 claims we arrive at the observation total of 148,712.

### Step 3: Run Many Observations

Now that we know how to run a single observation we need to randomly generate a bunch of observations. By inspecting the results of many observations we can determine the confidence level of experiencing different loss amounts.

library(dplyr)
library(purrr)

# number of observations
n <- 1000
# generate many frequencies from the poisson distribution
freqs <- rpois(n = n, lambda = expected_freq)

# generate freq severities.  Each severity represents the ultimate value of 1 claim
obs <- purrr::map(freqs, function(freq) rlnorm(n = freq, meanlog = 9, sdlog = 1.75))

# tidy up the data
i <- 0
obs <- purrr::map(obs, function(sev) {
i <<- i + 1
data_frame(
ob = i,
sev = sev
)
})

obs <- dplyr::bind_rows(obs)
head(obs, 10)
## # A tibble: 10 x 2
##       ob    sev
##    <dbl>  <dbl>
##  1  1.00  80356
##  2  1.00  17897
##  3  1.00 340101
##  4  1.00  17256
##  5  1.00   6110
##  6  1.00   4648
##  7  1.00   1335
##  8  1.00   4504
##  9  1.00  43360
## 10  1.00   4187

### Step 4: Initial Inspection of Results

##### 4.1: Plot the distribution of oberservations
library(ggplot2)

# aggregate claim severities by observation
obs_total <- obs %>%
group_by(ob) %>%
summarise(sev = sum(sev))

ggplot(obs_total, aes(sev)) +
geom_histogram()

##### 4.2: View some quantiles and summary info
quantile(obs_total$sev, probs = c(0.5, 0.75, 0.9, 0.95, 0.99)) ## 50% 75% 90% 95% 99% ## 256185.5 435329.7 722109.2 1033543.0 1871843.3 summary(obs_total$sev)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##     2574   141666   256185   367565   435330 10030805

### Step 5: Alter the simulation to specific needs

Assuming our initial inspection of the output in step 4 went as planned, we are ready to customize our simulation. We could adjust our simulation in a variety of different ways. Some common paths to take from here are:

##### 5.1 Capping claims at individual claim limits
# cap claims at 250K
obs <- obs %>%
dplyr::mutate(sev_250K = if_else(sev > 250000, 250000, sev))
##### 5.2 Capping observations at an aggregate stop loss
# cap each observation at an aggregate stop loss
stop_loss <- 1000000

obs_agg <- obs %>%
dplyr::group_by(ob) %>%
dplyr::summarise(
sev = sum(sev),
sev_250K = sum(sev_250K)) %>%
dplyr::mutate(
sev_agg = if_else(sev > stop_loss, stop_loss, sev),
sev_capped_agg = if_else(sev_250K > stop_loss, stop_loss, sev_250K)
)

summary(obs_agg)
##        ob              sev              sev_250K          sev_agg
##  Min.   :   1.0   Min.   :    2574   Min.   :   2574   Min.   :   2574
##  1st Qu.: 250.8   1st Qu.:  141666   1st Qu.: 141666   1st Qu.: 141666
##  Median : 500.5   Median :  256185   Median : 256186   Median : 256186
##  Mean   : 500.5   Mean   :  367565   Mean   : 282172   Mean   : 328877
##  3rd Qu.: 750.2   3rd Qu.:  435330   3rd Qu.: 383357   3rd Qu.: 435330
##  Max.   :1000.0   Max.   :10030805   Max.   :1014342   Max.   :1000000
##  sev_capped_agg
##  Min.   :   2574
##  1st Qu.: 141666
##  Median : 256186
##  Mean   : 282158
##  3rd Qu.: 383357
##  Max.   :1000000
##### 5.3: Simulate multiple types of claims

There is no reason the claims have to all be generated from the same distributions. We could simulate claims based on many different possible frequency and severity distributions. We could also add parameter risk to our frequency and severity selections. We could add deductibles, quota share reinsurance, and a variety of other alterations based our company’s needs.

### Conclusion

Simulations are a fun way to get started in R, and with a little imagination, simulations allow you to answer highly complicated questions that would be impossible to solve directly from math equations and probability density functions.

To leave a comment for the author, please follow the link and comment on their blog: Posts on Tychobra.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)