R code for Chapter 2 of Non-Life Insurance Pricing with GLM

March 13, 2012
By

(This article was first published on CYBAEA Data, and kindly contributed to R-bloggers)

We continue working our way through the examples, case studies, and exercises of what is affectionately known here as “the two bears book” (Swedish björn = bear) and more formally as Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Börn Johansson (Amazon
UK |
US).

At this stage, our purpose is to reproduce the analysis from the book using the R statistical computing and analysis platform, and to answer the data analysis elements of the exercises and case studies. Any critique of the approach and of pricing and modeling in the Insurance industry in general will wait for a later article.

In the following, we will assume that the reader has a copy of the book and a working installation of R. We will be using the data.table, foreach, and ggplot2 packages which are not part of the standard distribution, so the reader should install them first (e.g. by executing `install.packages(c("data.table", "foreach", "ggplot2"), dependencies = TRUE)` from within an R session).

The beginning of the code is not so important, but here goes:

```#!/usr/bin/Rscript
## PricingGLM-2.r - Code for Chapter 2 of "Non-Life Insurance Pricing with GLM"

## @book{ohlsson2010non,
##   title={Non-Life Insurance Pricing with Generalized Linear Models},
##   author={Ohlsson, E. and Johansson, B.},
##   isbn={9783642107900},
##   series={Eaa Series: Textbook},
##   year={2010},
##   publisher={Springer Verlag}
## }
```

Example 2.5: Moped insurance continued

We continue the moped insurance example, and we use the data that we saved in our Chapter 1 session. The goal is to reproduce Table 2.7 so we start building that as a data frame after loading the data.

```## Load the data from last.
if (!exists("table.1.2"))

library("foreach")

## We are looking to reproduce table 2.7 which we start building here,
## adding columns as we go.
table.2.7 <-
data.frame(rating.factor =
c(rep("Vehicle class", nlevels(table.1.2\$premiekl)),
rep("Vehicle age",   nlevels(table.1.2\$moptva)),
rep("Zone",          nlevels(table.1.2\$zon))),
class =
c(levels(table.1.2\$premiekl),
levels(table.1.2\$moptva),
levels(table.1.2\$zon)),
stringsAsFactors = FALSE)
```

We next calculate the duration and number of claims for each level of each rating factor. We also set the contrasts for the levels, using the same idiom as in our Chapter 1 session. The foreach package is convenient to use here, but you can of course do it with a normal loop and a couple of variables.

```## Calculate duration per rating factor level and also set the
## contrasts (using the same idiom as in the code for the previous
## chapter). We use foreach here to execute the loop both for its
## side-effect (setting the contrasts) and to accumulate the sums.
new.cols <-
foreach (rating.factor = c("premiekl", "moptva", "zon"),
.combine = rbind) %do%
{
sums <- tapply(table.1.2\$dur, table.1.2[[rating.factor]], sum)
n.levels <- nlevels(table.1.2[[rating.factor]])
contrasts(table.1.2[[rating.factor]]) <-
contr.treatment(n.levels)[rank(-sums, ties.method = "first"), ]
data.frame(duration = sums, n.claims = nclaims)
}
table.2.7 <- cbind(table.2.7, new.cols)
rm(new.cols)
```

Next, we need to build the frequency and severity models separately, as in the discussion at the beginning of section 2.3.4 on page 34.

Frequency model

Amazon
UK |
US

Note here the use of the `offset()` term as opposed to the `weights=` argument we have used before. The offset is `log(dur)` because our link function is `log`.

See `help("Insurance", package = "MASS")` for a similar example and sections 7.1 (p. 189–190) and 7.3 of Modern Applied Statistics with S
(Amazon
UK |
US
)
for a (very brief) discussion.

```model.frequency <-
glm(antskad ~ premiekl + moptva + zon + offset(log(dur)),
data = table.1.2, family = poisson)

rels <- coef( model.frequency )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
table.2.7\$rels.frequency <-
c(c(1, rels[1])[rank(-table.2.7\$duration[1:2], ties.method = "first")],
c(1, rels[2])[rank(-table.2.7\$duration[3:4], ties.method = "first")],
c(1, rels[3:8])[rank(-table.2.7\$duration[5:11], ties.method = "first")])
```

Severity model

There are a couple of points to note here:

1. We will model using a Gamma distribution for the errors, following the discussion on page 20 and also pages 33-34. Note the point that this is only one of several plausible candidate distributions.
2. Because we are using the Gamma distribution we need to remove the zero values from the data; we do this using the `table.1.2[table.1.2\$medskad > 0, ]` construct.
3. To reproduce the values from the book, we use the non-canonical `"log"` link function even though the canonical function (`"inverse")` gives a slightly better fit (residual deviance 5.9 versus 8.0 on 16 degrees of freedom). This follows the approach discussed in Example 2.3 on page 30.
```model.severity <-
glm(medskad ~ premiekl + moptva + zon,
data = table.1.2[table.1.2\$medskad > 0, ],
family = Gamma("log"), weights = antskad)

rels <- coef( model.severity )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
## Aside: For the canonical link function use
## rels <- rels[1] / (rels[1] + rels[-1])

table.2.7\$rels.severity <-
c(c(1, rels[1])[rank(-table.2.7\$duration[1:2], ties.method = "first")],
c(1, rels[2])[rank(-table.2.7\$duration[3:4], ties.method = "first")],
c(1, rels[3:8])[rank(-table.2.7\$duration[5:11], ties.method = "first")])
```

Combining the models

Now it is trivial to combine and display the results.

```table.2.7\$rels.pure.premium <- with(table.2.7, rels.frequency * rels.severity)
print(table.2.7, digits = 2)
```
rating.factor class duration n.claims rels.frequency rels.severity rels.pure.premium
1 Vehicle class 1 9833.20 391 1.00 1.00 1.00
2 Vehicle class 2 8825.10 395 0.78 0.55 0.42
11 Vehicle age 1 1918.40 141 1.55 1.79 2.78
21 Vehicle age 2 16739.90 645 1.00 1.00 1.00
12 Zone 1 1451.40 206 7.10 1.21 8.62
22 Zone 2 2486.30 209 4.17 1.07 4.48
3 Zone 3 2888.70 132 2.23 1.07 2.38
4 Zone 4 10069.10 207 1.00 1.00 1.00
5 Zone 5 246.10 6 1.20 1.21 1.46
6 Zone 6 1369.20 23 0.79 0.98 0.78
7 Zone 7 147.50 3 1.00 1.20 1.20

2.4 Case Study: Motorcycle Insurance

The authors use the term Case Study for larger exercises. There is quite a lot to cover here, and some of the questions are more about discussing the results. We will focus on getting the results.

The Data

Obtaining the data

First we read the (fixed width) data. The column names are the same (based on Swedish language) as in the book.

```columns <- c(agarald = 2L, kon = 1L, zon = 1L, mcklass = 1L, fordald = 2L,
bonuskl = 1L, duration = 8L, antskad = 4L, skadkost = 8L)
column.classes <- c("integer", rep("factor", 3), "integer",
"factor", "numeric", rep("integer", 2))
stopifnot(length(columns) == length(column.classes))
con <- url("http://www2.math.su.se/~esbj/GLMbook/mccase.txt")
col.names = names(columns),
colClasses = column.classes,
na.strings = NULL, comment.char = "")
try(close(con), silent = TRUE)
rm(columns, column.classes, con)
mccase\$mcklass <- ordered(mccase\$mcklass)
mccase\$bonuskl <- ordered(mccase\$bonuskl)
```

We are more than a little obsessed with documenting our data, so here goes. See the book for the details.

```comment(mccase) <-
c("Title: Partial casco insurance for motorcycles from Wasa, 1994--1998",
"Source: http://www2.math.su.se/~esbj/GLMbook/mccase.txt",
comment(mccase\$agarald) <-
c("The owner's age, between 0 and 99",
"Name: Age of Owner")
comment(mccase\$kon) <-
c("Name: Gender of Owner",
"Code: M=Male",
"Code: K=Female")
comment(mccase\$zon) <-
c("Name: Geographic Zone",
"Code: 1=Central and semi-central parts of Sweden's three largest cities",
"Code: 2=suburbs and middle-sized towns",
"Code: 3=Lesser towns, except those in 5 or 7",
"Code: 4=Small towns and countryside, except 5--7",
"Code: 5=Northern towns",
"Code: 6=Northern countryside",
"Code: 7=Gotland (Sweden's largest island)")
comment(mccase\$mcklass) <-
c("Name: MC Class",
"Description: A classification by the EV ratio, defined as (engine power in kW × 100) / (vehicle weight in kg + 75), rounded to the nearest lower integer.",
"Code: 1=EV ratio <= 5",
"Code: 2=EV ratio 6--8",
"Code: 3=EV ratio 9--12",
"Code: 4=EV ratio 13--15",
"Code: 5=EV ratio 16--19",
"Code: 6=EV ratio 20--24",
"Code: 7=EV ratio >= 25")
comment(mccase\$fordald) <-
c("Vehicle age, between 0 and 99",
"Name: Vehicle Age")
comment(mccase\$bonuskl) <-
c("Name: Bonus Class",
"Description: A driver starts with bonus class 1; for each claim-free year the bonus class is increased by 1. After the first claim the bonus is decreased by 2; the driver cannot return to class 7 with less than 6 consecutive claim free years.")
comment(mccase\$duration) <-
c("Name: Duration",
"Comment: The number of policy years",
"Unit: year")
c("Name: Number of Claims")
c("Name: Cost of Claims",
"Unit: SEK")
```

Rating factors

We finally add the rating factors from the book.

```mccase\$rating.1 <- mccase\$zon
mccase\$rating.2 <- mccase\$mcklass
mccase\$rating.3 <-
cut(mccase\$fordald, breaks = c(0, 1, 4, 99),
labels = as.character(1:3), include.lowest = TRUE,
ordered_result = TRUE)
mccase\$rating.4 <- ordered(mccase\$bonuskl) # Drop comments
levels(mccase\$rating.4) <-                 # Combine levels
c("1", "1", "2", "2", rep("3", 3))
```

Save the data

Never forget to save.

`save(mccase, file = "mccase.RData")`

Now we can start tackling the exercises.

Problem 1: Aggregate to cells of current tariff

Aggregate the data to the cells of the current
tariff. Compute the empirical claim frequency and severity at this
level.

We really like the `data.table` package. The syntax is much easier on the eye (and the hand). As a bonus, it scales well to much larger data sets than `data.frame`. If you are not already using it, now is the time to start.

```if (!exists("mccase"))

## Conver to data.table
library("data.table")
mccase <- data.table(mccase, key = paste("rating", 1:4, sep = "."))

## Aggregate to levels of current rating factors
mccase.current <-
mccase[,
list(duration = sum(duration),
num.policies = .N),
by = key(mccase)]

## Claim frequency and severity. Change NaN to NA.
mccase.current\$claim.freq <-
with(mccase.current, ifelse(duration != 0, antskad / duration, NA_real_))
mccase.current\$severity <-

## Save
save(mccase.current, file = "mccase.current.RData")
```

Problem 2: Determine how the duration and number of claims is distributed

Determine how the duration and number of claims is
distributed for each of the rating factor classes, as an indication
of the accuracy of the statistical analysis.

This is one of those ‘there is no right answer’ questions, but let us have a look at the data. First load it if needed and also load the libraries we will be using.

```## Load data if needed
library("data.table")
if (!exists("mccase"))
if (!exists("mccase.current"))
if (!is(mccase, "data.table"))
mccase <- data.table(mccase, key = paste("rating", 1:4, sep = "."))

library("grid")
library("ggplot2")
```

Let us start with the number of claims for the undelying data. First we plot the number of policies for each number of claims (range of number of claims is {0, 1, 2}); this is one way to do it:

```plot.titles <- c("Geo. zone", "MC class", "Vehicle age", "Bonus class")
plots <-
lapply(1:4,
function(i)
+ geom_histogram(aes(weight = duration))
+ scale_x_discrete(limits = c(0, 2))
+ scale_y_log10()
+ facet_grid(paste("rating.", i, " ~ .", sep = ""),
scales = "fixed")
## We drop the axis titles to make more room for the data
+ opts(axis.title.x = theme_blank(), axis.title.y = theme_blank(),
axis.text.x = theme_blank(),  axis.text.y = theme_blank(),
axis.title.y = theme_blank(), axis.ticks = theme_blank(),
title = plot.titles[i])
)

grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 4)))
## We can ignore the warnings from displaying the plots for now
for (i in 1:4)
print(plots[[i]], vp = viewport(layout.pos.col = i))
```

There is certainly some variation there, especially on the first rating factor.

Recall from the discussion on page 18 the assumption that the
number of claims for a single policy is a Poisson process and the
distribution of the number of claims in a tariff cell is Poisson
distributed. Let us first look at the whole data set (and note
that the data.table notation is used; if using data.frame be sure
to have drop=FALSE):

```data <- mccase[order(antskad), list(N = .N, w = sum(duration)), by = antskad]
M <- glm(N ~ antskad, family = poisson(), weights = w, data = data)
data\$predicted <- round(predict(M, data[, list(antskad)], type = "response"))
print(data[, list(antskad, N, predicted)], digits = 1)
```
1 0 63878 63878.0
2 1 643 646.0
3 2 27 7.0

Not too bad, but then it can’t really be with that heavy weighting
to the first value of antskad. We can show the same split by the
levels of the first rating factor as an example:

```data <-
list(N = .N, w = sum(duration)),
modelPoisson <- function(data) {
M <- glm(N ~ antskad, family = poisson(), weights = w, data = data)
return(M)
}
predictionPoisson <- function (data) {
M <- modelPoisson(data)
p <- predict(M, data[, list(antskad)], type = "response")
return(p)
}
data\$predicted <-
unlist(lapply(levels(data\$rating.1),
function (l) round(predictionPoisson(data[rating.1 == l]))))
print(data[, list(rating.1, antskad, N, predicted)], digits = 1)
```
1 1 0 8409 8409.0
2 1 1 163 164.0
3 1 2 10 3.0
4 2 0 11632 11632.0
5 2 1 157 157.0
6 2 2 5 2.0
7 3 0 12604 12604.0
8 3 1 113 113.0
9 3 2 5 1.0
10 4 0 24626 24626.0
11 4 1 184 185.0
12 4 2 6 1.0
13 5 0 2368 2368.0
14 5 1 9 9.0
15 6 0 3867 3867.0
16 6 1 16 16.0
17 6 2 1 0.0
18 7 0 372 372.0
19 7 1 1 1.0

You get the idea….

If we can’t really see very much at the individual claims level,
how does it appear when we look at the levels aggregated to the
current tariff cells? Here, of course, we have much more data with
up to 33 claims in one cell, but we will limit the display to the
first few number of claims.

```high.limit <- 7L                 # Only display up to this many claims
plots <-
lapply(1:4,
function(i)
+ geom_histogram(breaks = 0:high.limit, aes(weight = duration))
+ scale_x_discrete(limits = c(0, high.limit), breaks = 0:high.limit)
## Following xlim() needed for scale_x_discrete only; see
+ xlim(0, high.limit)
+ scale_y_log10()
+ facet_grid(paste("rating.", i, " ~ .", sep = ""),
scales = "fixed")
## We drop the axis titles to make more room for the data
+ opts(axis.title.x = theme_blank(), axis.title.y = theme_blank(),
axis.text.x = theme_blank(),  axis.text.y = theme_blank(),
axis.title.y = theme_blank(), axis.ticks = theme_blank(),
title = plot.titles[i])
)

grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 4)))
for (i in 1:4)
print(plots[[i]], vp = viewport(layout.pos.col = i))
```

There is certainly something going on for low bonus classes and
high vehicle ages that is not straightforward Poisson
distribution. Looking at the two in combination we see the old
motorcycle – low bonus class (rating.3 = 3 and rating.4 = 1) is the
main culpit, but also note that there is relatively little data in
these cells.

```ggplot(mccase.current, aes(antskad)) +
geom_histogram(breaks = 0:high.limit, aes(weight = duration)) +
scale_x_discrete(limits = c(0, high.limit), breaks = 0:high.limit) +
xlim(0, high.limit) +
scale_y_log10() +
facet_grid(rating.4 ~ rating.3, scales = "fixed", labeller = "label_both")
```

We can fit the poisson distribution using the same
predictionPoisson() function as before. We also look at the fit in
a simple way (the – much! – better approach is to use the residuals from the
fitted model)

```## rating.3 is the vehicle age.
data <-
list(N = .N, w = sum(duration)),
data\$predicted <-
unlist(lapply(levels(data\$rating.3),
function (l) round(predictionPoisson(data[rating.3 == l]))))
## Show the fit
list(rating.3, antskad, N, predicted)], digits = 1)
## Show simplistic residuals per level of the rating factor
list(res = sum(abs(predicted - N))),
by = rating.3])

## rating.4 is the bonus class
data <-
list(N = .N, w = sum(duration)),
data\$predicted <-
unlist(lapply(levels(data\$rating.4),
function (l) round(predictionPoisson(data[rating.4 == l]))))
## Show the fit
list(rating.4, antskad, N, predicted)], digits = 1)
## Show simplistic residuals per rating factor
list(res = sum(abs(predicted - N))),
by = rating.4])
```

We will not show the tables here (as we said: there is a better way). Instead, we can show the fitted Poisson distribution:

```ggplot(data, aes(x = antskad, y = N)) +
geom_bar(breaks = 0L:high.limit, stat = "identity") +
facet_wrap( ~ rating.4) +
geom_line(aes(y = predicted), data = data, colour = "red", size = 1) +
xlim(-0.5, high.limit + 0.5)
```

2. Duration

We can plot the distribution of duration as before. Here 440 is the
90% quantile for the duration (412) rounded up to the next
multiplier of binwidth.

```plots <-
lapply(1:4,
function(i)
ggplot(mccase.current, aes(duration))
+ geom_histogram(binwidth = 40)
+ xlim(0, 440)
+ scale_y_log10()
+ facet_grid(paste("rating.", i, " ~ .", sep = ""),
scales = "fixed")
## We drop the axis titles to make more room for the data
+ opts(axis.title.x = theme_blank(), axis.title.y = theme_blank(),
axis.text.x = theme_blank(),  axis.text.y = theme_blank(),
axis.title.y = theme_blank(), axis.ticks = theme_blank(),
title = plot.titles[i])
)

grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 4)))
## We can ignore the warnings from displaying the plots for now
for (i in 1:4)
print(plots[[i]], vp = viewport(layout.pos.col = i))
```

However, in this case it is much more illuminating to look at the overall distribution:

```ggplot(mccase, aes(duration)) +
geom_histogram(binwidth = 0.05) +
xlim(0, 3) +
opts(title = "Duration of motorcycle policies") +
annotate("text", 3.0, 1e4, label = "Histogram bin width = 0.05",
size = 3, hjust = 1, vjust = 1)
```

Big spikes near 1/2 and 1 year (and in general spikes around 1/2
year multiples) suggests that either (1) this is not a random
sample of policies or (2) there is a strong seasonal effect in the

Problem 3: Determine the relativities for claim frequency and severity

Determine the relativities for claim frequency and
severity separately, by using GLMs; use the results to get

We first load the data (if needed) and create a data frame to hold the output, following the structure of Table 2.8 in the book.

```library("data.table")
if (!exists("mccase.current"))

case.2.4 <-
data.frame(rating.factor =
c(rep("Zone",        nlevels(mccase.current\$rating.1)),
rep("MC class",    nlevels(mccase.current\$rating.2)),
rep("Vehicle age", nlevels(mccase.current\$rating.3)),
rep("Bonus class", nlevels(mccase.current\$rating.4))),
class =
with(mccase.current,
c(levels(rating.1), levels(rating.2),
levels(rating.3), levels(rating.4))),
## These are the values from Table 2.8 in the book:
relativity =
c(7.678, 4.227, 1.336, 1.000, 1.734, 1.402, 1.402,
0.625, 0.769, 1.000, 1.406, 1.875, 4.062, 6.873,
2.000, 1.200, 1.000,
1.250, 1.125, 1.000),
stringsAsFactors = FALSE)
print(case.2.4, digits = 3)
```
rating.factor class relativity
1 zone 1 7.678
2 zone 2 4.227
3 zone 3 1.336
4 zone 4 1.000
5 zone 5 1.734
6 zone 6 1.402
7 zone 7 1.402
8 mc class 1 0.625
9 mc class 2 0.769
10 mc class 3 1.000
11 mc class 4 1.406
12 mc class 5 1.875
13 mc class 6 4.062
14 mc class 7 6.873
15 vehicle age 1 2.000
16 vehicle age 2 1.200
17 vehicle age 3 1.000
18 bonus class 1 1.250
19 bonus class 2 1.125
20 bonus class 3 1.000

Close enough to the book.

First we set the contrasts so the baseline for the models is the
level with the highest duration. This is the same approach we used
before.

```library("foreach")
new.cols <-
foreach (rating.factor = paste("rating", 1:4, sep = "."),
.combine = rbind) %do%
{
totals <- mccase.current[, list(D = sum(duration),
by = rating.factor]
n.levels <- nlevels(mccase.current[[rating.factor]])
contrasts(mccase.current[[rating.factor]]) <-
contr.treatment(n.levels)[rank(-totals[["D"]], ties.method = "first"), ]
data.frame(duration = totals[["D"]],
n.claims = totals[["N"]],
}
case.2.4 <- cbind(case.2.4, new.cols)
rm(new.cols)
```

We next fit the models and combine the results as in Example 2.5 above. Here we also make a note of a simple measure of the goodness of fit: the residual deviance and the degrees of freedom. The frequency model is a good fit, the severity model is not.

```## Model the frequency

model.frequency <-
rating.1 + rating.2 + rating.3 + rating.4 + offset(log(duration)),
data = mccase.current[duration > 0], family = poisson)
## Res. dev. 360 on 389 dof

rels <- coef( model.frequency )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
case.2.4\$rels.frequency <-
c(c(1, rels[1:6])[rank(-case.2.4\$duration[1:7], ties.method = "first")],
c(1, rels[7:12])[rank(-case.2.4\$duration[8:14], ties.method = "first")],
c(1, rels[13:14])[rank(-case.2.4\$duration[15:17], ties.method = "first")],
c(1, rels[15:16])[rank(-case.2.4\$duration[18:20], ties.method = "first")])

## Model the severity. We stick with the non-canonical link function
## for the time being.

model.severity <-
glm(skadkost ~ rating.1 + rating.2 + rating.3 + rating.4,
family = Gamma("log"), weights = antskad)
## Res.dev. 516 on 164 dof

rels <- coef( model.severity )
rels <- exp( rels[1] + rels[-1] ) / exp( rels[1] )
case.2.4\$rels.severity <-
c(c(1, rels[1:6])[rank(-case.2.4\$duration[1:7], ties.method = "first")],
c(1, rels[7:12])[rank(-case.2.4\$duration[8:14], ties.method = "first")],
c(1, rels[13:14])[rank(-case.2.4\$duration[15:17], ties.method = "first")],
c(1, rels[15:16])[rank(-case.2.4\$duration[18:20], ties.method = "first")])

## Combine the frequency and severity
case.2.4\$rels.pure.prem <- with(case.2.4, rels.frequency * rels.severity)
```

Finally we convert, save, and compare with the current values.

```## Convert to data.table
library("data.table")
case.2.4 <- data.table(case.2.4)

## Save
save(case.2.4, file = "case.2.4.RData")

## Compare with current values
print(case.2.4[,
list(rating.factor, class, duration, n.claims,
relativity, rels.pure.prem)],
digits = 3)
```
rating.factor class duration n.claims skadkostK relativity rels.pure.prem
1 Zone 1 6205.310 183 5540.000 7.678 6.239
2 Zone 2 10103.090 167 4811.000 4.227 3.179
3 Zone 3 11676.573 123 2523.000 1.336 0.993
4 Zone 4 32628.493 196 3775.000 1.000 1.000
5 Zone 5 1582.112 9 105.000 1.734 0.155
6 Zone 6 2799.945 18 288.000 1.402 0.223
7 Zone 7 241.288 1 1.000 1.402 0.002
8 MC class 1 5190.351 46 993.000 0.625 0.395
9 MC class 2 3990.115 57 883.000 0.769 0.536
10 MC class 3 21665.679 166 5372.000 1.000 1.000
11 MC class 4 11739.882 98 2192.000 1.406 0.574
12 MC class 5 13439.926 149 3297.000 1.875 1.413
13 MC class 6 8880.134 175 4161.000 4.062 4.876
14 MC class 7 330.723 6 145.000 6.873 0.602
15 Vehicle age 1 4955.403 126 4964.000 2.000 4.715
16 Vehicle age 2 9753.811 145 5507.000 1.200 2.021
17 Vehicle age 3 50527.597 426 6570.000 1.000 1.000
18 Bonus class 1 19893.370 207 4558.000 1.250 0.797
19 Bonus class 2 9615.764 121 3627.000 1.125 0.604
20 Bonus class 3 35727.677 369 8857.000 1.000 1.000

It is really rather different from the current relativity, as we will also see below. But remember that the severity model was a poor fit: that would be my first place to start looking if staying within the insurance pricing framework outlined in this book.

Problem 4: Discussions

Just note here that the ratio for the relativity we calculated and for the existing one can be found as

```with(case.2.4, max(rels.pure.prem) / min(rels.pure.prem)) # 3552
with(case.2.4, max(relativity) / min(relativity))         #   12
```

Note the huge range of relativities in our new model (3552 versus 12).

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...