Connecting the dots… a quick primer on cubic splines

[This article was first published on aRsing about in R, and kindly contributed to R-bloggers.]

Recently I have been working with some insurance data, which was created in a forecast process that naturally throws out year-on-year snapshots (what I have), whereas I wanted to determine intra-year positions (what I need).


The chart below shows the % Developed metric, which compares the value of a claim (or group of claims) at successive points since it occurred with its eventual settled value.


Some remarks:

  1. This is the annual snapshot data (what I have).
  2. By definition this metric (value on the y axis) will start at zero and eventually trend to 100%. This is because time is needed before the final level of claims is known.
  3. This particular – made up – dataset suggests that there is a period of over-estimation; that is, this hypothetical insurer initially books too large a loss before the value drops back to 100%.

Solutions for intra-year modelling

There are a number of ways to approach this type of problem. For example, you could build a regression model and then infer the intra-year positions from the derived regression formula. Indeed, actuarial practice often recommends fitting a parametric model to describe the progression between successive values. Typical approaches include fitting an exponential or Weibull distribution between successive data points.

In principle these approaches are well suited to the situation where the claims value builds upwards to 100% – i.e. is monotonically increasing. However, in Asia, many insurers are cautious in their reserve booking and the example pattern above isn’t that unusual. In this situation, the parametric approaches mentioned earlier don’t fare as well. Added to this, if you pick a regression method you are implicitly deciding not to buy into the annual snapshot values fully, because any fitted model will show some level of deviation vs the input values.

No, in my specific situation I needed a method that matched the annual time points exactly – in short I was seeking an interpolation approach.
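To make the distinction concrete, here is a minimal sketch using the made-up % Developed values from this post. The single least-squares cubic fitted with lm() is only an illustrative stand-in for "a regression model" (it is not one of the actuarial curves mentioned above), and the variable names are my own.

```r
# Made-up annual snapshots: months developed vs. % Developed (as a proportion)
input.x <- seq(from = 0, to = 120, by = 12)
input.y <- c(0, 0.48, 1.08, 1.03, 1.01, 1, 1, 1, 1, 1, 1)

# Regression: one least-squares cubic through all eleven points
# (illustrative stand-in only, not an exponential or Weibull fit)
fit <- lm(input.y ~ poly(input.x, 3))
regression.gap <- max(abs(residuals(fit)))

# Interpolation: a cubic spline evaluated back at the annual points
spline.at.inputs <- spline(input.x, input.y, xout = input.x)$y
spline.gap <- max(abs(spline.at.inputs - input.y))

# Largest deviation from the annual snapshots under each approach
print(c(regression = regression.gap, spline = spline.gap))
```

The regression leaves a visible gap at the annual points, whereas the spline reproduces them up to floating-point error.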

The cubic spline

The cubic spline works by fitting a series of polynomial curves of degree 3 – i.e. of the form ax^3 + bx^2 + cx + d – one between each pair of adjacent input data points. Cubic polynomials are favoured because they are simple to handle numerically and can support flexible shapes, including points of inflection. To ensure that the resulting curve does its job, a few conditions need to hold where adjacent cubic curves join; namely, in solving for each set of values a, b, c and d, you need:

  1. For the adjacent polynomial curves to meet at the input x values, passing through the input y values there;
  2. For the slopes (i.e. the first derivative of the curves) to match at the input x values;
  3. For the curvature (i.e. the second derivative) to match at the input x values.
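These conditions can be checked numerically. The sketch below uses splinefun() from base R, which returns the fitted spline as a function whose deriv argument selects the derivative order. Note that the three conditions alone leave two coefficients undetermined, so some extra rule is needed at the ends of the curve; R's default method ("fmm") supplies one. The helper jump() and the step size eps are my own illustrative choices.

```r
input.x <- seq(from = 0, to = 120, by = 12)
input.y <- c(0, 0.48, 1.08, 1.03, 1.01, 1, 1, 1, 1, 1, 1)

# splinefun() returns a function; deriv = 1 or 2 gives the slope or curvature
f <- splinefun(input.x, 100 * input.y)  # default method is "fmm"

interior <- input.x[-c(1, length(input.x))]  # points shared by two cubics
eps <- 1e-6                                  # small step either side of each point

# Largest left/right mismatch of the chosen derivative across the interior points
jump <- function(deriv) {
  max(abs(f(interior + eps, deriv = deriv) - f(interior - eps, deriv = deriv)))
}

value.gap <- max(abs(f(input.x) - 100 * input.y))  # condition 1: hits the inputs
slope.gap <- jump(1)                               # condition 2: slopes match
curvature.gap <- jump(2)                           # condition 3: curvature matches
print(c(value = value.gap, slope = slope.gap, curvature = curvature.gap))
```

All three gaps should be negligible, limited only by the step size and floating-point rounding.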

Solution in R

The following code shows how you can fit the cubic spline. The first few lines supply the known x and y values, then specify the required output points – in this case quarterly, so 3m, 6m, etc.

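input.x = seq(from = 0, to = 120, by = 12) # assuming we have annual spline points
input.y = c(0, 0.48, 1.08, 1.03, 1.01, 1, 1, 1, 1, 1, 1) # % Developed say
reqd.x = seq(from = 0, to = 120, by = 3) # suppose we want quarterly spline output

# build basic spline:
# fits one cubic between each pair of adjacent input points, then evaluates at reqd.x
option.1b = spline(input.x, 100 * input.y, xout = reqd.x)

# plot output - original vs fitted:
plot(input.x, 100 * input.y,
     main = "LDF Example: Spline Fit vs Actuals",
     xlab = "Months Developed",
     ylab = "% Dev",
     xaxt = "n") # suppress the default x axis so the annual ticks below are used
axis(1, at = seq(0, 120, by = 12), las = 1)
lines(option.1b, col = "red")
# legend rebuilt around the surviving "Quarterly - output" label;
# the first label and the position are assumptions
legend("bottomright", legend = c("Annual - input", "Quarterly - output"),
       col = c("black", "red"), pch = c(1, NA), lty = c(NA, 1))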

And here’s the output chart:


There are a number of tweaks offered by the spline algorithm and, if you’re interested, you can see a fuller set of adjustments here: Splines R – Complete Code
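As one example of such a tweak (my own illustration, not taken from the linked code), the method argument of spline() changes how the ends of the curve are handled. The sketch below compares the default "fmm" method with "natural", which sets the second derivative to zero at both ends.

```r
input.x <- seq(from = 0, to = 120, by = 12)
input.y <- c(0, 0.48, 1.08, 1.03, 1.01, 1, 1, 1, 1, 1, 1)
reqd.x <- seq(from = 0, to = 120, by = 3)

# Same data and output points, two different end-point treatments
fit.fmm <- spline(input.x, 100 * input.y, xout = reqd.x, method = "fmm")
fit.natural <- spline(input.x, 100 * input.y, xout = reqd.x, method = "natural")

# Both methods still reproduce the annual inputs ...
annual <- reqd.x %in% input.x
fmm.gap <- max(abs(fit.fmm$y[annual] - 100 * input.y))
natural.gap <- max(abs(fit.natural$y[annual] - 100 * input.y))

# ... but they can disagree at the quarterly points in between
max.diff <- max(abs(fit.fmm$y - fit.natural$y))
print(c(fmm.gap = fmm.gap, natural.gap = natural.gap, max.diff = max.diff))
```

Whichever method you choose, the annual snapshots are preserved; only the intra-year shape changes.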

Actuarial applications

Aside from valuations, the other place where splines get used in actuarial practice is pricing – particularly if premium rates have been derived based on analysing the experience of cohorts rather than the specifics of individuals.

For example, mortality (death) and morbidity (sickness) studies underpin many life and health insurance covers. Often the base data for these tables is constructed by analysing the experience of 5-year age cohorts – say 45 to 50 year olds – over a period of time. In this type of situation, the derived premium rate for each cohort (sometimes called a model point) realistically corresponds to the midpoint of that age range: in this example, a 47.5 year old.

This is fine, but natural questions arise, such as how to price for a 48.5 year old. Further, if you don’t smooth the prices from one age to the next, how do you handle the potential loss of business at renewal caused by the step-jump in premiums as people move from one cohort to the next – i.e. our 50 year old falls into the 51-55 bucket…

Splines can help with these problems and they’re a lot nicer (and quicker) than the graduation-type techniques that some of us older actuaries had to study way back when…
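A minimal sketch of the idea, with loud caveats: the cohort midpoints and premium rates below are entirely hypothetical numbers invented for illustration, and the choice of the "natural" method is mine rather than a recommendation.

```r
# HYPOTHETICAL data: midpoint ages of 5-year cohorts and invented premium rates
cohort.mid <- seq(from = 32.5, to = 62.5, by = 5)
cohort.rate <- c(1.0, 1.4, 2.0, 2.9, 4.2, 6.1, 8.8)  # made-up rate per 1,000

# Price a cohort midpoint, an in-between age, and the next cohort midpoint
query.age <- c(47.5, 48.5, 52.5)
priced <- spline(cohort.mid, cohort.rate, xout = query.age, method = "natural")$y

print(data.frame(age = query.age, rate = priced))
```

The rate at 47.5 and 52.5 matches the cohort rates, while the 48.5 year old gets a rate part-way between them rather than a step-jump at the cohort boundary.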

Anyway, do leave a comment if you have any observations or would like to share your experiences.
