MRMR version 0.1.3 is now available on CRAN. This is (almost) the same version that was discussed at the CLRS two weeks ago.
MRMR (Multivariate Regression Models for Reserving) is a tool for non-life actuaries to estimate liability reserves. The emphasis is on exploratory data analysis, visualization and model diagnostics. At present, the framework is a linear model with a normal error term. A weighting parameter may be used to account for heteroskedasticity in the error terms.
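To make the weighting idea concrete, here is a minimal sketch in base R. It does not use MRMR itself; the data are made up, and `delta` is a hypothetical name for a weighting parameter under the assumption that the error variance is proportional to the predictor raised to that power.

```r
# Made-up data for a single development lag (illustration only).
prior       <- c(1000, 1200, 1500, 1800, 2100)  # cumulative loss at the prior lag
incremental <- c(520, 610, 740, 930, 1030)      # incremental loss in the next lag

# Hypothetical weighting parameter: Var(error) is assumed proportional to prior^delta.
# lm() expects weights inversely proportional to the variance.
delta <- 1
fit <- lm(incremental ~ 0 + prior, weights = 1 / prior^delta)

# With delta = 1 and no intercept, the fitted slope equals sum(y) / sum(x),
# which is the volume-weighted average development factor.
slope           <- unname(coef(fit))
volume_weighted <- sum(incremental) / sum(prior)
print(c(slope = slope, volume_weighted = volume_weighted))
```

Setting `delta` to 0 gives ordinary least squares, and `delta` equal to 2 gives the simple average of the individual ratios, so a single parameter spans several familiar estimators.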
MRMR supports three S4 objects as follows:
- Triangle – A triangle object houses reserving data. MRMR departs slightly from the traditional storage of reserving data in several respects.
- First, the time periods must all be explicit. It’s common for reserving data sets to use integers such as 12, 24, etc. to refer to development lags and 2010, 94, etc. to denote calendar periods. MRMR uses lubridate values for this purpose so that the interpretation of temporal variables is clear and unambiguous.
- Second, MRMR distinguishes between temporal variables, static measures and stochastic measures, all of which are housed within the triangle. Temporal variables describe the origin and evaluation periods, and measures refer to the non-temporal, measurable phenomena under observation. A static measure is one whose value is known with certainty. Premium and other exposure elements are examples. A stochastic measure is one whose value varies over time. Loss measures are stochastic. This split permits us to house all information in the same place without any confusion.
- Finally (and a bit trivially) the information is stored in the “long” format. Columns of the underlying data frame refer to variables only and not development lags or any other such information.
- The function plotTriangle shows reserving data along three dimensions. A stochastic response (the y variable) is plotted against either a temporal or measure variable (static or stochastic) and then grouped along another dimension. Several examples are presented below, but a common result is the classic line graph of cumulative losses by origin period measured against development age. The functional form makes it easy to switch from cumulative to incremental response, from development age to evaluation date, etc. Further, plotTriangle can be used to plot fit lines by group. This provides a ready visual interpretation of a model.
- TriangleModel – This object stores a linear model with a single response and one grouped predictor. At present, the grouping element is always the development lag. (This is not strictly enforced, but using any other variable will likely lead to a mysterious error.) In a future release, this assumption will be relaxed and additional grouping elements will be permitted. Further, the use of GLMs will be introduced. The TriangleModel object facilitates several diagnostics:
- The coefficients of the model are displayed. This is akin to having the “loss development factors” plotted as a probability density function. This allows one to see which factors have greater variability.
- Residuals are plotted against predicted values and also grouped by origin period, development lag and calendar period. This is the classic set of four graphs shown in Zehnwirth’s paper (and likely elsewhere).
- Serial correlation across a calendar period is also displayed. Here the residuals for comparable development lags are matched to residuals in prior calendar year periods. This allows for a statistical test of the correlation of residuals from one period to the next.
- TriangleProjection – This object uses a TriangleModel to project to a future point in time. The future point is stated either as a specific date, or a specific development interval. It’s most common for actuaries to project through a development interval.
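The storage ideas described for the Triangle object (explicit periods, a split between static and stochastic measures, and the long format) can be sketched in base R. This is not MRMR's API: the column names are hypothetical, the numbers are made up, and base `Date` values stand in for the lubridate values that MRMR uses.

```r
# Made-up "wide" triangle: one row per origin year, one column per development lag.
wide <- data.frame(
  origin  = c(2010, 2011, 2012),
  premium = c(5000, 5500, 6000),   # static measure: known with certainty
  lag12   = c(1000, 1100, 1300),   # stochastic measure: cumulative paid loss
  lag24   = c(1500, 1700, NA),
  lag36   = c(1700, NA, NA)
)

# Long format: columns are variables only; the development lag becomes a value.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("lag12", "lag24", "lag36"),
  v.names   = "paid",
  timevar   = "lag_months",
  times     = c(12, 24, 36),
  idvar     = "origin"
)
long <- long[!is.na(long$paid), ]   # drop the unobserved lower-right cells

# Make the temporal variables explicit dates rather than bare integers.
long$origin_start <- as.Date(paste0(long$origin, "-01-01"))
long$evaluation_date <- as.Date(
  vapply(seq_len(nrow(long)), function(i) {
    # Step forward by the lag in months, then back one day to the period end.
    step <- paste(long$lag_months[i], "months")
    as.numeric(seq(long$origin_start[i], by = step, length.out = 2)[2] - 1)
  }, numeric(1)),
  origin = "1970-01-01"
)

print(long[order(long$origin, long$lag_months), ])
```

With an explicit evaluation date on every row, grouping by calendar period is just a matter of grouping on that column.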
Here’s a quick example. I’m using a triangle from the Friedland paper, which is on the CAS syllabus for the reserving exam. This data is taken from page 65.
install.packages("MRMR")
library(MRMR)
demo(Friedland)
This will produce a number of cool plots.
The classic cumulative by origin period:
The same using incremental data:
Incremental data by calendar period:
A model with best fit lines. (Note that this almost corresponds to the standard notion of a link ratio. At present, the default is to include an intercept. This will get cleaned up in the next release.)
Confidence intervals around model factors:
The classic four-square residual plot:
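The note about the intercept in the best-fit-line model, and the confidence intervals around the factors, can be illustrated outside of MRMR with plain `lm()`. The data below are made up and the variable names are hypothetical. This is only a sketch of the idea, not the code behind the plots.

```r
# Made-up cumulative losses at two consecutive development lags.
prior   <- c(1000, 1200, 1500, 1800, 2100)  # cumulative loss at lag k
current <- c(1520, 1810, 2240, 2730, 3130)  # cumulative loss at lag k + 1

# With an intercept, the slope is not quite a link ratio.
with_intercept <- lm(current ~ prior)

# Through the origin, the slope plays the role of a link ratio.
through_origin <- lm(current ~ 0 + prior)

print(coef(with_intercept))
print(coef(through_origin))

# Interval around the factor from the no-intercept fit.
factor_interval <- confint(through_origin, level = 0.95)
print(factor_interval)
```

The unweighted fit through the origin gives a least-squares factor, `sum(prior * current) / sum(prior^2)`, which differs from the volume-weighted average unless weights are supplied.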