(This article was first published on **Yet Another Blog in Statistical Computing » S+/R**, and kindly contributed to R-bloggers)

In the Loss Distribution Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered as candidates for the distribution of severity measures. However, in stress testing exercises such as CCAR, the challenge remains to relate operational losses to macroeconomic scenarios described by a set of macroeconomic attributes.

As a result, a more sensible approach to modeling operational losses in the annual CCAR exercise might be regression-based modeling, which intuitively links the severity of operational losses to macroeconomic drivers through an explicit functional form within the framework of Generalized Linear Models (GLM). While the 2-parameter Pareto and 3-parameter Burr distributions are theoretically attractive, implementing them in a regression setting can become difficult or even impractical without off-the-shelf modeling tools and variable selection routines. In such a situation, the Log Normal and Gamma distributional assumptions are far more practical and have been applied successfully in actuarial practice. For details, please see “Severity Distributions for GLMs” by Fu and Moncher (2004).
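To make the regression idea concrete, here is a minimal sketch on simulated data. It assumes a single hypothetical macroeconomic driver, `unemp` (an unemployment rate), that shifts the mean severity through a log link; neither the variable nor the coefficients come from the original post. The point is only to show how a Gamma GLM ties expected severity to a driver through an explicit functional form that can then be evaluated under stress scenarios.

```r
# Simulated severities driven by a hypothetical macro variable `unemp`
set.seed(2024)
n <- 2000
unemp <- runif(n, min = 3, max = 10)   # hypothetical unemployment rate (%)
mu <- exp(6 + 0.15 * unemp)            # assumed true mean severity
shape <- 2                             # assumed Gamma shape parameter
sev <- rgamma(n, shape = shape, rate = shape / mu)  # Gamma with mean mu
dat <- data.frame(sev = sev, unemp = unemp)

# Gamma regression with log link: log(E[sev]) = b0 + b1 * unemp
fit <- glm(sev ~ unemp, data = dat, family = Gamma(link = "log"))
coef(fit)

# Expected severity under two hypothetical scenarios
scen <- data.frame(unemp = c(5, 10), row.names = c("baseline", "severe"))
pred <- predict(fit, newdata = scen, type = "response")
pred
```

Because of the log link, the ratio of the two scenario predictions equals exp(5 * b1), so the stress impact can be read directly from the fitted coefficient.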

While Log Normal and Gamma are the most popular choices for the severity model, each has its pros and cons. For instance, while the Log Normal distributional assumption is extremely flexible and easy to understand, its predicted outcomes are on the log scale and must be adjusted for the retransformation bias when converted back to the original scale. Fortunately, both SAS, e.g. PROC SEVERITY, and R, e.g. the fitdistrplus package, provide convenient interfaces for selecting a distribution based on goodness-of-fit statistics and information criteria.

```r
library(fitdistrplus)
library(insuranceData)
data(AutoCollision)  # load the AutoCollision claims data

# Fit Log Normal and Gamma severities by the method of moments
Fit1 <- fitdist(AutoCollision$Severity, dist = "lnorm", method = "mme")
Fit2 <- fitdist(AutoCollision$Severity, dist = "gamma", method = "mme")

# Compare goodness-of-fit statistics and information criteria
gofstat(list(Fit1, Fit2))
# Goodness-of-fit statistics
#                              1-mme-lnorm 2-mme-gamma
# Kolmogorov-Smirnov statistic   0.1892567   0.1991059
# Cramer-von Mises statistic     0.2338694   0.2927953
# Anderson-Darling statistic     1.5772642   1.9370056
#
# Goodness-of-fit criteria
#                                1-mme-lnorm 2-mme-gamma
# Aikake's Information Criterion    376.2738    381.2264
# Bayesian Information Criterion    379.2053    384.1578
```

In the output above, Log Normal fits marginally better than Gamma in this particular case, with lower values on every statistic and criterion. Since log(Severity) under the Log Normal assumption (a normal distribution) and Severity under the Gamma assumption both belong to the exponential family, it is convenient to employ glm() with its related variable selection routines in the model development exercise.
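As one illustration of such a variable selection routine, base R's `step()` performs AIC-based stepwise selection on a fitted `glm`, including a Gamma GLM. The sketch below uses simulated data with two hypothetical drivers invented for the example: `gdp_growth`, which truly affects severity in the simulation, and `noise`, which does not.

```r
set.seed(7)
n <- 1000
dat <- data.frame(gdp_growth = rnorm(n, mean = 2, sd = 1.5),  # hypothetical driver
                  noise = rnorm(n))                            # irrelevant variable
mu <- exp(7 - 0.2 * dat$gdp_growth)                            # assumed true mean
dat$sev <- rgamma(n, shape = 1.5, rate = 1.5 / mu)

# Full Gamma GLM with log link on both candidate drivers
full <- glm(sev ~ gdp_growth + noise, data = dat, family = Gamma(link = "log"))

# Backward stepwise selection by AIC (trace = 0 suppresses the step log)
sel <- step(full, direction = "backward", trace = 0)
formula(sel)
```

With a signal this strong, `gdp_growth` is retained; `noise` is usually, though not always, dropped, which is a reminder that stepwise AIC is a screening aid rather than a guarantee.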

```r
# Log Normal regression: Gaussian GLM on log(Severity);
# "-1" drops the intercept so each coefficient is one Vehicle_Use level
summary(mdl1 <- glm(log(Severity) ~ -1 + Vehicle_Use, data = AutoCollision,
                    family = gaussian(link = "identity")))
# Coefficients:
#                       Estimate Std. Error t value Pr(>|t|)
# Vehicle_UseBusiness    5.92432    0.07239   81.84   <2e-16 ***
# Vehicle_UseDriveLong   5.57621    0.07239   77.03   <2e-16 ***
# Vehicle_UseDriveShort  5.43405    0.07239   75.07   <2e-16 ***
# Vehicle_UsePleasure    5.35171    0.07239   73.93   <2e-16 ***

# Gamma regression with log link on Severity
summary(mdl2 <- glm(Severity ~ -1 + Vehicle_Use, data = AutoCollision,
                    family = Gamma(link = "log")))
# Coefficients:
#                       Estimate Std. Error t value Pr(>|t|)
# Vehicle_UseBusiness    5.97940    0.08618   69.38   <2e-16 ***
# Vehicle_UseDriveLong   5.58072    0.08618   64.76   <2e-16 ***
# Vehicle_UseDriveShort  5.44560    0.08618   63.19   <2e-16 ***
# Vehicle_UsePleasure    5.36225    0.08618   62.22   <2e-16 ***
```

As shown above, the estimated coefficients are very similar in the Log Normal and Gamma regressions, while the standard errors differ because of the different distributional assumptions. However, please note that predictions from the Log Normal regression are on the log scale and should be adjusted by adding (RMSE ^ 2) / 2 before applying exp(), since E[Severity] = exp(mu + sigma ^ 2 / 2) when log(Severity) ~ N(mu, sigma ^ 2).
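A minimal sketch of that adjustment on simulated data, with a hypothetical four-level factor `use` standing in for Vehicle_Use and an assumed common log-scale standard deviation of 0.8. The dispersion reported by `summary()` for the Gaussian GLM is the squared residual standard error (the RMSE ^ 2 above); adding half of it before exponentiating recovers the mean severity, which the Gamma GLM with log link estimates directly.

```r
set.seed(1)
use <- factor(rep(c("A", "B", "C", "D"), each = 500))      # hypothetical groups
mu_log <- c(A = 5.9, B = 5.6, C = 5.4, D = 5.3)[as.character(use)]
y <- rlnorm(length(use), meanlog = mu_log, sdlog = 0.8)    # assumed sdlog
dat <- data.frame(y = y, use = use)

ln_fit <- glm(log(y) ~ -1 + use, data = dat, family = gaussian(link = "identity"))
ga_fit <- glm(y ~ -1 + use, data = dat, family = Gamma(link = "log"))

new_levels <- data.frame(use = factor(levels(use), levels = levels(use)))
sigma2 <- summary(ln_fit)$dispersion                       # RMSE ^ 2
log_pred <- predict(ln_fit, newdata = new_levels)          # log-scale fits
naive <- exp(log_pred)                                     # biased low
adjusted <- exp(log_pred + sigma2 / 2)                     # bias-adjusted mean
gamma_pred <- predict(ga_fit, newdata = new_levels, type = "response")
group_mean <- tapply(dat$y, dat$use, mean)
round(cbind(naive, adjusted, gamma_pred, group_mean), 1)
```

The naive back-transformation estimates the median rather than the mean, so it sits well below the group means, while the adjusted values track them closely.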
