Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When modeling severity measurements in the operational loss with Generalized Linear Models, we might have a couple choices based on different distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. However, based on my observations from the empirical work, the differences in parameter estimates among these three popular candidates are rather immaterial from the practical standpoint.

Below is a demonstration showing how to model the severity with the insurance data under aforementioned three distributions. As shown, albeit with inferential differences, three models show similar coefficients.

```In [1]: # LOAD PACKAGES

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: import statsmodels.api as sm

In [5]: import statsmodels.formula.api as smf

Out[7]:
Age Vehicle_Use  Severity  Claim_Count
0  17-20    Pleasure    250.48           21
1  17-20  DriveShort    274.78           40
2  17-20   DriveLong    244.52           23
4  21-24    Pleasure    213.71           63

In [8]: # FIT A GAMMA REGRESSION

In [9]: gamma = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gamma(sm.families.links.log))

In [10]: gamma.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                   Gamma   Df Model:                           10
Method:                          IRLS   Log-Likelihood:                -161.35
Date:                Sun, 06 Dec 2015   Deviance:                      0.58114
Time:                        12:59:17   Pearson chi2:                    0.629
No. Iterations:                     8
=============================================================================================
coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.2413      0.101     61.500      0.000         6.042     6.440
Age[T.21-24]                 -0.2080      0.122     -1.699      0.089        -0.448     0.032
Age[T.25-29]                 -0.2303      0.122     -1.881      0.060        -0.470     0.010
Age[T.30-34]                 -0.2630      0.122     -2.149      0.032        -0.503    -0.023
Age[T.35-39]                 -0.5311      0.122     -4.339      0.000        -0.771    -0.291
Age[T.40-49]                 -0.3820      0.122     -3.121      0.002        -0.622    -0.142
Age[T.50-59]                 -0.3741      0.122     -3.057      0.002        -0.614    -0.134
Age[T.60+]                   -0.3939      0.122     -3.218      0.001        -0.634    -0.154
Vehicle_Use[T.DriveLong]     -0.3573      0.087     -4.128      0.000        -0.527    -0.188
Vehicle_Use[T.DriveShort]    -0.5053      0.087     -5.839      0.000        -0.675    -0.336
Vehicle_Use[T.Pleasure]      -0.5886      0.087     -6.801      0.000        -0.758    -0.419
=============================================================================================
"""

In [11]: # FIT A INVERSE GAUSSIAN REGRESSION

In [12]: igauss = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.InverseGaussian(sm.families.links.log))

In [13]: igauss.fit().summary()
Out[13]:
<class 'statsmodels.iolib.summary.Summary'>
"""
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:         InverseGaussian   Df Model:                           10
Method:                          IRLS   Log-Likelihood:                -156.44
Date:                Sun, 06 Dec 2015   Deviance:                    0.0015945
Time:                        13:01:14   Pearson chi2:                  0.00183
No. Iterations:                     7
=============================================================================================
coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1776      0.103     59.957      0.000         5.976     6.379
Age[T.21-24]                 -0.1475      0.116     -1.269      0.204        -0.375     0.080
Age[T.25-29]                 -0.1632      0.116     -1.409      0.159        -0.390     0.064
Age[T.30-34]                 -0.2079      0.115     -1.814      0.070        -0.433     0.017
Age[T.35-39]                 -0.4732      0.108     -4.361      0.000        -0.686    -0.261
Age[T.40-49]                 -0.3299      0.112     -2.954      0.003        -0.549    -0.111
Age[T.50-59]                 -0.3206      0.112     -2.866      0.004        -0.540    -0.101
Age[T.60+]                   -0.3465      0.111     -3.115      0.002        -0.565    -0.128
Vehicle_Use[T.DriveLong]     -0.3334      0.084     -3.992      0.000        -0.497    -0.170
Vehicle_Use[T.DriveShort]    -0.4902      0.081     -6.055      0.000        -0.649    -0.332
Vehicle_Use[T.Pleasure]      -0.5743      0.080     -7.206      0.000        -0.731    -0.418
=============================================================================================
"""

In [14]: # FIT A LOGNORMAL REGRESSION

In [15]: df['Log_Severity'] = np.log(df.Severity)

In [16]: lognormal = smf.glm(formula = "Log_Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gaussian())

In [17]: lognormal.fit().summary()
Out[17]:
<class 'statsmodels.iolib.summary.Summary'>
"""
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:           Log_Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                Gaussian   Df Model:                           10
Method:                          IRLS   Log-Likelihood:                 19.386
Date:                Sun, 06 Dec 2015   Deviance:                      0.55778
Time:                        13:02:12   Pearson chi2:                    0.558
No. Iterations:                     4
=============================================================================================
coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1829      0.096     64.706      0.000         5.996     6.370
Age[T.21-24]                 -0.1667      0.115     -1.447      0.148        -0.393     0.059
Age[T.25-29]                 -0.1872      0.115     -1.624      0.104        -0.413     0.039
Age[T.30-34]                 -0.2163      0.115     -1.877      0.061        -0.442     0.010
Age[T.35-39]                 -0.4901      0.115     -4.252      0.000        -0.716    -0.264
Age[T.40-49]                 -0.3347      0.115     -2.904      0.004        -0.561    -0.109
Age[T.50-59]                 -0.3267      0.115     -2.835      0.005        -0.553    -0.101
Age[T.60+]                   -0.3467      0.115     -3.009      0.003        -0.573    -0.121
Vehicle_Use[T.DriveLong]     -0.3481      0.081     -4.272      0.000        -0.508    -0.188
Vehicle_Use[T.DriveShort]    -0.4903      0.081     -6.016      0.000        -0.650    -0.331
Vehicle_Use[T.Pleasure]      -0.5726      0.081     -7.027      0.000        -0.732    -0.413
=============================================================================================
"""

```