[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Daniel Hanson, QA Data Scientist, Revolution Analytics

### Introduction

As most readers are well aware, market return data tends to have heavier tails than that which can be captured by a normal distribution; furthermore, skewness will not be captured either. For this reason, a four parameter distribution such as the Generalized Lambda Distribution (GLD) can give us a more realistic representation of the behavior of market returns, and a source from which to draw random samples to simulate returns.

The R package GLDEX provides a fairly straightforward and well-documented solution for fitting a GLD to market return data.  Documentation in pdf form may be found here.  An accompanying paper written by the author of the GLDEX package, Steven Su, on is also available for download; it is a very well presented overview of the GLD, along with details and examples on using the GLDEX package.

### Brief Background

The four parameters of the GLD are, not surprisingly, λ1, λ2, λ3, and λ4.  Without going into theoretical details, suffice it to say that λ1 and λ2 are measures of location and scale respectively, and the skewness and kurtosis of the distribution are determined by λ3 and λ4.

Furthermore, there are two forms of the GLD that are implemented in GLDEX, namely those of Ramberg and Schmeiser (1974), and Freimer, Mudholkar, Kollia, and Lin (1988).  These are commonly abbreviated as RS and FMKL.

A more detailed theoretical discussion may be found in the paper by Su noted above.

### Example

To demonstrate fitting a GLD to financial returns, let’s fetch 20 years of daily closing prices for the SPY ETF which tracks the S&P 500, and then calculate the corresponding daily log returns.  Before starting, be sure that you have installed both the GLDEX and quantmod packages; quantmod is used for obtaining the market data.

require(GLDEX)
require(quantmod)

getSymbols(“SPY”, from = “1994-02-01”)
SPY.Close <- SPY[,4] # Closing prices

SPY.vector <- as.vector(SPY.Close)

# Calculate log returns
sp500 <- diff(log(SPY.vector), lag = 1)
sp500 <- sp500[-1] # Remove the NA in the first position

Next, let’s use the function in GLDEX to compute the first four moments of the data:

# Set normalise=”Y” so that kurtosis is calculated with
# reference to kurtosis = 0 under Normal distribution
fun.moments.r(sp500, normalise = “Y”)

which gives us the following:

mean       variance      skewness      kurtosis
0.0002659639  0.0001539469 -0.0954589371  9.4837879957

Now, let’s fit a GLD to the return data by using the fun.data.fit.mm(.) function:

Remark:  running the above command will result in the following message:

“There were 50 or more warnings (use warnings() to see the first 50)”

where the individual warning messages look like this:

Warning messages:
1: In beta(a, b) : NaNs produced
2: In beta(a, b) : NaNs produced
3: In beta(a, b) : NaNs produced

These warnings may be safely ignored. Now, let’s look at the contents of the spLambdaDist object:

RPRS        RMFMKL
[1,]  3.753968e-04  3.234195e-04
[2,] -4.660455e+01  2.031910e+02
[3,] -1.673535e-01 -1.694597e-01
[4,] -1.638032e-01 -1.613267e-01

What this gives is the set of estimated lambda parameters λ1 through λ4 for both the RS version (1st column) and the FMKL version (2nd column) of the GLD.

There is also a convenient plotting function in the package that will display the histogram of the data along with the density curves for the RS and FMKL fits:

fun.plot.fit(fit.obj = spLambdaDist, data = sp500, nclass = 100,param = c(“rs”, “fmkl”), xlab = “Returns”)

where fit.obj is our fitted GLD object (containing the lambda parameters), data represents the returns data sp500, nclass is the number of partitions to use for the histogram, param tells the function which models to use (we have chosen both RS and FMKL here), and xlab is the label to use for the horizontal axis.

The resulting plot is as follows here: Note that, as in the case of the lambda parameters, RPRS refers to the RS representation, and RMFMKL to that of the FMKL.

Now that we have fitted the model, we can generate simulated returns using the rgl(.) function, which will select a random sample for a given parametrization.

In order to use this function, we need to separate out the individual lambda parameters for the RS and FMKL versions of our fitted distributions; the rgl(.) function requires individual input for each lambda parameter, as we will soon see.

lambda_params_rs <- spLambdaDist[, 1]
lambda1_rs <- lambda_params_rs
lambda2_rs <- lambda_params_rs
lambda3_rs <- lambda_params_rs
lambda4_rs <- lambda_params_rs

lambda_params_fmkl <- spLambdaDist[, 2]
lambda1_fmkl <- lambda_params_fmkl
lambda2_fmkl <- lambda_params_fmkl
lambda3_fmkl <- lambda_params_fmkl
lambda4_fmkl <- lambda_params_fmkl

Now, let’s generate a set of simulated returns with approximately the same moments as what we found with our market data.  To do this, we need a large number of draws using the rgl(.) function; through some trial and error, n = 10,000,000 gets us about as close as we can with each version (RS and FKML):

# RS version:
set.seed(100)    # Set seed to obtain a reproducible set
rs_sample <- rgl(n = 10000000, lambda1=lambda1_rs, lambda2 = lambda2_rs,
lambda3 = lambda3_rs,
lambda4 = lambda4_rs,param = “rs”)

# Moments of simulated returns using RS method:
fun.moments.r(rs_sample, normalise=”Y”)

# Moments calculated from market data:
fun.moments.r(sp500, normalise=”Y”)

# FKML version:
set.seed(100)    # Set seed to obtain a reproducible set
fmkl_sample <- rgl(n = 10000000, lambda1=lambda1_fmkl, lambda2 =                                lambda2_fmkl,lambda3 = lambda3_fmkl,
lambda4 = lambda4_fmkl,param = “fmkl”)

# Moments of simulated returns using FMKL method:
fun.moments.r(fmkl_sample, normalise=”Y”)

# Moments calculated from market data:
fun.moments.r(sp500, normalise=”Y”)

Comparing results for the RS version vs S&P 500 market data, we get:

> fun.moments.r(rs_sample, normalise=”Y”)

mean         variance     skewness      kurtosis
2.660228e-04  8.021569e-05 -1.035707e-01  9.922937e+00

> fun.moments.r(sp500, normalise=”Y”)

mean      variance      skewness      kurtosis
0.0002659639  0.0001539469 -0.0954589371  9.4837879957

And for FKML vs S&P 500 market data, we get:

> fun.moments.r(fmkl_sample, normalise=”Y”)

mean      variance      skewness      kurtosis
0.0002660137  0.0001537927 -0.1042857096  9.9498480532

> fun.moments.r(sp500, normalise=”Y”)

mean      variance      skewness      kurtosis
0.0002659639  0.0001539469 -0.0954589371  9.4837879957

So, while we are reasonably close for the mean, skewness, and kurtosis in each case, we get better results for variance with the FKML version.

### Summary

By fitting a four-parameter Generalized Lambda Distribution to market data, we are able to preserve skewness and kurtosis of the observed data; this would not be possible using a normal distribution with only the first two moments available as parameters.  Kurtosis, in particular, is critical, as this captures the fat-tailed characteristics present in market data, allowing risk managers to better assess the risk of market downturns and “black swan” events.

We were able to use the GLDEX package to construct a large set of simulated returns having approximately the same four moments as that of the observed market data, from which return scenarios may be drawn for risk and pricing models, for example.

The example shown above only scratches the surface of how the GLD can be utilized in computational finance.  For a more in-depth discussion, this paper by Chalabi, Scott, and Wurtz is one good place to start.