Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Just some simple codes to illustrate some points we will discuss this week, for the last course on GLMs, before the final exam.  We have mentioned that the Gamma distribution belongs to the exponential, so we can run a regression, and compute the associated AIC,

> set.seed(123)
> test.data = rgamma(n=2000, scale=1, shape=1)
> m1 = glm( test.data~1, family=Gamma(link=log))
> AIC(m1)
[1] 3997.332

The Gamma distribution is also a special case of the Tweedie distribution, with power 2

> library(statmod)
> library(tweedie)
> m2 = glm( test.data~1, family=tweedie(link.power=0, var.power=2) )
> AIC(m2)
[1] NA

Unfortunately, we cannot compute the AIC, and we need a trick (with the appropriate R function).

> AICtweedie(m2)
[1] 3997.332

Of course, we can do the same with the Poisson distribution, which also belongs to the exponential family

> test.data = rpois(n=2000, lambda=1)
> m3 = glm( test.data~1, family=poisson(link=log))
> m4 = glm( test.data~1, family=tweedie(link.power=0, var.power=1) )
> AIC(m3)
[1] 5124.61

Here, we have a problem with the AICtweedie function

> AICtweedie(m4)
[1] Inf

because we need to specify the dispersion parameter

> AICtweedie(m4, dispersion=1)
[1] 5124.61

We can now check: we generate some Gamma sample, and fit various Tweedie distribution, changing simply the variance function (which is a power function)

> set.seed(123)
> test.data = rgamma(n=2000, scale=1, shape=1)
> glmtw = function(t){
+ m1 = glm( test.data~1, family=tweedie(link.power=0, var.power=t) )
+ d = NULL
+ if(t == 1) d = 1
+ AICtweedie(m1, dispersion = d)
+
+ }
> vt = seq(1,2.7,length=100)
> vg = Vectorize(glmtw)(vt)
> plot(vt,vg,log="y",type="l")

The minimum of the AIC is close to 2, corresponding to the Gamma distribution

We can also try with a Poisson

> set.seed(123)
> test.data = rpois(n=2000, lambda=1)
> glmtw = function(t){
+ m1 = glm( test.data~1, family=tweedie(link.power=0, var.power=t) )
+ d = NULL
+ if(t == 1) d = 1
+ AICtweedie(m1, dispersion = d)
+
+ }
> vt = seq(1,2,length=100)
> vg = Vectorize(glmtw)(vt)
> plot(vt,vg,log="y",type="l")

The minimum is now close to 1, corresponding to the Poisson distriubtion (the variance is equal to the average)

Let us now try some compound Poisson distribution,

> rcpd=function(n,lambda,shape,scale){
+ N=rpois(n,lambda)
+ X=rgamma(sum(N),shape=shape, scale=scale)
+ I=as.factor(rep(1:n,N))
+ S=tapply(X,I,sum)
+ V=as.numeric(S[as.character(1:n)])
+ V[is.na(V)]=0
+ return(V)}

Let us generate some compound Poisson random variables, with Poisson distribution with average 1, and with Gamma jumps, with average and variance 1,

> set.seed(123)
> test.data = rcpd(n=2000, 1,1,1)
> glmtw = function(t){
+ m1 = glm( test.data~1, family=tweedie(link.power=0, var.power=t) )
+ d = NULL
+ if(t == 1) d = 1
+ AICtweedie(m1, dispersion = d)
+ }
> vt = seq(1.1,1.9,length=100)
> vg = Vectorize(glmtw)(vt)
> plot(vt,vg,log="y",type="l")

The optimal value for the power function is here 1.5, based on the AIC (relationships between Tweedie parameters and the compound Poisson ones are given in the slides)

We can now play a little bit with the variance of the jumps: they still have aveage 1, but they now have a smaller variance

> set.seed(123)
> test.data = rcpd(n=2000, 1,3,1/3)
> vt = seq(1.05,1.95,length=100)
> vg = Vectorize(glmtw)(vt)
> plot(vt,vg,log="y",type="l")

The optimal power for the Tweedie is closer to one, closer to the Poison case

while if we increase the variance of the jumps

> set.seed(123)
> test.data = rcpd(n=2000, 1,1/3,3)
> vt = seq(1.05,1.95,length=100)
> vg = Vectorize(glmtw)(vt)
> plot(vt,vg,log="y",type="l")

the optimal power is higher, closer to the Gamma distribution.