MAT8886 Fisher-Tippett theorem and limiting distribution for the maximum

January 12, 2012

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Tomorrow, we will discuss the Fisher-Tippett theorem. The idea is that there are only three possible limiting distributions for normalized versions of the maxima of i.i.d. samples $\{X_1,\dots,X_n\}$. For bounded distributions, consider e.g. the uniform distribution on the unit interval, i.e. $F(x)=x$ on the unit interval. Let $a_n=1/n$ and $b_n=1$. Then, for all $x<0$ and $n>-x$, the normalized maximum $M_n=\max\{X_1,\dots,X_n\}$ satisfies

$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)=\left(1+\frac{x}{n}\right)^n\to e^{x}$$

i.e. the limiting distribution of the maximum is Weibull's (more precisely, a reversed Weibull, supported on the negative half-line).

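set.seed(1)
s=1000000                   # total number of uniform draws
n=100                       # size of each sample
M=matrix(runif(s),n,s/n)    # one sample per column
V=apply(M,2,max)            # maximum of each sample
bn=1
an=1/n
U=(V-bn)/an                 # normalized maxima
hist(U,probability=TRUE,col="light green",
xlim=c(-7,1),main="",breaks=seq(-20,10,by=.25))
u=seq(-10,0,by=.1)
v=exp(u)                    # limiting density exp(x) on x<0
lines(u,v,lwd=3,col="red")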
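
As a quick numerical check of the limit in the uniform case, the exact distribution function of the normalized maximum, (1+x/n)^n for -n<x<0, can be compared with exp(x) on a grid. This is only a minimal sketch; the helper name `exact_cdf_unif` is ours and does not come from the original post.

```r
# Exact cdf of n*(M_n - 1) for a Uniform(0,1) sample of size n:
# P(M_n <= 1 + x/n) = (1 + x/n)^n for -n <= x <= 0
exact_cdf_unif <- function(x, n) {
  ifelse(x >= 0, 1, ifelse(x <= -n, 0, (1 + x / n)^n))
}

x <- seq(-5, 0, by = 0.5)
for (n in c(10, 100, 1000)) {
  # largest absolute gap between the exact cdf and the limit exp(x) on the grid
  gap <- max(abs(exact_cdf_unif(x, n) - exp(x)))
  cat("n =", n, " max gap on grid =", signif(gap, 3), "\n")
}
```

The gap should shrink roughly like 1/n, which is why the histogram above already sits on the red curve with n=100.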

For heavy tailed distributions, or Pareto-type tails, consider Pareto samples, with distribution function $F(x)=1-x^{-\alpha}$ for $x\geq 1$. Let $b_n=0$ and $a_n=n^{1/\alpha}$, then for $x>0$ and $n$ large enough, the normalized maximum satisfies

$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)=\left(1-\frac{x^{-\alpha}}{n}\right)^n\to \exp\left(-x^{-\alpha}\right)$$

which means that the limiting distribution is Fréchet's.

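library(evd)                # provides dfrechet()
set.seed(1)
s=1000000
n=100
# Pareto draws with alpha=2, by inverse transform of uniforms
M=matrix((runif(s))^(-1/2),n,s/n)
V=apply(M,2,max)
bn=0
an=n^(1/2)
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",
xlim=c(0,7),main="",breaks=seq(0,max(U)+1,by=.25))
u=seq(0,10,by=.1)
v=dfrechet(u,shape=2)       # Frechet density with shape alpha=2
lines(u,v,lwd=3,col="red")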
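
The red curve relies on `dfrechet` from the evd package. If that package is not at hand, the standard Fréchet density can be written directly from the limit exp(-x^(-alpha)). The following is a minimal base-R sketch; the function name `dfrechet_base` is hypothetical, and it only covers location 0 and scale 1.

```r
# Standard Frechet density with shape alpha (location 0, scale 1):
# F(x) = exp(-x^(-alpha)),  f(x) = alpha * x^(-alpha-1) * exp(-x^(-alpha)) for x > 0
dfrechet_base <- function(x, alpha) {
  out <- numeric(length(x))          # density is 0 for x <= 0
  pos <- x > 0
  # computed on the log scale to avoid Inf * 0 for very small x
  out[pos] <- exp(log(alpha) - (alpha + 1) * log(x[pos]) - x[pos]^(-alpha))
  out
}

u <- seq(0, 10, by = 0.1)
v <- dfrechet_base(u, alpha = 2)     # same role as dfrechet(u, shape = 2)

# Sanity check: the density should integrate to (approximately) 1
integrate(dfrechet_base, lower = 0, upper = Inf, alpha = 2)$value
```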

For light tailed distributions, or exponential tails, consider e.g. a sample of exponentially distributed variates, with common distribution function $F(x)=1-e^{-x}$. Let $b_n=\log n$ and $a_n=1$, then the normalized maximum satisfies

$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)=\left(1-\frac{e^{-x}}{n}\right)^n\to \exp\left(-e^{-x}\right)$$

i.e. the limiting distribution for the maximum is Gumbel's distribution.

library(evd)
set.seed(1)
s=1000000
n=100
M=matrix(rexp(s,1),n,s/n)
V=apply(M,2,max)
# centering constant: the (1-1/n)-quantile, which equals log(n) here
(bn=qexp(1-1/n))
log(n)
an=1
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",
xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)
lines(u,v,lwd=3,col="red")

Consider now a Gaussian $\mathcal{N}(0,1)$ sample. We can use the following approximation of the tail of the cumulative distribution function (based on l'Hôpital's rule)

$$1-\Phi(x)\sim\frac{\varphi(x)}{x}$$

as $x\to\infty$. Let $b_n=\Phi^{-1}(1-1/n)$ and $a_n=1/b_n$. Then we can get

$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)=\Phi(b_n+a_nx)^n\to \exp\left(-e^{-x}\right)$$

as $n\to\infty$, i.e. the limiting distribution of the maximum of a Gaussian sample is Gumbel's. But what we do not see here is that for a Gaussian sample, the convergence is extremely slow: with 100 observations, we are still far away from the Gumbel distribution, and it is only slightly better with 1,000 observations:

set.seed(1)
s=10000000
n=1000
M=matrix(rnorm(s,0,1),n,s/n)
V=apply(M,2,max)
(bn=qnorm(1-1/n,0,1))
an=1/bn
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",
xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)
lines(u,v,lwd=3,col="red")
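
To put a number on "extremely slow", one can skip the simulation and compare the exact distribution function of the normalized maximum, F(b_n+a_n x)^n, with the Gumbel distribution function on a grid. The sketch below does this for the Gaussian case and, as a benchmark, for the exponential case discussed earlier. The helper names `gap_gauss` and `gap_exp` and the grid are our own choices, not part of the original post.

```r
# Largest gap, over a grid, between the exact cdf of (M_n - b_n)/a_n
# and the Gumbel cdf exp(-exp(-x)).
# log.p = TRUE keeps F(q)^n accurate when F(q) is very close to 1.
x <- seq(-3, 8, by = 0.01)
gumbel_cdf <- exp(-exp(-x))

gap_gauss <- function(n) {
  bn <- qnorm(1 / n, lower.tail = FALSE)   # same as qnorm(1 - 1/n)
  an <- 1 / bn
  max(abs(exp(n * pnorm(bn + an * x, log.p = TRUE)) - gumbel_cdf))
}

gap_exp <- function(n) {
  bn <- log(n)                             # a_n = 1 in the exponential case
  max(abs(exp(n * pexp(bn + x, log.p = TRUE)) - gumbel_cdf))
}

for (n in c(100, 1000, 1e6)) {
  cat("n =", n, " Gaussian gap =", signif(gap_gauss(n), 3),
      " exponential gap =", signif(gap_exp(n), 3), "\n")
}
```

On this grid the Gaussian gap should come out much larger than the exponential one, and it should shrink only slowly as n grows, which is consistent with the histograms.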

Even worse, consider lognormal observations. In that case, recall that a lognormal variate is an increasing transformation ($y\mapsto e^y$) of a Gaussian one, and here we stay in the same domain of attraction. Hence, since $X_i=\exp(Y_i)$ with $Y_i\sim\mathcal{N}(0,1)$, so that $\max\{X_i\}=\exp(\max\{Y_i\})$, if

$$\frac{\max\{Y_i\}-b_n}{a_n}\leq x$$

then

$$\max\{X_i\}\leq \exp(b_n+a_nx)$$

i.e. using Taylor's approximation on the right term,

$$\max\{X_i\}\leq e^{b_n}\left(1+a_nx\right)$$

This gives us the normalizing coefficients we should use here, namely $e^{b_n}$ for centering and $a_ne^{b_n}$ for scaling.

set.seed(1)
s=10000000
n=1000
M=matrix(rlnorm(s,0,1),n,s/n)
V=apply(M,2,max)
bn=exp(qnorm(1-1/n,0,1))
an=exp(qnorm(1-1/n,0,1))/(qnorm(1-1/n,0,1))
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",
xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,40,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)
lines(u,v,lwd=3,col="red")

Credit: the illustration is from Maurice Sendak's popular book Where the Wild Things Are, translated into French as Max et les Maximonstres.
