
Tomorrow, we will discuss Fisher-Tippett theorem. The idea is that there are only three possible limiting distributions for normalized versions of the maxima of i.i.d. samples
. For bounded distribution, consider e.g. the uniform distribution on the unit interval, i.e.
on the unit interval. Let
and
. Then, for all
and
,

i.e. the limiting distribution of the maximum is Weibull's.
set.seed(1) s=1000000 n=100 M=matrix(runif(s),n,s/n) V=apply(M,2,max) bn=1 an=1/n U=(V-bn)/an hist(U,probability=TRUE,,col="light green", xlim=c(-7,1),main="",breaks=seq(-20,10,by=.25)) u=seq(-10,0,by=.1) v=exp(u) lines(u,v,lwd=3,col="red")

. Let
and
, then

which means that the limiting distribution is Fréchet's.
set.seed(1) s=1000000 n=100 M=matrix((runif(s))^(-1/2),n,s/n) V=apply(M,2,max) bn=0 an=n^(1/2) U=(V-bn)/an hist(U,probability=TRUE,col="light green", xlim=c(0,7),main="",breaks=seq(0,max(U)+1,by=.25)) u=seq(0,10,by=.1) v=dfrechet(u,shape=2) lines(u,v,lwd=3,col="red")

. Let
and
, then

i.e. the limiting distribution for the maximum is Gumbel's distribution.
library(evd) set.seed(1) s=1000000 n=100 M=matrix(rexp(s,1),n,s/n) V=apply(M,2,max) (bn=qexp(1-1/n)) log(n) an=1 U=(V-bn)/an hist(U,probability=TRUE,col="light green", xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25)) u=seq(-5,15,by=.1) v=dgumbel(u) lines(u,v,lwd=3,col="red")

Consider now a Gaussian
sample. We can use the following approximation of the cumulative distribution function (based on l'Hopital's rule)

as
. Let
and
. Then we can get

as
. I.e. the limiting distribution of the maximum of a Gaussian sample is Gumbel's. But what we do not see here is that for a Gaussian sample, the convergence is extremely slow, i.e., with 100 observations, we are still far away from Gumbel distribution,

and it is only slightly better with 1,000 observations,
set.seed(1) s=10000000 n=1000 M=matrix(rnorm(s,0,1),n,s/n) V=apply(M,2,max) (bn=qnorm(1-1/n,0,1)) an=1/bn U=(V-bn)/an hist(U,probability=TRUE,col="light green", xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25)) u=seq(-5,15,by=.1) v=dgumbel(u) lines(u,v,lwd=3,col="red")

Even worst, consider lognormal observations. In that case, recall that if we consider (increasing) transformation of variates, we are in the same domain of attraction. Hence, since
, if

then

i.e. using Taylor's approximation on the right term,

This gives us normalizing coefficients we should use here.
set.seed(1) s=10000000 n=1000 M=matrix(rlnorm(s,0,1),n,s/n) V=apply(M,2,max) bn=exp(qnorm(1-1/n,0,1)) an=exp(qnorm(1-1/n,0,1))/(qnorm(1-1/n,0,1)) U=(V-bn)/an hist(U,probability=TRUE,col="light green", xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,40,by=.25)) u=seq(-5,15,by=.1) v=dgumbel(u) lines(u,v,lwd=3,col="red")

Credit: illustration is from Maurice Sendak's popular book where the wild things are, translated in French as Max et les Maximonstres.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).