MAT8886 Fisher-Tippett theorem and limiting distribution for the maximum
Tomorrow, we will discuss the Fisher-Tippett theorem. The idea is that there are only three possible limiting distributions for normalized versions of the maxima of i.i.d. samples $X_1,\dots,X_n$. For a bounded distribution, consider e.g. the uniform distribution on the unit interval, i.e. $F(x)=x$ on $[0,1]$. Let $M_n=\max\{X_1,\dots,X_n\}$, $b_n=1$ and $a_n=1/n$. Then, for all $x\leq 0$ and $n$ large enough,
$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)\rightarrow e^{x}\text{ as }n\rightarrow\infty,$$
i.e. the limiting distribution of the maximum is Weibull’s.
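To spell out the step behind that limit (it is implicit in the post): the same elementary mechanism, $(1+c/n)^n\rightarrow e^{c}$, will also drive the Fréchet and Gumbel limits below,
$$\mathbb{P}\left(n(M_n-1)\leq x\right)=\mathbb{P}\left(M_n\leq 1+\frac{x}{n}\right)=\left(1+\frac{x}{n}\right)^n\rightarrow e^{x},\qquad x\leq 0.$$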
set.seed(1)
s=1000000
n=100
M=matrix(runif(s),n,s/n)   # 10,000 samples of size n=100, one per column
V=apply(M,2,max)           # sample maxima
bn=1
an=1/n
U=(V-bn)/an                # normalized maxima
hist(U,probability=TRUE,col="light green",xlim=c(-7,1),main="",breaks=seq(-20,10,by=.25))
u=seq(-10,0,by=.1)
v=exp(u)                   # limiting Weibull density on the negative half-line
lines(u,v,lwd=3,col="red")
For a heavy tailed distribution, or Pareto-type tails, consider Pareto samples, with distribution function $F(x)=1-x^{-2}$ for $x\geq 1$. Let $b_n=0$ and $a_n=\sqrt{n}$, then
$$\mathbb{P}\left(\frac{M_n}{\sqrt{n}}\leq x\right)\rightarrow\exp(-x^{-2})\text{ as }n\rightarrow\infty,\qquad x>0,$$
which means that the limiting distribution is Fréchet’s.
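Spelling out the implicit step again, this is where the shape parameter 2 of the Fréchet limit (the dfrechet(u,shape=2) call in the code below) comes from,
$$\mathbb{P}\left(M_n\leq\sqrt{n}\,x\right)=\left(1-\frac{x^{-2}}{n}\right)^n\rightarrow\exp(-x^{-2}),\qquad x>0.$$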
library(evd)               # needed here for dfrechet() (and dgumbel() later)
set.seed(1)
s=1000000
n=100
M=matrix((runif(s))^(-1/2),n,s/n)   # Pareto samples: if u is uniform, u^(-1/2) has F(x)=1-x^(-2)
V=apply(M,2,max)
bn=0
an=n^(1/2)
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",xlim=c(0,7),main="",breaks=seq(0,max(U)+1,by=.25))
u=seq(0,10,by=.1)
v=dfrechet(u,shape=2)      # limiting Fréchet density, shape 2
lines(u,v,lwd=3,col="red")
For a light tailed distribution, or exponential tails, consider e.g. a sample of exponentially distributed variates, with common distribution function $F(x)=1-e^{-x}$ for $x\geq 0$. Let $b_n=\log n$ and $a_n=1$, then
$$\mathbb{P}\left(M_n-\log n\leq x\right)\rightarrow\exp(-e^{-x})\text{ as }n\rightarrow\infty,$$
i.e. the limiting distribution for the maximum is Gumbel's.
library(evd)
set.seed(1)
s=1000000
n=100
M=matrix(rexp(s,1),n,s/n)
V=apply(M,2,max)
(bn=qexp(1-1/n))           # the exponential quantile, equal to log(n)
log(n)
an=1
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)               # standard Gumbel density
lines(u,v,lwd=3,col="red")
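As a quick check, not in the original post, one can compare a few empirical probabilities of the normalized maxima with the limiting Gumbel cdf (pgumbel() comes from the evd package loaded above); in the exponential case the agreement should already be close at n=100, in line with the histogram.
x0=c(0,1,2)
cbind(empirical=sapply(x0,function(x) mean(U<=x)),   # empirical P(U <= x)
      gumbel=pgumbel(x0))                            # Gumbel cdf exp(-exp(-x))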
Consider now a Gaussian sample. We can use the following approximation of the cumulative distribution function (based on l'Hopital's rule),
$$1-\Phi(x)\sim\frac{\varphi(x)}{x}$$
as $x\rightarrow\infty$. Let $b_n=\Phi^{-1}(1-1/n)$ and $a_n=1/b_n$. Then we can get
$$\mathbb{P}\left(\frac{M_n-b_n}{a_n}\leq x\right)\rightarrow\exp(-e^{-x})$$
as $n\rightarrow\infty$, i.e. the limiting distribution of the maximum of a Gaussian sample is Gumbel's. But what we do not see here is that for a Gaussian sample the convergence is extremely slow: with 100 observations, we are still far away from the Gumbel distribution (a sketch of the code for that comparison follows),
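A sketch, not from the original post, that should reproduce the n=100 comparison: it is simply the n=1,000 code below with n set to 100.
library(evd)               # for dgumbel(), if this block is run on its own
set.seed(1)
s=1000000
n=100
M=matrix(rnorm(s,0,1),n,s/n)
V=apply(M,2,max)
bn=qnorm(1-1/n,0,1)        # b_n = Phi^{-1}(1-1/n)
an=1/bn
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25))
u=seq(-5,15,by=.1)
lines(u,dgumbel(u),lwd=3,col="red")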
and it is only slightly better with 1,000 observations,
set.seed(1)
s=10000000
n=1000
M=matrix(rnorm(s,0,1),n,s/n)
V=apply(M,2,max)
(bn=qnorm(1-1/n,0,1))      # b_n = Phi^{-1}(1-1/n)
an=1/bn
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,15,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)
lines(u,v,lwd=3,col="red")
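To put a number on "slightly better" (this check is not in the original post), compare a few empirical quantiles of the normalized Gaussian maxima with the Gumbel quantiles, using qgumbel() from evd:
p=c(.5,.9,.99)
cbind(empirical=quantile(U,p),   # empirical quantiles of the normalized maxima
      gumbel=qgumbel(p))         # quantiles of the limiting Gumbel distribution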
Even worse, consider lognormal observations. In that case, recall that if we consider an (increasing) transformation of the variates, we remain in the same domain of attraction. Hence, since $e^{Z}$ is lognormal when $Z$ is Gaussian, if
$$\mathbb{P}\left(\frac{\max\{Z_1,\dots,Z_n\}-b_n}{a_n}\leq x\right)\rightarrow\exp(-e^{-x}),$$
then
$$\mathbb{P}\left(\max\{e^{Z_1},\dots,e^{Z_n}\}\leq e^{b_n+a_n x}\right)\rightarrow\exp(-e^{-x}),$$
i.e. using Taylor's approximation on the right term, $e^{b_n+a_n x}\approx e^{b_n}+a_n e^{b_n}x$,
$$\mathbb{P}\left(\frac{\max\{e^{Z_1},\dots,e^{Z_n}\}-e^{b_n}}{a_n e^{b_n}}\leq x\right)\rightarrow\exp(-e^{-x}).$$
This gives us the normalizing coefficients we should use here, $\beta_n=e^{b_n}$ and $\alpha_n=a_n e^{b_n}=e^{b_n}/b_n$, with $b_n=\Phi^{-1}(1-1/n)$ and $a_n=1/b_n$ as above.
set.seed(1)
s=10000000
n=1000
M=matrix(rlnorm(s,0,1),n,s/n)
V=apply(M,2,max)
bn=exp(qnorm(1-1/n,0,1))                     # beta_n = exp(b_n)
an=exp(qnorm(1-1/n,0,1))/(qnorm(1-1/n,0,1))  # alpha_n = exp(b_n)/b_n
U=(V-bn)/an
hist(U,probability=TRUE,col="light green",xlim=c(-2,7),ylim=c(0,.39),main="",breaks=seq(-5,40,by=.25))
u=seq(-5,15,by=.1)
v=dgumbel(u)
lines(u,v,lwd=3,col="red")
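The same quantile comparison as in the Gaussian case (again, not in the original post) makes the even slower convergence visible:
p=c(.5,.9,.99)
cbind(empirical=quantile(U,p),   # empirical quantiles of the normalized lognormal maxima
      gumbel=qgumbel(p))         # Gumbel quantiles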
Credit: the illustration is from Maurice Sendak's popular book Where the Wild Things Are, translated into French as Max et les Maximonstres.