Maximum Likelihood versus Goodness of Fit

November 8, 2013
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Thursday, I got an interesting question from a colleague of mine (JP). I mean, the way I understood the question turned out to be a nice puzzle (but I have to confess I might have misunderstood). The question is the following : consider a i.i.d. sample http://latex.codecogs.com/gif.latex?\{X_1,\cdots,X_n\} of continuous variables. We would like to choose between two (parametric) families for the distribution, http://latex.codecogs.com/gif.latex?\mathcal{F}=\{F_{\boldsymbol%20\alpha};\boldsymbol%20\alpha\in\mathcal{A}\} and http://latex.codecogs.com/gif.latex?\mathcal{G}=\{G_{\boldsymbol%20\beta};\boldsymbol%20\beta\in\mathcal{B}\}. If we use maximum likelihood techniques, we get two estimators, one for each family, http://latex.codecogs.com/gif.latex?\widehat{\boldsymbol%20\alpha} and http://latex.codecogs.com/gif.latex?\widehat{\boldsymbol%20\beta}. Clearly, http://latex.codecogs.com/gif.latex?F_{\widehat{\boldsymbol%20\alpha}}(\cdot) is a much better than http://latex.codecogs.com/gif.latex?G_{\widehat{\boldsymbol%20\beta}}(\cdot), in the sense of a standard goodness of fit test (e.g. Kolmogorov-Smirnov since the sample is assumed to be obtained from a continuous variable). Does that mean that family is http://latex.codecogs.com/gif.latex?\mathcal{F} (somehow) better than family http://latex.codecogs.com/gif.latex?\mathcal{G} ?

This is my interpretation of the question, and I found it amusing. So I will try to show (using simulated samples) that some odd situations can easily be obtained.

Consider a sample from a mixture of log-normal distributions,

>  set.seed(228)
>  X=exp(c(rnorm(50,1,1),rnorm(50,2,1.2)))

Consider two standard families for positive random variables: a Gamma distribution and a lognormal distribution.

> library(MASS)
> ab=fitdistr(X,"gamma")
> ms=fitdistr(X,"lognormal")

If we want to visualized those two distributions, let us use

> vab=pgamma(u,ab$estimate[1],ab$estimate[2])
> vms=plnorm(u,ms$estimate[1],ms$estimate[2])
> plot(ecdf(X))
> lines(u,vab,col="red")
> lines(u,vms,col="blue")

Here, we get

What else can we say ? actually, we can also compute Kolmogorov-Smirnov statistic,

http://latex.codecogs.com/gif.latex?D_n=\sup_x%20|\widehat%20F_n(x)-F_\star(x)|where

http://latex.codecogs.com/gif.latex?\widehat%20F_n(x)={1%20\over%20n}\sum_{i=1}^n%20\boldsymbol{1}_{X_i\leq%20x}

This can be done using

> ks.test(X,"plnorm",ms$estimate[1],ms$estimate[2])

One-sample Kolmogorov-Smirnov test

data:  X
D = 0.0693, p-value = 0.7231
alternative hypothesis: two-sided

> ks.test(X,"pgamma",ab$estimate[1],ab$estimate[2])

One-sample Kolmogorov-Smirnov test

data:  X
D = 0.148, p-value = 0.02507
alternative hypothesis: two-sided

From a theoretical point of view, we should not look at the p-values, since the null-distribution is based on a fixed distribution, not a fitted one (see the Lilliefors tests for normal samples). But still. The Gamma distribution seems to be very far away from the true distribution. The statistics is twice the one we have with our lognormal distribution. And one p-value is 72%, while the other one is 2.5%. Here, we should prefer this lognormal distribution to that Gamma one. But here, we did consider only one distribution in each family. Does that mean that we cannot find one Gamma distribution that will be better than all possible lognormal distributions ? Better, for instance, according to Kolmogorov-Smirnov statistics…

Well, it is possible to use another strategy to find appropriate parameters. We can minimize this statistic actually. Define

> ks1=function(ms) {m=ms[1];s=ms[2];ks.test(X,"plnorm",m,s)$statistic}
> ks2=function(ab) {a=ab[1];b=ab[2];ks.test(X,"pgamma",a,b)$statistic}

and compute

> n1=nlm(ks1,c(ms$estimate[1],ms$estimate[2]))
> n1
$minimum
[1] 0.05252692

$estimate
[1] 1.547437 1.121864
> n2=nlm(ks2,c(ab$estimate[1],ab$estimate[2]))
> n2
$minimum
[1] 0.04737725

$estimate
[1] 1.1449041 0.167041

So here, it is possible to find a distribution much closer to the empirical sample, within the Gamma family actually.

>  vab=pgamma(u,n2$estimate[1],n2$estimate[2])
>  vms=plnorm(u,n1$estimate[1],n1$estimate[2])
>  lines(u,vab,col="red",lwd=2)
>  lines(u,vms,col="blue",lwd=2)

What would be the point here? Maybe just the idea that the maximum likelihood estimator is only one estimator among a lot of them. And if it has interesting asymptotic properties, on small samples, it might not be the best estimator to consider…

And to be completely honest, I’ve been cheating here… I mean, not really cheating (not more than any researcher using a statistical test to validate the findings). But here, I did fix the seed of the random number generator. Actually, such example does not occur that frequently. Here, out of 1000 samples, I got this odd conclusion almost 15 times. And the smaller the sample, the more likely we can observe that story, where the maximum likelihood estimator can be far away from the best fit. Here is the proportion of opposite conclusions, as a function of the sample size,

> SIM=function(ns=1000,n=100){
+ t=0
+ for(s in 1:ns){
+  set.seed(s)
+  X=exp(c(rnorm(n/2,1,1),rnorm(n/2,2,1.2)))
+  ks1=function(ms) {m=ms[1];s=ms[2];ks.test(X,"plnorm",m,s)$statistic}
+  ks2=function(ab) {a=ab[1];b=ab[2];ks.test(X,"pgamma",a,b)$statistic}
+  library(MASS)
+  ab=fitdistr(X,"gamma")
+  ms=fitdistr(X,"lognormal")
+  n1=nlm(ks1,c(ms$estimate[1],ms$estimate[2]))
+  n2=nlm(ks2,c(ab$estimate[1],ab$estimate[2]))
+  if( (ks.test(X,"plnorm",ms$estimate[1],ms$estimate[2])$statistic-
+  ks.test(X,"pgamma",ab$estimate[1],ab$estimate[2])$statistic)
+ *(n1$minimum-n2$minimum)<=0 ) t=t+1
+ }
+ return(t/ns)}

> VM=rep(NA,20)
> VS=seq(10,200,by=10)
> for(i in 1:20){VM[i]=SIM(n=VS[i],ns=1000)}
> plot(VS,VM,type="p")

So to provide a more complete answer to JP’s question, with a very large sample, I guess that your intuition should be valid…. but clearly not on a small sample.

To leave a comment for the author, please follow the link and comment on his blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.