# Maximum Likelihood versus Goodness of Fit

**Freakonometrics » R-english**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Thursday, I got an interesting question from a colleague of mine (JP). I mean, the way I understood the question turned out to be a nice puzzle (but I have to confess I might have misunderstood). The question is the following : consider a i.i.d. sample of continuous variables. We would like to choose between two (parametric) families for the distribution, and . If we use maximum likelihood techniques, we get two estimators, one for each family, and . Clearly, is a much better than , in the sense of a standard goodness of fit test (e.g. Kolmogorov-Smirnov since the sample is assumed to be obtained from a continuous variable). Does that mean that family is (somehow) better than family ?

This is my interpretation of the question, and I found it amusing. So I will try to show (using simulated samples) that some odd situations can easily be obtained.

Consider a sample from a mixture of log-normal distributions,

> set.seed(228) > X=exp(c(rnorm(50,1,1),rnorm(50,2,1.2)))

Consider two standard families for positive random variables: a * Gamma* distribution and a

*distribution.*

**lognormal**> library(MASS) > ab=fitdistr(X,"gamma") > ms=fitdistr(X,"lognormal")

If we want to visualized those two distributions, let us use

> vab=pgamma(u,ab$estimate[1],ab$estimate[2]) > vms=plnorm(u,ms$estimate[1],ms$estimate[2]) > plot(ecdf(X)) > lines(u,vab,col="red") > lines(u,vms,col="blue")

Here, we get

What else can we say ? actually, we can also compute Kolmogorov-Smirnov statistic,

where

This can be done using

> ks.test(X,"plnorm",ms$estimate[1],ms$estimate[2]) One-sample Kolmogorov-Smirnov test data: X D = 0.0693, p-value = 0.7231 alternative hypothesis: two-sided > ks.test(X,"pgamma",ab$estimate[1],ab$estimate[2]) One-sample Kolmogorov-Smirnov test data: X D = 0.148, p-value = 0.02507 alternative hypothesis: two-sided

From a theoretical point of view, we should not look at the p-values, since the null-distribution is based on a fixed distribution, not a fitted one (see the Lilliefors tests for normal samples). But still. The Gamma distribution seems to be very far away from the true distribution. The statistics is twice the one we have with our lognormal distribution. And one p-value is 72%, while the other one is 2.5%. Here, we should prefer *this* lognormal distribution to *that* Gamma one. But here, we did consider only one distribution in each family. Does that mean that we cannot find one Gamma distribution that will be *better* than all possible lognormal distributions ? Better, for instance, according to Kolmogorov-Smirnov statistics…

Well, it is possible to use another strategy to find appropriate parameters. We can minimize this statistic actually. Define

> ks1=function(ms) {m=ms[1];s=ms[2];ks.test(X,"plnorm",m,s)$statistic} > ks2=function(ab) {a=ab[1];b=ab[2];ks.test(X,"pgamma",a,b)$statistic}

and compute

> n1=nlm(ks1,c(ms$estimate[1],ms$estimate[2])) > n1 $minimum [1] 0.05252692 $estimate [1] 1.547437 1.121864 > n2=nlm(ks2,c(ab$estimate[1],ab$estimate[2])) > n2 $minimum [1] 0.04737725 $estimate [1] 1.1449041 0.167041

So here, it is possible to find a distribution much closer to the empirical sample, within the Gamma family actually.

> vab=pgamma(u,n2$estimate[1],n2$estimate[2]) > vms=plnorm(u,n1$estimate[1],n1$estimate[2]) > lines(u,vab,col="red",lwd=2) > lines(u,vms,col="blue",lwd=2)

What would be the point here? Maybe just the idea that the maximum likelihood estimator is only one estimator among a lot of them. And if it has interesting asymptotic properties, on small samples, it might not be *the *best estimator to consider…

And to be completely honest, I’ve been cheating here… I mean, not really *cheating* (not more than any researcher using a statistical test to validate the findings). But here, I did fix the seed of the random number generator. Actually, such example does not occur that frequently. Here, out of 1000 samples, I got this odd conclusion almost 15 times. And the smaller the sample, the more likely we can observe that story, where the maximum likelihood estimator can be far away from the best fit. Here is the proportion of opposite conclusions, as a function of the sample size,

> SIM=function(ns=1000,n=100){ + t=0 + for(s in 1:ns){ + set.seed(s) + X=exp(c(rnorm(n/2,1,1),rnorm(n/2,2,1.2))) + ks1=function(ms) {m=ms[1];s=ms[2];ks.test(X,"plnorm",m,s)$statistic} + ks2=function(ab) {a=ab[1];b=ab[2];ks.test(X,"pgamma",a,b)$statistic} + library(MASS) + ab=fitdistr(X,"gamma") + ms=fitdistr(X,"lognormal") + n1=nlm(ks1,c(ms$estimate[1],ms$estimate[2])) + n2=nlm(ks2,c(ab$estimate[1],ab$estimate[2])) + if( (ks.test(X,"plnorm",ms$estimate[1],ms$estimate[2])$statistic- + ks.test(X,"pgamma",ab$estimate[1],ab$estimate[2])$statistic) + *(n1$minimum-n2$minimum)<=0 ) t=t+1 + } + return(t/ns)} > VM=rep(NA,20) > VS=seq(10,200,by=10) > for(i in 1:20){VM[i]=SIM(n=VS[i],ns=1000)} > plot(VS,VM,type="p")

So to provide a more complete answer to JP’s question, with a very large sample, I guess that your intuition should be valid…. but clearly not on a small sample.

**leave a comment**for the author, please follow the link and comment on their blog:

**Freakonometrics » R-english**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.