# Maximum Likelihood Estimation and the Origin of Life

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**Econometrics by Simulation**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Maximum likelihood Estimation (MLE) is a powerful tool in econometrics which allows for the consistent and asymptotically efficient estimation of parameters given a correct identification (in terms of distribution) of the random variable. # It is also a proceedure which seems superficially quite complex and intractible, but in reality is very intuitive and easy to use. # To introduct MLE, lets think about observing a repeated coin flip. # Let's say we know it is not a fair coin and we would like to figure out what the likehood of heads parameter popping up is. # Let's define out likelihood function this way: lk <- function(responses, theta) { # This likelihood function returns the likelihood of (a priori) observing a # particular response pattern given the likelihood of seeing a head on the # coin is theta. returner <- 0*theta # Define a vector to hold likelihoods in case theta is a vector for (i in 1:length(theta)) returner[i] <- prod(theta[i]^responses) * # The likelihood of seeing the head's pattern prod((1-theta[i])^(1-responses)) # The likelihood of seeing the tails's pattern returner } # Let's see how our likelihood function is working. # Let's see we flip the coin once and we observe a heads # and our initial guess is that the coin is fair. lk(c(1), .5) # [1] 0.5 # Our likelihood of this being true is .5 (we know this is right) # What is we get a tail? lk(c(0), .5) # [1] 0.5 # .5 as well. This seems right. # Now let's try something more interesting. # Let's say we get two heads: lk(c(1,1), .5) # [1] 0.25 # .5^2=.25 # This is standard probability. But, let's now ask: # What if the coin was not fair (as we originally guessed)? # Is there a better parameter set that would lead to our observed outcome? # Let's say, there is a 75% chance of getting heads. # What is the likelihood of seeing our 2 heads. lk(c(1,1), .75) # [1] 0.5625 # So if the coin was unfair at 75% then our likelihood of observing the # outcome we did would increase from 25% to 56%. # How about is the coin were 95% in favor of heads. lk(c(1,1), .95) # [1] 0.9025 # Now we are up to 90%. # You probably see where this is going. # If we assume we know nothing about the underlying distribution of the coin # parameter, then the parameter which fits best is 1 (100%). # Thus if we see only positive outcomes then the most likely guess at the # underlying distribution of outcomes is that every time you flip the coin it # will land heads. # We can see this mapped out. theta.vect <- seq(0,1,.05) plot(theta.vect, lk(c(1,1), theta.vect), main="HH", ylab="Probability", xlab="Theta") # It is worth noting that though there are different probabilities # for each unfairness parameter. We can only definitively rule out # one option after flipping the coin twice and getting heads. # That is that it is impossible for the likelihood of getting a head=0. # However, all other options are available. How do we choose from these options? # Well... naturally, we choose the option which maximizes the probability # of seeing the observed outcome (as the name implies). # Now let's make things a little more interesting. Let's say the third # time we flip the coin it ends up tails. # If we assume it is a fair coin then: lk(c(1,1,0), .5) # [1] 0.125 # How about our guess of .75 heads? lk(c(1,1,0), .75) # [1] 0.140625 # More likely than that of a fair coin but not as dramatic an improvement from # when we saw only two heads. # Let's see how the graph looks now. plot(theta.vect, lk(c(1,1,0), theta.vect), main="HHT", ylab="Probability", xlab="Theta") # We can see that our graph is finally beginning to look a bit more interesting. # We can see that the most likely outcome is around 65%. For those of us # a little ahead of the game the most likely probability is the success rate # or 2/3 (66.6%). # But the importance of the exercise to think about why 66.6% is parameter # we select as the most likely. lk(c(1,1,0), 2/3) # [1] 0.1481481 # Not because it is overwhelmingly the best choice. # It is only 2.3% (0.148-0.125) more likely to occur than if it were a fair coin. # So we really are not very confident with our parameter choice at this point. # However, imagine instead for one moment, if we observed the same ratio but with # 300 coins. cpattern <- c(rep(1,200), rep(0,100)) lk(cpattern, 2/3) # [1] 1.173877e-83 lk(cpattern, 1/2) # [1] 4.909093e-91 # Now, in terms of percentages the differences are extremely small. # So small that it is hard to compare. The plot can be useful: plot(theta.vect, lk(cpattern, theta.vect), main="(HHT)^100", ylab="Probability", xlab="Theta") # Let's see what happens if we increase our number of coins to # 3000 plot(theta.vect, lk(rep(cpattern,10), theta.vect), main="(HHT)^1000", ylab="Probability", xlab="Theta") # In this graph we can see the first major computational problem # when dealing with likelihoods. They get so small, they are hard # to manage in raw probabilities. In this case the digits get # rounded into 0 so that all R sees is 0. # Fortunately, the maximum of a function is the same maximum # (in terms of parameter choices) as a monotonic transformation # of a function. Thus we can rescale our probabilities # before multiplication using logs creating the log likelihood # function which produces parameters which vary in scale much less # dramatically. # I won't say anything more about this right now except that this is why # MLE functions always maximizes and reports the "log likelihood" # value rather than the "likelihood". # However, in this discussion it is worth noting that there is # a somewhat useful statistic that we can produce to compare # the likelihoods of the fair coin hypothesis with that # of the 2/3 biased hypothesis. # That is the odds ratio of the two outcomes. How much more # likely (multiplicatively) is our outcome to be observed # if the coin is unfair towards 2/3 heads rather than fair? lk(cpattern, 2/3)/lk(cpattern, 1/2) # [1] 23912304 # That is to say, the outcome in which 2/3 rds of the time # we get heads for a coin flip of 300 coins is 23 million # times more likely to occur if our coin # is unfair (66.6%) over that of being fair (50%). # This is a pretty big number and thus very unlikely to occur # relative to that of a fair coin. Comparing accross all possible # outcomes, we would find that while this ratio is not always as # large, for example if we are comparing 2/3s to .6 lk(cpattern, 2/3)/lk(cpattern, .75) # [1] 183.3873 # But it can still be quite large. In this case, we are 183 times # more likely to see the outcome we saw if we chose 2/3s as our parameter # choice compared with 3/4ths. # Looking at the raw probability we see lk(cpattern, 2/3) # [1] 1.173877e-83 # or 1 out of 8*10^82 outcomes. # Thus the likelihood of a particular event ever occurring is very small, even # given the most likely hypothesis (theta=2/3). # However, compared nearly all other hypothesis such as (theta=1/2 or 3/4) # the event is much more likely to have occurred. # And THAT is why creationists are right to say it very unlikely # in absolute terms that evolution brought about the origin of # life on earth yet are also completely wrong because compared # with all other available hypotheses that is the only one # which is remotely likely (at least from the series of # outcomes that I have observed) to have occurred makes its odds # ratio very high in my mind.