# Maximum Likelihood Distilled


We all hear about Maximum Likelihood Estimation (MLE), and we often see hints of it in our model output. As usual, doing things manually gives a better grasp of how our models actually work. Here’s a very short example implementing MLE, based on the explanation in Gelman and Hill (2007), pages 404-405.

The likelihood measures how compatible our outcome variable **Y** is with our predictor **X** for a given set of parameters. We compute this measure of compatibility with the probability density function of the normal distribution, which in R is `dnorm`. What `dnorm` returns is literally the *height* of the density curve at a given point, or in other words, the likelihood. We of course want the highest likelihood, as it indicates greater compatibility.
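As a quick sanity check, we can evaluate `dnorm` directly; the values below use the standard normal (mean 0, sd 1) purely for illustration:

```r
# The height of the standard normal density at its peak (x = 0)
# is 1 / sqrt(2 * pi), roughly 0.399
dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423

# Values far from the mean have a much lower density, i.e. lower likelihood
dnorm(3, mean = 0, sd = 1)
## [1] 0.004431848
```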

For example, assuming `parameters` is a vector with the intercept `a`, the coefficient `b` and an error term `sigma`, we can compute the likelihood for any random values of these coefficients:

```r
loglikelihood <- function(parameters, predictor, outcome) {
  # intercept
  a <- parameters[1]
  # beta coefficient
  b <- parameters[2]
  # error term
  sigma <- parameters[3]

  # Calculate the log-likelihood of `outcome` given `a + b * predictor`
  ll.vec <- dnorm(outcome, a + b * predictor, sigma, log = TRUE)

  # Sum the log-likelihood over all the values in the data
  sum(ll.vec)
}

# Generate three random values for the intercept, beta and error term
inits <- runif(3)

# Calculate the log-likelihood given these three values
loglikelihood(
  inits,
  predictor = mtcars$disp,
  outcome = mtcars$mpg
)
## [1] -11687.41
```
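A side note on the `log = TRUE` argument: it makes `dnorm` return log-densities, so summing them is the same as taking the log of the product of the raw densities, which avoids numerical underflow when multiplying many tiny numbers. A minimal sketch, with an arbitrary mean of 20 and sd of 6 chosen just for illustration:

```r
x <- mtcars$mpg

# Summing log-densities...
sum(dnorm(x, mean = 20, sd = 6, log = TRUE))

# ...equals the log of the product of the raw densities,
# but without multiplying many tiny numbers together
log(prod(dnorm(x, mean = 20, sd = 6)))
```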

That’s the log-likelihood given the random values for the intercept, the coefficient and sigma. How does a typical linear model estimate the **maximum** of these likelihoods? It performs an optimization search, trying out successive sets of values for these unknowns until it finds the combination that returns the maximum:

```r
mle <- optim(
  inits,          # The three random values for intercept, beta and sigma
  loglikelihood,  # The log-likelihood function
  lower = c(-Inf, -Inf, 1.e-5),  # Lower bounds: all can be negative except sigma, bounded at 1.e-5
  method = "L-BFGS-B",
  control = list(fnscale = -1),  # This signals to search for the maximum rather than the minimum
  predictor = mtcars$disp,
  outcome = mtcars$mpg
)

mle$par[1:2]
## [1] 29.59985346 -0.04121511
```
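As an aside, the same search can be written as a minimization of the *negative* log-likelihood, since `optim` minimizes by default. A sketch of that equivalent formulation, reusing `loglikelihood` and `inits` from above:

```r
# Equivalent formulation: minimize the negative log-likelihood
# (optim minimizes by default, so no fnscale is needed)
neg_ll <- function(parameters, predictor, outcome) {
  -loglikelihood(parameters, predictor, outcome)
}

mle2 <- optim(
  inits,
  neg_ll,
  lower = c(-Inf, -Inf, 1.e-5),
  method = "L-BFGS-B",
  predictor = mtcars$disp,
  outcome = mtcars$mpg
)

mle2$par[1:2]  # should match mle$par[1:2]
```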

Let’s compare that to the result of `lm`:

```r
coef(lm(mpg ~ disp, data = mtcars))
## (Intercept)        disp
## 29.59985476 -0.04121512
```
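The third parameter returned by `optim` is the estimate of `sigma`. It should come out slightly smaller than the residual standard error that `lm` reports, because the MLE of sigma divides the residual sum of squares by n rather than n - 2; a quick sketch to check:

```r
fit <- lm(mpg ~ disp, data = mtcars)

mle$par[3]   # MLE of sigma (divides the residual sum of squares by n)
sigma(fit)   # residual standard error from lm (divides by n - 2)

# Rescaling lm's estimate should roughly recover the MLE
sigma(fit) * sqrt((nrow(mtcars) - 2) / nrow(mtcars))
```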

In layman’s terms, MLE really just checks how compatible the observed data are with a given set of coefficients. It repeats that check many times until it finds the combination of coefficients that maximizes the likelihood.
