The Generalized Method of Moments and the gmm package

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package gmm that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).

Our data set will be bodyfat, which is included in the mfp package, with measurements on 252 men. The first column of this data frame, brozek, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.

Denoting the parameters of that family by α and β, the mean and variance are α / (α + β) and α β / ((α + β)2 (α + β + 1)), respectively. MM is a model of simplicity — we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but gmm worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!

In our elementary usage here, the call form is

gmm(data,momentftn,start)

where data is our data in matrix or vector form, momentftn specifies the moments, and start is our initial guess for the iterative solution process. Let’s specify momentftn:

g <- function(th,x) {
  t1 <- th[1]
  t2 <- th[2]
  t12 <- t1 + t2
  meanb <- t1 / t12
  m1 <- meanb - x
  m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2
  f <- cbind(m1,m2)
  return(f)
}

This function equates population moments to sample ones, by specifying expressions that gmm() is to set to 0. The argument th here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.

The function is required to specify quantities whose averages are to be set to 0. So, in the line

m1 <- meanb – x

we are saying that we want the average of the right-hand side to be 0, where x is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to m2 then does the same thing for variance.

So, let’s try all this out on the body fat data:

> gmm(g,x,c(alpha=0.1,beta=0.1))
Method
 twoStep 

Objective function value: 2.559645e-10 

 alpha beta 
 4.6714 19.9969 

Convergence code = 0 
> hist(bodyfat$brozek/100,xlim=c(0,1),
   probability=TRUE)
> curve(dbeta(x,4.67,20.00),add=TRUE)

The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):

xBodyfat
At least visually, it seems to be a pretty good fit.

For standard errors etc., a method for the generic function vcov() is provided:

> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1))
> vcov(gmmout)
          alpha beta
alpha 0.2809361 0.9606354
beta 0.9606354 3.9266874

Happy GMM-ing!

Lots more posts coming, when I have time.


To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)