The Generalized Method of Moments and the gmm package

December 20, 2015
By

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package gmm that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).

Our data set will be bodyfat, which is included in the mfp package, with measurements on 252 men. The first column of this data frame, brozek, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.

Denoting the parameters of that family by α and β, the mean and variance are α / (α + β) and α β / ((α + β)2 (α + β + 1)), respectively. MM is a model of simplicity — we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but gmm worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!

In our elementary usage here, the call form is

gmm(data,momentftn,start)

where data is our data in matrix or vector form, momentftn specifies the moments, and start is our initial guess for the iterative solution process. Let’s specify momentftn:

g <- function(th,x) {
  t1 <- th[1]
  t2 <- th[2]
  t12 <- t1 + t2
  meanb <- t1 / t12
  m1 <- meanb - x
  m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2
  f <- cbind(m1,m2)
  return(f)
}

This function equates population moments to sample ones, by specifying expressions that gmm() is to set to 0. The argument th here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.

The function is required to specify quantities whose averages are to be set to 0. So, in the line

m1 <- meanb – x

we are saying that we want the average of the right-hand side to be 0, where x is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to m2 then does the same thing for variance.

So, let’s try all this out on the body fat data:

> gmm(g,x,c(alpha=0.1,beta=0.1))
Method
 twoStep 

Objective function value: 2.559645e-10 

 alpha beta 
 4.6714 19.9969 

Convergence code = 0 
> hist(bodyfat$brozek/100,xlim=c(0,1),
   probability=TRUE)
> curve(dbeta(x,4.67,20.00),add=TRUE)

The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):

xBodyfat
At least visually, it seems to be a pretty good fit.

For standard errors etc., a method for the generic function vcov() is provided:

> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1))
> vcov(gmmout)
          alpha beta
alpha 0.2809361 0.9606354
beta 0.9606354 3.9266874

Happy GMM-ing!

Lots more posts coming, when I have time.

To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)