# The Generalized Method of Moments and the gmm package

**Mad (Data) Scientist**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package **gmm** that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).

Our data set will be **bodyfat**, which is included in the **mfp** package, with measurements on 252 men. The first column of this data frame, **brozek**, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.

Denoting the parameters of that family by α and β, the mean and variance are α / (α + β) and α β / ((α + β)^{2} (α + β + 1)), respectively. MM is a model of simplicity — we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but **gmm** worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!

In our elementary usage here, the call form is

gmm(data,momentftn,start)

where **data** is our data in matrix or vector form, **momentftn** specifies the moments, and **start** is our initial guess for the iterative solution process. Let’s specify **momentftn**:

g <- function(th,x) { t1 <- th[1] t2 <- th[2] t12 <- t1 + t2 meanb <- t1 / t12 m1 <- meanb - x m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2 f <- cbind(m1,m2) return(f) }

This function equates population moments to sample ones, by specifying expressions that **gmm()** is to set to 0. The argument **th** here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.

The function is required to specify quantities whose averages are to be set to 0. So, in the line

m1 <- meanb – x

we are saying that we want the average of the right-hand side to be 0, where **x** is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to **m2** then does the same thing for variance.

So, let’s try all this out on the body fat data:

> gmm(g,x,c(alpha=0.1,beta=0.1)) Method twoStep Objective function value: 2.559645e-10 alpha beta 4.6714 19.9969 Convergence code = 0 > hist(bodyfat$brozek/100,xlim=c(0,1), probability=TRUE) > curve(dbeta(x,4.67,20.00),add=TRUE)

The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):

x

At least visually, it seems to be a pretty good fit.

For standard errors etc., a method for the generic function **vcov()** is provided:

> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1)) > vcov(gmmout) alpha beta alpha 0.2809361 0.9606354 beta 0.9606354 3.9266874

Happy GMM-ing!

Lots more posts coming, when I have time.

**leave a comment**for the author, please follow the link and comment on their blog:

**Mad (Data) Scientist**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.