Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is an example from Zucchini & MacDonald’s book on Hidden Markov Models for Time Series (exercise 1.3).
The data is annual counts of earthquakes of magnitude 7 or greater, which exhibits both overdispersion for a Poisson (where the mean should equal the variance) as well as serial dependence.
The aim is to fit a mixture of m independent Poisson distributions to this data, using the non-linear minimizer nlm to minimize the negative log-likelihood of the Poisson mixture.
Sounds easy right? I have not done much optimisation stuff so it took me longer than it probably should have. There’s little better for the ego then getting stuck on things by page 10.

### Constrained to unconstrained

There are two sets of parameters, the lambda parameters to the Poisson distributions and the delta mixing distribution, the latter giving the proportions of each distribution to mix.
The Poisson parameters must be greater than zero and there are m of them. The mixing distribution values must all sum to one, and the first mixing value is implied by the m-1 subsequent values.
These are the constraints (lambda’s greater than zero, delta’s sum to one), and there are 2m – 1 values (m lambdas, m-1 deltas). We need to transform these to unconstrained values for use with nlm (becoming eta and tau respectively).
The formulas for transformation are reproduced from the book here

These transformed values are combined in to one parameter vector and passed to nlm. We also pass a function that calculates the negative log likelihood.
In the function, we first convert the parameters from the unconstrained “working” form to the constrained “natural” form using the inverse transform, and then use these values to do the likelihood calculation.

You can see the code here.

### Outro

Now that I look over the code and write this out it does seem all fairly straightforward and relatively trivial, but it was new to me and took a while, so I figured other people might find the example useful as well.
The code has three examples for the cases m = 2, 3, and 4. Obviously feel free to mess around with the initial parameter estimates to see what comes up.

It does seem to match the fitted models from the book, which I have copied here:

For all the details, check out the book, chapter 1!