A decision tree for function minimization
What R calls “optimization” is more generally known as function minimization. The tools in the stats package function optim() are all essentially function mimizers, as are those in the package optimrx found at https://r-forge.r-project.org/projects/optimizer/. optimrx tries to make it easier for users to call quite a few methods without having to learn many different sets of command syntax. It uses the same syntax as optim(), even down to parameter scaling, by changing the method=”name” argument.
There are, however, a number of methods. Which is best for your purpose? This posting presents a form of decision tree to try to help. It isn’t the last word, and you should not take it as a set of hard rules. But it does reflect a lot of experience and I have tried to give the reasoning behind some of the suggestions.
We assume there is a function we wish to minimize or maximize. Call it fn(x) where x is parameter vector of length n; we will let edata be exogenous data — those numbers or factors that are needed to compute the function but which we will not be adjusting. So we should really write fn(x, edata), but will often be lazy, just as R allows us to be with exogenous data.
Function fn(x) is assumed:
- to have an initial starting set of parameters x0 that is feasible (which I won’t define just now)
- to have more or less 1 local minimum (though we often try to solve problems with multiple minima present if these minima are believed to be sufficiently separated).
Note: if fn() is a sum of squares, then packages nlmrt or minpack.lm are preferred. The stats function nls() may also be useful. I plan to write more about nonlinear least squares in a later post. If you are desperate, then get in touch with me!
Let us proceed in a stepwise fashion.
Step 0: Presets.
If we are maximizing , create fn(x, edata) <- -fn.orig(x, edata) . That is, we
minimize the negative of the function we want to maximize.
Let us set some logical variables to define our problem:
- haveg = TRUE if there are analytic gradients of fn(x) available
- bounds = TRUE if there are lower and/or upper bounds on the parameters x
- masks = TRUE if some parameters are fixed — currently only packages Rvmmin and Rcgmin support masks well.
Step 1: Check function (This is always a REALLY good idea.)
- compute fn(x, edata) for at least x0 and verify the value is correct
- haveg TRUE: compute g(x0) and compare it with grad(fn, x0, edata) from package numDeriv
Step2: Running a method
- bounds TRUE,
- haveg FALSE: try nmkb() from package dfOptim, but note that the transfinite function approach to bounds in this method cannot have x0 — in fact any initial parameter — on a bound. bobyqa() from package minqa may be much more efficient, but can behave poorly. hjkb() from package dfOptim is very easy to understand, but sometimes stalls and generally inefficient.
- haveg TRUE:
- n < 50 (i.e., small): packages Rvmmin or nlminb
- n large: packages Rtnmin, Rcgmin, or method L-BFGS-B (from optim() in stats). Package lbfgsb3 is an implementation of the 2011 FORTRAN version of the same algorithm and in theory may be preferred, but L-BFGS-B() called via optim() or optimr() is likely more efficient. The three approaches mentioned can also be applied to small n problems, and sometimes are as efficient as the methods recommended for small n. Rtnmin and LBFGS variants are in the same general class of methods, but have many differences in implementation approach. Generally I have found that the relatively newer Rtnmin may be faster, but less reliable than, Rcgmin though the former seems to work better when the function has more “nonlinearity” (which for now I will not define). Rvmmin is, by design, very aggressive in trying to find a minimum, so will “try too hard” in many cases. As such it should not be criticised for slowness when it is intended to favour reliability over speed.
- bounds FALSE:
- haveg FALSE: Methods nmk() from dfOptim, or uobyqa() from minqa are indicated (but bobyqa() from minqa may be better as it is more up-to-date than uobyqa(); we can use very loose values for the bounds on x)
- haveg TRUE:
- n small: packages Rvmmin or ucminf , or function nlm() from stats.
- n large: method L-BFGS-B or packages Rcgmin and Rtnmin
In the above I have not mentioned function spg() from package BB. My experience is that occasionally this can be very efficient, but more often slower than other methods. However it has a particular advantage of allowing the application of quite complicated constraints.