# Tuning optim with parscale

January 26, 2014
By

(This article was first published on R snippets, and kindly contributed to R-bloggers)

I often get questions what is the use of parscale parameter in optim procedure in GNU R. Therefore I have decided to write a simple example showing its usage and importance.

The function I test is a simplified version of estimation problem I had to solve recently. We have two explanatory variables x1 and x2, The issue is that x1 range is much smaller than x2. In the code below I control it by ratio parameter (set to 200 here). Next we have y = exp(ratio*x1)+x2. This is an explained variable - for simplicity we assume that there is no error in the model.

We take 1000 observations and want to fit the regression by minimizing sum of squared errors. I find the parameters using optim function using Nelder-Mead, BFGS and conjugate gradient methods. In the procedure maxit is set to 200000 to ensure that optimization converges (for default maxit value we would not even get algorithm to converge in some of the cases considered).

The final thing is parscale. We test two values - in the first case we leave it unmodified. In this case all variables are treated equally (although we know that unit change of x1 and x2 has significantly different influence on the objective function). In the second case we hint optim to rescale x1 by ratio. Now unit change of rescaled x1 and x2 have approximately equal influence on the objective function. We want to compare number of function and gradient evaluations in all 6 considered scenarios.

Here is the code that does all the calculations. The printed output is given in the comment at the end of the code:

f <- function(a) { exp(a[1] * x1) + a[2] * x2 }
error <- function(a) {sum((y - f(a)) ^ 2) }

set.seed(1)
n <- 1000
ratio <- 200
a <- c(ratio, 1)
x1 <- runif(1000) / ratio
x2 <- runif(1000)
y <- f(a)

for (m in c("Nelder-Mead", "BFGS", "CG")) {
result1 <- optim(rep(1, 2), error, method = m,
control = list(maxit=200000))
result2 <- optim(rep(1, 2), error, method = m,
control = list(maxit=200000, parscale = a))
cat(format(m, justify="left", width=14), names(result1$count), "\n No parscale:", format(result1$counts, width=8),
"\n  Parscale:   ", format(result2\$counts, width=8),
"\n\n")
}
#   No parscale:    17703       NA
#   Parscale:          63       NA

#   No parscale:       90       42
#   Parscale:          65       16

#   No parscale:   448479   112123
#   Parscale:         341       69

It can be clearly seen that appropriate rescaling of parameters has very significant influence on optimization converegence speed. It is especially important in Nelder-Mead and conjugate gradient methods in our example.