Tuning optim with parscale

[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I often get questions what is the use of parscale parameter in optim procedure in GNU R. Therefore I have decided to write a simple example showing its usage and importance.

The function I test is a simplified version of estimation problem I had to solve recently. We have two explanatory variables x1 and x2, The issue is that x1 range is much smaller than x2. In the code below I control it by ratio parameter (set to 200 here). Next we have y = exp(ratio*x1)+x2. This is an explained variable – for simplicity we assume that there is no error in the model.

We take 1000 observations and want to fit the regression by minimizing sum of squared errors. I find the parameters using optim function using Nelder-Mead, BFGS and conjugate gradient methods. In the procedure maxit is set to 200000 to ensure that optimization converges (for default maxit value we would not even get algorithm to converge in some of the cases considered).

The final thing is parscale. We test two values – in the first case we leave it unmodified. In this case all variables are treated equally (although we know that unit change of x1 and x2 has significantly different influence on the objective function). In the second case we hint optim to rescale x1 by ratio. Now unit change of rescaled x1 and x2 have approximately equal influence on the objective function. We want to compare number of function and gradient evaluations in all 6 considered scenarios.

Here is the code that does all the calculations. The printed output is given in the comment at the end of the code:

f <- function(a) { exp(a[1] * x1) + a[2] * x2 }
error <- function(a) {sum((y - f(a)) ^ 2) }

n <- 1000
ratio <- 200
a <- c(ratio, 1)
x1 <- runif(1000) / ratio
x2 <- runif(1000)
y <- f(a)

for (m in c(“Nelder-Mead”, “BFGS”, “CG”)) {
    result1 <- optim(rep(1, 2), error, method = m,
        control = list(maxit=200000))
    result2 <- optim(rep(1, 2), error, method = m,
        control = list(maxit=200000, parscale = a))
    cat(format(m, justify=”left”, width=14), names(result1$count),
        “\n  No parscale:”, format(result1$counts, width=8),
        “\n  Parscale:   “, format(result2$counts, width=8),
# Nelder-Mead    function gradient
#   No parscale:    17703       NA
#   Parscale:          63       NA

# BFGS           function gradient
#   No parscale:       90       42
#   Parscale:          65       16

# CG             function gradient
#   No parscale:   448479   112123
#   Parscale:         341       69

It can be clearly seen that appropriate rescaling of parameters has very significant influence on optimization converegence speed. It is especially important in Nelder-Mead and conjugate gradient methods in our example.

To leave a comment for the author, please follow the link and comment on their blog: R snippets.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)