(This article was first published on

I often get questions what is the use of parscale parameter in optim procedure in GNU R. Therefore I have decided to write a simple example showing its usage and importance. **R snippets**, and kindly contributed to R-bloggers)The function I test is a simplified version of estimation problem I had to solve recently. We have two explanatory variables x1 and x2, The issue is that x1 range is much smaller than x2. In the code below I control it by ratio parameter (set to 200 here). Next we have y = exp(ratio*x1)+x2. This is an explained variable - for simplicity we assume that there is no error in the model.

We take 1000 observations and want to fit the regression by minimizing sum of squared errors. I find the parameters using optim function using Nelder-Mead, BFGS and conjugate gradient methods. In the procedure maxit is set to 200000 to ensure that optimization converges (for default maxit value we would not even get algorithm to converge in some of the cases considered).

The final thing is parscale. We test two values - in the first case we leave it unmodified. In this case all variables are treated equally (although we know that unit change of x1 and x2 has significantly different influence on the objective function). In the second case we hint optim to rescale x1 by ratio. Now unit change of rescaled x1 and x2 have approximately equal influence on the objective function. We want to compare number of function and gradient evaluations in all 6 considered scenarios.

Here is the code that does all the calculations. The printed output is given in the comment at the end of the code:

f <- function(a) { exp(a[1] * x1) + a[2] * x2 }

error <- function(a) {sum((y - f(a)) ^ 2) }

set.seed(1)

n <- 1000

ratio <- 200

a <- c(ratio, 1)

x1 <- runif(1000) / ratio

x2 <- runif(1000)

y <- f(a)

for (m in c("Nelder-Mead", "BFGS", "CG")) {

result1 <- optim(rep(1, 2), error, method = m,

control = list(maxit=200000))

result2 <- optim(rep(1, 2), error, method = m,

control = list(maxit=200000, parscale = a))

cat(format(m, justify="left", width=14), names(result1$count),

"\n No parscale:", format(result1$counts, width=8),

"\n Parscale: ", format(result2$counts, width=8),

"\n\n")

}

# Nelder-Mead function gradient

# No parscale: 17703 NA

# Parscale: 63 NA

# BFGS function gradient

# No parscale: 90 42

# Parscale: 65 16

# CG function gradient

# No parscale: 448479 112123

# Parscale: 341 69

It can be clearly seen that appropriate rescaling of parameters has very significant influence on optimization converegence speed. It is especially important in Nelder-Mead and conjugate gradient methods in our example.

To

**leave a comment**for the author, please follow the link and comment on his blog:**R snippets**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...