# Nonlinear constraints with a modified constrOptim

September 7, 2015
By

(This article was first published on A blog from Sydney, and kindly contributed to R-bloggers)

I often use constrOptim to quickly solve nonlinear optimization problems. constrOptim works well as a general tool to tackle constrained problems like
$\min_{Ax -b \geq0} f(x)$
There are many other options and packages for specific problems but constrOptim is likely to be the first choice when little is known on the problem at hand or for exploratory/quick-and-dirty analysis.

The main problem I have with it is the nature of the constraints: they must be linear (written in matrix form as $$Ax-b≥ 0$$). Often I would like to compute (local) solutions of more general problems of the form
$\min_{g(x) -b \geq0} f(x)$
where g(x) takes vector values, one component for each scalar constraint, and is in general nonlinear, i.e., $$g(x)$$ cannot be written as $$Ax$$.

Hence, I tweaked the code of constrOptim replacing all the occurrences of the linear constraints with my “new” non linear constraints. (In order to do that, I had to discard the chance to use “BFGS” as a solving method, for now…)  The result is a function, called constrOptimNL, see the code below, which can handle non linear constraints if a vector function g(x) and b are provided. All the drawbacks of the old function are inherited, that’s life, but now non linear problems can be solved as shown by the following simple but non entirely trivial example in two dimensions.

 f <- function(x,y) x^2*y+x-x*y   fb <- function(x) f(x[1],x[2]) #vectorial version of f   x <- seq(0,3,length=31)   y <- seq(0,3,length=31)   z <- outer(x,y,f)   #draws a graph   image(x,y,z);contour(x,y,z,add=T)   polygon(c(0,0,3),c(0,3,0),col="white",density=20)  
 Heatmap and contour lines of the objective function, the feasible domain is filled in white.

Consider the constraints $$x\geq0,y\geq 1, x+y\leq 3$$, the feasible domain is a triangle with vertices (0,0), (0,3), (3,0): due to linearity, the problem can be solved with

 A <- matrix(c(1,0,-1,0,1,-1),3,2)   b <- c(0,1,-3)   res <- constrOptim(c(0.5,1.5),fb,NULL,A,b)   res
$par[1] 0.2792393 2.7207607$value[1] -0.2683538$countsfunction gradient246 NA  ... The same problems can be solved with constrOptimNL, which trivially can handle linear constraints as well:  g <- function(x,y) c(x,y,-x-y) gb <- function(x) g(x[1],x[2]) source("constrOptimNL.R") resNL <- constrOptimNL(c(0.5,1.5),fb,NULL,gb,b) resNL $par[1] 0.2792393 2.7207607$value[1] -0.2683538$countsfunction gradient246 NA
... 

Assume now that some non linear constraint is involved in the optimization problem: $$x\geq0,y\geq1,(x-1)^2+(y-1)^2\leq1$$, the feasible domain is now the upper half-circle centered at (1,1) with unit radius. This problem has nonlinearities in the constraints and cannot be solved with the standard constrOptim.

 g <- function(x,y) c(x,y,-(x-1)^2-(y-1)^2)   b <- c(0,1,-1)   resNL <- constrOptimNL(c(0.5,1.5),fb,NULL,gb,b)   resNL$par  [1] 0.2649931 1.6780596  A graph shows that this is indeed the correct solution:  alpha <- seq(0,pi,len=31) image(x,y,z); contour(x,y,z,add=T) polygon(cos(alpha)+1,sin(alpha)+1,col="white",density=20) #draws the half-circle points(resNL$par[1],resNL$par[2])   Contour levels and a nonlinear feasible domain in white (the solution is plotted with a filled circle). You can check the solution more formally substituting the nonlinear constraint in polar coordinates, $$x=\cos\alpha+1,y=\sin\alpha+1$$:  f2 <- function(alpha) f(cos(alpha)+1,sin(alpha)+1) curve(f2,0,pi) sol <- optimize(f2,c(0,3)) points(sol$minimum,sol$objective) c(cos(sol$minimum)+1,sin(sol$minimum)+1) #"same" as resNL   Objective function restricted to the half-circle ### The code The code for constrOptimNL is given below (in this version the parameter grad is kept for compatibility with constrOptim but will never be used: Nelder-Mead and SANN are derivatives free and invoking BFGS stops the function). Hope this helps.  constrOptimNL <- function (theta, f, grad, g, ci, mu = 1e-04, control = list(), method = if (is.null(grad)) "Nelder-Mead" else "BFGS", outer.iterations = 100, outer.eps = 1e-05, ..., hessian = FALSE) { if(method=="BFGS") stop("method BFGS not available, use Nelder-Mead or SANN") if(!is.null(control$fnscale) && control$fnscale < 0) mu <- -mu R <- function(theta, theta.old, ...) { ui.theta <- g(theta,...) gi <- ui.theta - ci if (any(gi < 0)) return(NaN) gi.old <- g(theta.old,...) - ci bar <- sum(gi.old * log(gi) - ui.theta) if (!is.finite(bar)) bar <- -Inf f(theta, ...) - mu * bar } dR <- function(theta, theta.old, ...) { ui.theta <- g(theta,...) gi <- drop(ui.theta - ci) gi.old <- drop(g(theta.old,...) - ci) dbar <- colSums(ui * gi.old/gi - ui) grad(theta, ...) - mu * dbar } if (any(g(theta,...) - ci <= 0)) stop("initial value is not in the interior of the feasible region") obj <- f(theta, ...) r <- R(theta, theta, ...) fun <- function(theta, ...) R(theta, theta.old, ...) gradient <- if (method == "SANN") { if (missing(grad)) NULL else grad } else function(theta,...) dR(theta, theta.old, ...) totCounts <- 0 s.mu <- sign(mu) for (i in seq_len(outer.iterations)) { obj.old <- obj r.old <- r theta.old <- theta a <- optim(theta.old, fun, gradient, control = control, method = method, hessian = hessian, ...) r <- a$value       if (is.finite(r) && is.finite(r.old) && abs(r - r.old) <          (0.001 + abs(r)) * outer.eps)          break       theta <- a$par totCounts <- totCounts + a$counts       obj <- f(theta, ...)       if (s.mu * obj > s.mu * obj.old)          break     }     if (i == outer.iterations) {       a$convergence <- 7 a$message <- gettext("Barrier algorithm ran out of iterations and did not converge")     }     if (mu > 0 && obj > obj.old) {       a$convergence <- 11 a$message <- gettextf("Objective function increased at outer iteration %d",          i)     }     if (mu < 0 && obj < obj.old) {       a$convergence <- 11 a$message <- gettextf("Objective function decreased at outer iteration %d",          i)     }     a$outer.iterations <- i a$counts <- totCounts     a$barrier.value <- a$value     a$value <- f(a$par, ...)     a$barrier.value <- a$barrier.value - a\$value     a   }  

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...