[This article was first published on Fiddling with data and code » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The majority of the world’s problem deal with directly or indirectly some kind of optimisation. Instance of optimisation of resources or utility function can be seen our daily life. Here I am talking about standard optimisation problem in statistics, maximum likelihood estimate. This blog is a small episode of my recent work. I used R for this optimisation exercise.

There are packages available for optimisation in R. The mostly used are: optim() and  optimize() are in stats package. A dedicated function mle() available in stats4 package. This is very useful most of the mean part modeling(eg: glm). What we need for this optimisation is to prepare function for the (log)likelihood and gradient (or hessian). We can also specify the algorithm(methods) as any of the following: “Nelder-Mead”, “BFGS”, “CG”, “L-BFGS-B”, “SANN”, and   “Brent”. This opimisation output give the necessary information for the model estimation.

The case I was doing is an extension of bivariate garch model. It includes more than 20 parameters. let me focus pure garch model, garch(1,1), as a prototype to explain the algorithm. Here the model equation is in the variance part. Also the parameter has constraint to ensure the positive variance. The   model equation is given below.

$x_t = z_t \sqrt{h_t}; z_t \sim \mathcal{N}(0,1)$
$h_t = \omega + \alpha x_{t-1}^2 + \beta h_{t-1}; \omega >0 , \alpha,\beta \geq 0″ style=”float:top;” border=”0px” />

The above discussed optimisation function may work for garch(1,1) estimation. But as the number of parameters increases there is not much use with optim functions. The main reason for failing the algorithm is the nonconvex surface with sensitive boundaries. So the solution to write the codes for newton raphson algorithm or BHHH or other modified algorithms.

`## Likelihood function for GARCH(1,1) PARAM <- c(1,0,0) loglik <- function(par=PARAM,xt) { N <- length(xt) omega <- par[1] alpha <- par[2] beta <- par[3] h <- omega/(1-alpha-beta) e <- xt[1] logl <- - 1/2* log(2*pi*h) -e^2/(2*h) for(t in 2:N) { e[t] <- xt[t] h[t] <- omega+ alpha*e[t-1]^2+beta*h[t-1] logl <- logl - 1/2* log(2*pi*h[t]) -e[t]^2/(2*h[t]) } return(logl) }`

I wrote BHHH algorithm for my model. I was not able solve it fully, it was not giving the optimum. The reason is the direction for each iteration when it reaches sensitive area is not properly scaled. ie. either it overshoot in some dimension or there is no significant improvement in any direction. So to get appropriate multiplier on optimal direction fails here.

`## This function gives the improvement direction: inv(HESSIAN)*Grad ## Here method of scoring as hessian[ as per Green(2003) ] direction <- function(par=PARAM,xt,vlist=1:3) { N <- length(xt) omega <- par[1] alpha <- par[2] beta <- par[3] h <- omega/(1-alpha-beta) e <- xt[1] logl <- - 1/2* log(2*pi*h) -e^2/(2*h) grad <- mat.or.vec(1,3) dhdv <- mat.or.vec(N,3) hess <- mat.or.vec(3,3) dhdv[1,] <-c(1,0,h) for(t in 2:N) { e[t] <- xt[t] h[t] <- omega+ alpha*e[t-1]^2+beta*h[t-1] dhdv[t,] <- c(1,e[t-1]^2,h[t-1])+ beta*dhdv[t-1] grad <- grad + - (1/(2*h[t]) - e[t]^2/(2*h[t]^2))*dhdv[t,] hess <- hess + 1/(2*h[t]^2)* dhdv[t,] %*% t(dhdv[t,]) } dir <- solve(hess[vlist,vlist])%*% grad[vlist] return(dir) }`

Now I will discuss about the heuristic part of the modification in the algorithm. In this heuristic, certain number of dimensions are randomly selected and perturbed. Remaining dimensions are left unchanged. This way we can overcome the scaling issue. This given me optimal solution but it is not optimised in the execution perspective. It take slightly more time.( I compared few simple models with my heuristic and existing garch estimation.)

`## Estimation using heuristic modification of newton raphson method estim <- function(par3=NULL,xt,vlist=1:3,max.iter=200,tol=0.1^8) { np <- length(vlist) ## number of parameters iter <- 0 madechange <- TRUE cnt <- 0 if(is.null(par3)) { init <- runif(3,0,.1) init[1] <- var(xt) } else init <- par3 ## Converting the parameters for the full model pm <- function(spar) { par <- PARAM par[vlist] <- spar return(par) } ## Creating function for perturbed likelihood fl2 <- function(d){ l2 <- -loglik(pm(fest+d*incr),xt); ifelse(is.nan(l2),-10^10,l2) } ## feasibility check for the parameter: Finiteness and positivity fcheck <- function(par){ (!is.nan(loglik(pm(par),xt))) &(sum(sign(pm(par)[c(2,3)]<0))==0) } ## this function check the parameter is improved after modification or not is.improve <- function(par){return(loglik(pm(par),xt) >= loglik(pm(fest),xt)) } ## Heuristic selection of improved feasible multiplier optd <- function(d){ p1 <- fest+d*incr p2 <- fest-d*incr c1 <- fcheck(p1) & is.improve(p1) c2 <- fcheck(p2) & is.improve(p2) if(c1>0) d else if(c1==0 &c2>0) -d else { if(abs(d) <10^-6 ) optd(0) else optd(d/2) } } fest <- init[vlist] ## first estimate if(!fcheck(fest)) stop($`

This approach looks very silly for regression type problems. But in garch like process tweaking like this is very much require. I am still looking for the better way. Anyway the blogging helped me to relook at my approach again. Please let me know if you have any suggestions.

The post A heuristic enhancement of optimisation algorithm appeared first on Fiddling with data and code.