**Odd Hypothesis**, and kindly contributed to R-bloggers)

I recently discovered a powerful use for R `expression()`

’s

Say you are trying to fit some experimental data to the following nonlinear equation:

with the independent variable `x`

using `nlminb()`

as the minimization optimizer.

This sort of work is significantly improved (i.e. faster with better convergence) if an analytical gradient vector and a Hessian matrix for the objective function are provided. This means a lot of partial differentiation of the model equation with respect to each parameter.

To get these partial derivatives one could:

- Review some calculus and derive them by hand
- Feed the equation into an online engine like Wolfram Alpha and copy/paste the results

The discovery I made is that there is a purely R solution via the combination of `expression()`

’s and the functions `D()`

and `all.vars()`

.

The `all.vars()`

function extracts all variable and parameter names from an expression as a character vector. For example:

`> all.vars(expression(b1 + (b0 - b1)*exp(-k*x) + b2*x))`

[1] "b1" "b0" "k" "x" "b2"

The `D()`

function takes two arguments: an expression to differentiate and a character specifying the variable term to differentiate by:

`> D(expression(b1 + (b0 - b1)*exp(-k*x) + b2*x), 'x')`

b2 - (b0 - b1) * (exp(-k * x) * k)

The following code produces a list of the partial derivatives of the above equation with respect to each variable/parameter.

`# the model equation`

expr = expression(

(K*y0*exp(u*(x-tl)))/(K + y0*(exp(u*(x-tl))-1)) +

b1 + (b0 - b1)*exp(-k*x) + b2*x

)

sapply(all.vars(expr), function(v){

D(expr, v)

})

# returns:

# $K

# y0 * exp(u * (x - tl))/(K + y0 * (exp(u * (x - tl)) - 1)) - (K *

# y0 * exp(u * (x - tl)))/(K + y0 * (exp(u * (x - tl)) - 1))^2

#

# $y0

# K * exp(u * (x - tl))/(K + y0 * (exp(u * (x - tl)) - 1)) - (K *

# y0 * exp(u * (x - tl))) * (exp(u * (x - tl)) - 1)/(K + y0 *

# (exp(u * (x - tl)) - 1))^2

#

# $u

# K * y0 * (exp(u * (x - tl)) * (x - tl))/(K + y0 * (exp(u * (x -

# tl)) - 1)) - (K * y0 * exp(u * (x - tl))) * (y0 * (exp(u *

# (x - tl)) * (x - tl)))/(K + y0 * (exp(u * (x - tl)) - 1))^2

# ...

Each element of the list returned by the `sapply()`

statement above is itself an expression. Evaluation of each will give rows of the Jacobian matrix, which we’ll subsequently need to compute the gradient:

`p0 = c(y0=0.01, u=0.3, tl=5, K=2, b0=0.01, b1=1, b2=0.001, k=0.1)`

x = seq(0,10)

# notice that t() is applied to put parameters on rows

J = t(sapply(all.vars(expr), function(v, env){

eval(D(expr, v), env)

}, env=c(as.list(p0), list(x=x))))

J

# returns:

# [,1] [,2] [,3] [,4] ...

# K -4.367441e-06 -5.298871e-06 -6.067724e-06 -6.218461e-06 ...

# y0 2.248737e-01 3.033101e-01 4.089931e-01 5.512962e-01 ...

# u -1.118747e-02 -1.207174e-02 -1.220845e-02 -1.097079e-02 ...

# x 1.006712e-01 9.148428e-02 8.327519e-02 7.598662e-02 ...

# tl -6.712481e-04 -9.053805e-04 -1.220845e-03 -1.645619e-03 ...

# b1 0.000000e+00 9.516258e-02 1.812692e-01 2.591818e-01 ...

# b0 1.000000e+00 9.048374e-01 8.187308e-01 7.408182e-01 ...

# k 0.000000e+00 8.957890e-01 1.621087e+00 2.200230e+00 ...

# b2 0.000000e+00 1.000000e+00 2.000000e+00 3.000000e+00 ...

Since `x`

is the independent variable, the row corresponding to it can be safely removed from the Jacobian:

`J = J[names(p0),,drop=F]`

J

# returns:

# [,1] [,2] [,3] [,4] ...

# y0 2.248737e-01 3.033101e-01 4.089931e-01 5.512962e-01 ...

# u -1.118747e-02 -1.207174e-02 -1.220845e-02 -1.097079e-02 ...

# tl -6.712481e-04 -9.053805e-04 -1.220845e-03 -1.645619e-03 ...

# K -4.367441e-06 -5.298871e-06 -6.067724e-06 -6.218461e-06 ...

# b0 1.000000e+00 9.048374e-01 8.187308e-01 7.408182e-01 ...

# b1 0.000000e+00 9.516258e-02 1.812692e-01 2.591818e-01 ...

# b2 0.000000e+00 1.000000e+00 2.000000e+00 3.000000e+00 ...

# k 0.000000e+00 8.957890e-01 1.621087e+00 2.200230e+00 ...

The gradient vector is simply the inner product of the Jacobian and a vector of residuals:

`gr = -J %*% r`

For the Hessian, the full form in Gibbs-Einstein notation is:

The first term is simply

Thankfully, in optimization cases where the initial guess is near the optimum, the behavior of the system should be “linear enough” that one can ignore the second term:

The equivalent R code would be:

`H = J %*% t(J)`

(because linear algebra in R is a little strange, the transpose is applied to the second Jacobian)

Putting it all together:

`# the model equation`

expr = expression(

(K*y0*exp(u*(x-tl)))/(K + y0*(exp(u*(x-tl))-1)) +

b1 + (b0 - b1)*exp(-k*x) + b2*x

)

p0 = c(y0=0.01, u=0.3, tl=5, K=2, b0=0.01, b1=1, b2=0.001, k=0.1)

x = seq(0,48,by=0.25)

# let's say these are the residuals

r = runif(length(x))

# magic syntax that converts an equation expression into a jacobian matrix

J = t(sapply(all.vars(expr), function(v, env){

eval(D(expr, v), env)

}, env = c(as.list(p0), list(x=x))))

# and then a gradient vector

gr = -J %*% r

# and then an approximate Hessian matrix

H = J %*% t(J)

Extending this further, one can now write a generic model object like so:

`ModelObject = setRefClass('ModelObject', `

fields = list(

name = 'character',

expr = 'expression'

),

methods = list(

value = function(p, data){

eval(.self$expr, c(as.list(p), as.list(data)))

},

jacobian = function(p, data){

J = t(sapply(all.vars(.self$expr), function(v, p, data){

eval(D(.self$expr, v), c(as.list(p), as.list(data)))

}, p=p, data=data))

return(J[names(p),,drop=F])

},

gradient = function(p, data){

r = data$y - value(p, data)

return(-jacobian(p, data) %*% r)

},

hessian = function(p, data){

J = jacobian(p, data)

return(J %*% t(J))

}

)

)

which is instantiated with simply an `expression`

and can be used to provide gradient and Hessian functions to `nlminb()`

:

`> xy = list(x=seq(0,10,by=0.25), y=dnorm(seq(0,10,by=0.25),10,2)) # test data`

> p0 = c(y0=0.01, u =0.2, l=5, A=log(1.5/0.01))

> mo = ModelObject(

name = 'gompertz',

expr = expression( y0*exp(A*exp(-exp((u*exp(1)/A)*(l-x)+1))) )

)

> fit = nlminb(p0, function(p, data){

r = data$y - mo$value(p,data)

return(r %*% r)

}, gradient = mo$gradient, hessian = mo$hessian, data=xy)

> fit$par

y0 u l A

0.001125766 1.345796890 3.646340494 5.408138500

> fit$message

[1] "relative convergence (4)"

> plot(xy, main='Fit Results'); lines(xy$x, mo$value(fit$par, xy))

Painless!

Written with StackEdit.

**leave a comment**for the author, please follow the link and comment on their blog:

**Odd Hypothesis**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...