Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a previous post I demonstrated how to use R’s simple built-in symbolic engine to generate Jacobian and (pseudo)-Hessian matrices that make non-linear optimization perform much more efficiently. Another related application is Gaussian error propagation.

Say you have data from a set of measurements in variables x and y where you know the corresponding measurement errors (dx and dy, typically the standard deviation or error from a set of replicates or a calibration curve). Next you want to create a derived value defined by an arbitrary function z = f(x,y). What would the corresponding error in value of z, i.e. dz = df, be?

If the function f(x,y) is a simple sum or product, their are simple equations for determining df. However, if f(x,y) is something more complex, like:

z=f(x,y)=xy(x+y)2

you’ll need to use a bit of calculus, specifically the chain rule:

df=(dxfx)2+(dyfy)2+...

Applying the above equation allows for the derivation of Gaussian error propagation for any arbitrary function. So how does one do this in R? Again, the D() function and R expression() objects come to our rescue.

Say the definition of z (ergo f(x,y)) is defined in an R formula:

> f = z ~ (x-y)/(x+y)^2

If you probe the structure of a formula object you get:

> str(f)Class 'formula' length 3 z ~ (x - y)/(x + y)^2  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

What’s key is the “length 3” bit:

> f[]; f[]; f[]~z(x - y)/(x + y)^2

The code above shows us that a formula object can be subsetted into its constituent parts:

1. the formula operator: ~
2. the left-hand side (LHS) of the formula: z
3. the right-hand side (RHS) of the formula: (x - y)/(x + y)^2

The class() of the RHS is a call, which is close enough to an R expression that both all.vars() and D() work as expected to generate the mathematical expressions for the partial derivatives with respect to each variable:

> all.vars(f[]) "x" "y"> lapply(all.vars(f[]), function(v) D(f[], v))[]1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2[]-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)

These expressions need to be modified a bit – i.e. in this case they need to be multiplied by dx and dy, respectively and then squared. What’s returned from D() is a call object, so the elements above need to be converted to character to manipulate them accordingly. This is done with deparse().

> lapply(all.vars(f[]), function(v) deparse(D(f[], v)))[] "1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2"[] "-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)"

The final error propagation expression is created with a bit of string manipulation:

> sprintf('sqrt(%s)',     paste(        sapply(all.vars(f[]), function(v) {            sprintf('(d%s*(%s))^2', v, deparse(D(f[], v)))        }),         collapse='+'    )  ) "sqrt((dx*(1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2))^2+(dy*(-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)))^2)"

Now that we’ve got the basics down, let’s test this out with some data …

> set.seed(0)> data = data.frame(    x  = runif(5),     y  = runif(5),     dx = runif(5)/10,     dy = runif(5)/10  )> data          x         y          dx         dy1 0.8966972 0.2016819 0.006178627 0.076984142 0.2655087 0.8983897 0.020597457 0.049769923 0.3721239 0.9446753 0.017655675 0.071761854 0.5728534 0.6607978 0.068702285 0.099190615 0.9082078 0.6291140 0.038410372 0.03800352

and with a little help from dplyr:

> library(dplyr)> data %>%+   mutate_(.dots=list(+     # compute derived value+     z  = deparse(f[]),+     +     # generates a mathematical expression to compute dz+     # as a character string+     dz = sapply(all.vars(f[]), function(v) {+             dfdp = deparse(D(f[], v))+             sprintf('(d%s*(%s))^2', v, dfdp)+           }) %>%+           paste(collapse='+') %>%+           sprintf('sqrt(%s)', .)+       ))          x         y          dx         dy           z         dz1 0.8966972 0.2016819 0.006178627 0.07698414  0.57608929 0.144572452 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.031902973 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.019786974 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.076048095 0.9082078 0.6291140 0.038410372 0.03800352  0.11809201 0.02424023

Taking this a step further, this method can be wrapped in a chainable function that determines the name of new variables from the LHS of a formula argument:

mutate_with_error = function(.data, f) {  exprs = list(      # expression to compute new variable values      deparse(f[]),      # expression to compute new variable errors      sapply(all.vars(f[]), function(v) {        dfdp = deparse(D(f[], v))        sprintf('(d%s*(%s))^2', v, dfdp)      }) %>%        paste(collapse='+') %>%        sprintf('sqrt(%s)', .)  )  names(exprs) = c(    deparse(f[]),    sprintf('d%s', deparse(f[]))  )  .data %>%    # the standard evaluation alternative of mutate()    mutate_(.dots=exprs)}

Thus, adding new derived variables and propagating errors accordingly becomes relatively easy:

> set.seed(0)> data = data.frame(x=runif(5), y=runif(5), dx=runif(5)/10, dy=runif(5)/10)> data          x         y          dx         dy1 0.8966972 0.2016819 0.006178627 0.076984142 0.2655087 0.8983897 0.020597457 0.049769923 0.3721239 0.9446753 0.017655675 0.071761854 0.5728534 0.6607978 0.068702285 0.099190615 0.9082078 0.6291140 0.038410372 0.03800352> data %>% mutate_with_error(z ~ (x-y)/(x+y)^2)          x         y          dx         dy           z         dz1 0.8966972 0.2016819 0.006178627 0.07698414  0.57608929 0.144572452 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.031902973 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.019786974 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.076048095 0.9082078 0.6291140 0.038410372 0.03800352  0.11809201 0.02424023

Written with StackEdit.