**Odd Hypothesis**, and kindly contributed to R-bloggers)

In a previous post I demonstrated how to use R’s simple built-in symbolic engine to generate Jacobian and (pseudo)-Hessian matrices that make non-linear optimization perform much more efficiently. Another related application is Gaussian error propagation.

Say you have data from a set of measurements in variables `x`

and `y`

where you know the corresponding measurement errors (`dx`

and `dy`

, typically the standard deviation or error from a set of replicates or a calibration curve). Next you want to create a derived value defined by an arbitrary function `z = f(x,y)`

. What would the corresponding error in value of `z`

, i.e. `dz = df`

, be?

If the function `f(x,y)`

is a simple sum or product, their are simple equations for determining `df`

. However, if `f(x,y)`

is something more complex, like:

you’ll need to use a bit of calculus, specifically the chain rule:

Applying the above equation allows for the derivation of Gaussian error propagation for any arbitrary function. So how does one do this in R? Again, the `D()`

function and R `expression()`

objects come to our rescue.

Say the definition of z (ergo `f(x,y)`

) is defined in an R `formula`

:

`> f = z ~ (x-y)/(x+y)^2`

If you probe the structure of a formula object you get:

`> str(f)`

Class 'formula' length 3 z ~ (x - y)/(x + y)^2

..- attr(*, ".Environment")=<environment: R_GlobalEnv>

What’s key is the “`length 3`

” bit:

`> f[[1]]; f[[2]]; f[[3]]`

`~`

z

(x - y)/(x + y)^2

The code above shows us that a `formula`

object can be subsetted into its constituent parts:

- the formula operator:
`~`

- the left-hand side (LHS) of the formula:
`z`

- the right-hand side (RHS) of the formula:
`(x - y)/(x + y)^2`

The `class()`

of the RHS is a `call`

, which is close enough to an R `expression`

that both `all.vars()`

and `D()`

work as expected to generate the mathematical expressions for the partial derivatives with respect to each variable:

`> all.vars(f[[3]])`

[1] "x" "y"

> lapply(all.vars(f[[3]]), function(v) D(f[[3]], v))

[[1]]

1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2

[[2]]

-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)

These expressions need to be modified a bit – i.e. in this case they need to be multiplied by `dx`

and `dy`

, respectively and then squared. What’s returned from `D()`

is a `call`

object, so the elements above need to be converted to `character`

to manipulate them accordingly. This is done with `deparse()`

.

`> lapply(all.vars(f[[3]]), function(v) deparse(D(f[[3]], v)))`

[[1]]

[1] "1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2"

[[2]]

[1] "-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)"

The final error propagation expression is created with a bit of string manipulation:

`> sprintf('sqrt(%s)', `

paste(

sapply(all.vars(f[[3]]), function(v) {

sprintf('(d%s*(%s))^2', v, deparse(D(f[[3]], v)))

}),

collapse='+'

)

)

[1] "sqrt((dx*(1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2))^2+(dy*(-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)))^2)"

Now that we’ve got the basics down, let’s test this out with some data …

`> set.seed(0)`

> data = data.frame(

x = runif(5),

y = runif(5),

dx = runif(5)/10,

dy = runif(5)/10

)

> data

x y dx dy

1 0.8966972 0.2016819 0.006178627 0.07698414

2 0.2655087 0.8983897 0.020597457 0.04976992

3 0.3721239 0.9446753 0.017655675 0.07176185

4 0.5728534 0.6607978 0.068702285 0.09919061

5 0.9082078 0.6291140 0.038410372 0.03800352

and with a little help from `dplyr`

:

`> library(dplyr)`

> data %>%

+ mutate_(.dots=list(

+ # compute derived value

+ z = deparse(f[[3]]),

+

+ # generates a mathematical expression to compute dz

+ # as a character string

+ dz = sapply(all.vars(f[[3]]), function(v) {

+ dfdp = deparse(D(f[[3]], v))

+ sprintf('(d%s*(%s))^2', v, dfdp)

+ }) %>%

+ paste(collapse='+') %>%

+ sprintf('sqrt(%s)', .)

+ ))

x y dx dy z dz

1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245

2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297

3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697

4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809

5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023

Taking this a step further, this method can be wrapped in a chainable function that determines the name of new variables from the LHS of a formula argument:

`mutate_with_error = function(.data, f) {`

exprs = list(

# expression to compute new variable values

deparse(f[[3]]),

# expression to compute new variable errors

sapply(all.vars(f[[3]]), function(v) {

dfdp = deparse(D(f[[3]], v))

sprintf('(d%s*(%s))^2', v, dfdp)

}) %>%

paste(collapse='+') %>%

sprintf('sqrt(%s)', .)

)

names(exprs) = c(

deparse(f[[2]]),

sprintf('d%s', deparse(f[[2]]))

)

.data %>%

# the standard evaluation alternative of mutate()

mutate_(.dots=exprs)

}

Thus, adding new derived variables and propagating errors accordingly becomes relatively easy:

`> set.seed(0)`

> data = data.frame(x=runif(5), y=runif(5), dx=runif(5)/10, dy=runif(5)/10)

> data

x y dx dy

1 0.8966972 0.2016819 0.006178627 0.07698414

2 0.2655087 0.8983897 0.020597457 0.04976992

3 0.3721239 0.9446753 0.017655675 0.07176185

4 0.5728534 0.6607978 0.068702285 0.09919061

5 0.9082078 0.6291140 0.038410372 0.03800352

> data %>% mutate_with_error(z ~ (x-y)/(x+y)^2)

x y dx dy z dz

1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245

2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297

3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697

4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809

5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023

Written with StackEdit.

**leave a comment**for the author, please follow the link and comment on their blog:

**Odd Hypothesis**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...