Easy error propagation in R

January 22, 2015
By

(This article was first published on Odd Hypothesis, and kindly contributed to R-bloggers)

In a previous post I demonstrated how to use R’s simple built-in symbolic engine to generate Jacobian and (pseudo)-Hessian matrices that make non-linear optimization perform much more efficiently. Another related application is Gaussian error propagation.

Say you have data from a set of measurements in variables x and y where you know the corresponding measurement errors (dx and dy, typically the standard deviation or error from a set of replicates or a calibration curve). Next you want to create a derived value defined by an arbitrary function z = f(x,y). What would the corresponding error in value of z, i.e. dz = df, be?

If the function f(x,y) is a simple sum or product, their are simple equations for determining df. However, if f(x,y) is something more complex, like:

z=f(x,y)=xy(x+y)2

you’ll need to use a bit of calculus, specifically the chain rule:

df=(dxfx)2+(dyfy)2+...

Applying the above equation allows for the derivation of Gaussian error propagation for any arbitrary function. So how does one do this in R? Again, the D() function and R expression() objects come to our rescue.

Say the definition of z (ergo f(x,y)) is defined in an R formula:

> f = z ~ (x-y)/(x+y)^2

If you probe the structure of a formula object you get:

> str(f)
Class 'formula' length 3 z ~ (x - y)/(x + y)^2
..- attr(*, ".Environment")=<environment: R_GlobalEnv>

What’s key is the “length 3” bit:

> f[[1]]; f[[2]]; f[[3]]
`~`
z
(x - y)/(x + y)^2

The code above shows us that a formula object can be subsetted into its constituent parts:

  1. the formula operator: ~
  2. the left-hand side (LHS) of the formula: z
  3. the right-hand side (RHS) of the formula: (x - y)/(x + y)^2

The class() of the RHS is a call, which is close enough to an R expression that both all.vars() and D() work as expected to generate the mathematical expressions for the partial derivatives with respect to each variable:

> all.vars(f[[3]])
[1] "x" "y"

> lapply(all.vars(f[[3]]), function(v) D(f[[3]], v))
[[1]]
1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2

[[2]]
-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)

These expressions need to be modified a bit – i.e. in this case they need to be multiplied by dx and dy, respectively and then squared. What’s returned from D() is a call object, so the elements above need to be converted to character to manipulate them accordingly. This is done with deparse().

> lapply(all.vars(f[[3]]), function(v) deparse(D(f[[3]], v)))
[[1]]
[1] "1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2"

[[2]]
[1] "-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)"

The final error propagation expression is created with a bit of string manipulation:

> sprintf('sqrt(%s)', 
paste
(
sapply
(all.vars(f[[3]]), function(v) {
sprintf
('(d%s*(%s))^2', v, deparse(D(f[[3]], v)))
}),
collapse
='+'
)
)
[1] "sqrt((dx*(1/(x + y)^2 - (x - y) * (2 * (x + y))/((x + y)^2)^2))^2+(dy*(-(1/(x + y)^2 + (x - y) * (2 * (x + y))/((x + y)^2)^2)))^2)"

Now that we’ve got the basics down, let’s test this out with some data …

> set.seed(0)
> data = data.frame(
x
= runif(5),
y
= runif(5),
dx
= runif(5)/10,
dy
= runif(5)/10
)
> data
x y dx dy
1 0.8966972 0.2016819 0.006178627 0.07698414
2 0.2655087 0.8983897 0.020597457 0.04976992
3 0.3721239 0.9446753 0.017655675 0.07176185
4 0.5728534 0.6607978 0.068702285 0.09919061
5 0.9082078 0.6291140 0.038410372 0.03800352

and with a little help from dplyr:

> library(dplyr)
> data %>%
+ mutate_(.dots=list(
+ # compute derived value
+ z = deparse(f[[3]]),
+
+ # generates a mathematical expression to compute dz
+ # as a character string
+ dz = sapply(all.vars(f[[3]]), function(v) {
+ dfdp = deparse(D(f[[3]], v))
+ sprintf('(d%s*(%s))^2', v, dfdp)
+ }) %>%
+ paste(collapse='+') %>%
+ sprintf('sqrt(%s)', .)
+ ))
x y dx dy z dz
1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245
2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297
3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697
4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809
5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023

Taking this a step further, this method can be wrapped in a chainable function that determines the name of new variables from the LHS of a formula argument:

mutate_with_error = function(.data, f) {
exprs
= list(
# expression to compute new variable values
deparse
(f[[3]]),

# expression to compute new variable errors
sapply
(all.vars(f[[3]]), function(v) {
dfdp
= deparse(D(f[[3]], v))
sprintf
('(d%s*(%s))^2', v, dfdp)
}) %>%
paste
(collapse='+') %>%
sprintf
('sqrt(%s)', .)
)
names
(exprs) = c(
deparse
(f[[2]]),
sprintf
('d%s', deparse(f[[2]]))
)

.data %>%
# the standard evaluation alternative of mutate()
mutate_
(.dots=exprs)
}

Thus, adding new derived variables and propagating errors accordingly becomes relatively easy:

> set.seed(0)
> data = data.frame(x=runif(5), y=runif(5), dx=runif(5)/10, dy=runif(5)/10)
> data
x y dx dy
1 0.8966972 0.2016819 0.006178627 0.07698414
2 0.2655087 0.8983897 0.020597457 0.04976992
3 0.3721239 0.9446753 0.017655675 0.07176185
4 0.5728534 0.6607978 0.068702285 0.09919061
5 0.9082078 0.6291140 0.038410372 0.03800352

> data %>% mutate_with_error(z ~ (x-y)/(x+y)^2)
x y dx dy z dz
1 0.8966972 0.2016819 0.006178627 0.07698414 0.57608929 0.14457245
2 0.2655087 0.8983897 0.020597457 0.04976992 -0.46718831 0.03190297
3 0.3721239 0.9446753 0.017655675 0.07176185 -0.33019871 0.01978697
4 0.5728534 0.6607978 0.068702285 0.09919061 -0.05778613 0.07604809
5 0.9082078 0.6291140 0.038410372 0.03800352 0.11809201 0.02424023

Written with StackEdit.

To leave a comment for the author, please follow the link and comment on their blog: Odd Hypothesis.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)