Bayesian model II regression

[This article was first published on Ecology in silico, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Regression is a mainstay of ecological and evolutionary data analysis. For example, a disease ecologist may use body size (e.g. a weight from a scale with measurement error) to predict infection. Classical linear regression assumes no error in covariates; they are known exactly. This is rarely the case in ecology, and ignoring error in covariates can bias regression coefficient estimates. This is where model II (aka errors-in variables and measurement errors) regression models come in handy. Here I’ll demonstrate how to construct such a model in a Bayesian framework, where substantive prior knowledge of covariate error facilitates less-biased parameter estimates.

Here’s a quick illustration of the problem: I’ll generate data from a known simple linear regression model, and fit models that ignore or incorporate error in the covariate.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
<span class="line"><span class="c1"># simulate covariate data</span>
</span><span class="line">n <span class="o"><-</span> <span class="m">50</span>
</span><span class="line">sdx <span class="o"><-</span> <span class="m">6</span>
</span><span class="line">sdobs <span class="o"><-</span> <span class="m">5</span>
</span><span class="line">taux <span class="o"><-</span> <span class="m">1</span> <span class="o">/</span> <span class="p">(</span>sdobs <span class="o">*</span> sdobs<span class="p">)</span>
</span><span class="line">truex <span class="o"><-</span> rnorm<span class="p">(</span>n<span class="p">,</span> <span class="m">0</span><span class="p">,</span> sdx<span class="p">)</span>
</span><span class="line">errorx <span class="o"><-</span> rnorm<span class="p">(</span>n<span class="p">,</span> <span class="m">0</span><span class="p">,</span> sdobs<span class="p">)</span>
</span><span class="line">obsx <span class="o"><-</span> truex <span class="o">+</span> errorx
</span><span class="line">
</span><span class="line"><span class="c1"># simulate response data</span>
</span><span class="line">alpha <span class="o"><-</span> <span class="m">0</span>
</span><span class="line">beta <span class="o"><-</span> <span class="m">10</span>
</span><span class="line">sdy <span class="o"><-</span> <span class="m">20</span>
</span><span class="line">errory <span class="o"><-</span> rnorm<span class="p">(</span>n<span class="p">,</span> <span class="m">0</span><span class="p">,</span> sdy<span class="p">)</span>
</span><span class="line">obsy <span class="o"><-</span> alpha <span class="o">+</span> beta<span class="o">*</span>truex <span class="o">+</span> errory
</span><span class="line">parms <span class="o"><-</span> data.frame<span class="p">(</span>alpha<span class="p">,</span> beta<span class="p">)</span>
</span>

Ignoring error in the covariate:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
<span class="line"><span class="c1"># bundle data</span>
</span><span class="line">jags_d <span class="o"><-</span> list<span class="p">(</span>x <span class="o">=</span> obsx<span class="p">,</span> y <span class="o">=</span> obsy<span class="p">,</span> n <span class="o">=</span> length<span class="p">(</span>obsx<span class="p">))</span>
</span><span class="line">
</span><span class="line"><span class="c1"># write model</span>
</span><span class="line">cat<span class="p">(</span><span class="s">"</span>
</span><span class="line"><span class="s">    model{</span>
</span><span class="line"><span class="s">## Priors</span>
</span><span class="line"><span class="s">alpha ~ dnorm(0, .001)</span>
</span><span class="line"><span class="s">beta ~ dnorm(0, .001)</span>
</span><span class="line"><span class="s">sdy ~ dunif(0, 100)</span>
</span><span class="line"><span class="s">tauy <- 1 / (sdy * sdy)</span>
</span><span class="line">
</span><span class="line"><span class="s">## Likelihood</span>
</span><span class="line"><span class="s">  for (i in 1:n){</span>
</span><span class="line"><span class="s">    mu[i] <- alpha + beta * x[i]</span>
</span><span class="line"><span class="s">    y[i] ~ dnorm(mu[i], tauy)</span>
</span><span class="line"><span class="s">  }</span>
</span><span class="line"><span class="s">}</span>
</span><span class="line"><span class="s">"</span><span class="p">,</span>
</span><span class="line">    fill<span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> file<span class="o">=</span><span class="s">"yerror.txt"</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">require<span class="p">(</span>rjags<span class="p">)</span>
</span><span class="line"><span class="c1"># initiate model</span>
</span><span class="line">mod1 <span class="o"><-</span> jags.model<span class="p">(</span><span class="s">"yerror.txt"</span><span class="p">,</span> data<span class="o">=</span>jags_d<span class="p">,</span>
</span><span class="line">                   n.chains<span class="o">=</span><span class="m">3</span><span class="p">,</span> n.adapt<span class="o">=</span><span class="m">1000</span><span class="p">)</span>
</span><span class="line">
</span><span class="line"><span class="c1"># simulate posterior</span>
</span><span class="line">out <span class="o"><-</span> coda.samples<span class="p">(</span>mod1<span class="p">,</span> n.iter<span class="o">=</span><span class="m">1000</span><span class="p">,</span> thin<span class="o">=</span><span class="m">1</span><span class="p">,</span>
</span><span class="line">                    variable.names<span class="o">=</span>c<span class="p">(</span><span class="s">"alpha"</span><span class="p">,</span> <span class="s">"beta"</span><span class="p">,</span> <span class="s">"sdy"</span><span class="p">))</span>
</span><span class="line">
</span><span class="line"><span class="c1"># store parameter estimates</span>
</span><span class="line">require<span class="p">(</span>ggmcmc<span class="p">)</span>
</span><span class="line">ggd <span class="o"><-</span> ggs<span class="p">(</span>out<span class="p">)</span>
</span><span class="line">a <span class="o"><-</span> ggd<span class="o">$</span>value<span class="p">[</span>which<span class="p">(</span>ggd<span class="o">$</span>Parameter <span class="o">==</span> <span class="s">"alpha"</span><span class="p">)]</span>
</span><span class="line">b <span class="o"><-</span> ggd<span class="o">$</span>value<span class="p">[</span>which<span class="p">(</span>ggd<span class="o">$</span>Parameter <span class="o">==</span> <span class="s">"beta"</span><span class="p">)]</span>
</span><span class="line">d <span class="o"><-</span> data.frame<span class="p">(</span>a<span class="p">,</span> b<span class="p">)</span>
</span>

Incorporating error in the covariate: I’m assuming that we have substantive knowledge about covariate measurement represented in the prior for the precision in X. Further, the prior for the true X values reflects knowledge of the distribution of our X value in the population from which the sample was taken.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
<span class="line"><span class="c1"># specify model</span>
</span><span class="line">cat<span class="p">(</span><span class="s">"</span>
</span><span class="line"><span class="s">    model {</span>
</span><span class="line"><span class="s">## Priors</span>
</span><span class="line"><span class="s">alpha ~ dnorm(0, .001)</span>
</span><span class="line"><span class="s">beta ~ dnorm(0, .001)</span>
</span><span class="line"><span class="s">sdy ~ dunif(0, 100)</span>
</span><span class="line"><span class="s">tauy <- 1 / (sdy * sdy)</span>
</span><span class="line"><span class="s">taux ~ dunif(.03, .05)</span>
</span><span class="line">
</span><span class="line"><span class="s">## Likelihood</span>
</span><span class="line"><span class="s">  for (i in 1:n){</span>
</span><span class="line"><span class="s">    truex[i] ~ dnorm(0, .04)</span>
</span><span class="line"><span class="s">    x[i] ~ dnorm(truex[i], taux)</span>
</span><span class="line"><span class="s">    y[i] ~ dnorm(mu[i], tauy)</span>
</span><span class="line"><span class="s">    mu[i] <- alpha + beta * truex[i]</span>
</span><span class="line"><span class="s">  }</span>
</span><span class="line"><span class="s">}</span>
</span><span class="line"><span class="s">    "</span><span class="p">,</span> fill<span class="o">=</span><span class="k-Variable">T</span><span class="p">,</span> file<span class="o">=</span><span class="s">"xyerror.txt"</span><span class="p">)</span>
</span><span class="line">
</span><span class="line"><span class="c1"># bundle data</span>
</span><span class="line">jags_d <span class="o"><-</span> list<span class="p">(</span>x <span class="o">=</span> obsx<span class="p">,</span> y <span class="o">=</span> obsy<span class="p">,</span> n <span class="o">=</span> length<span class="p">(</span>obsx<span class="p">))</span>
</span><span class="line">
</span><span class="line"><span class="c1"># initiate model</span>
</span><span class="line">mod2 <span class="o"><-</span> jags.model<span class="p">(</span><span class="s">"xyerror.txt"</span><span class="p">,</span> data<span class="o">=</span>jags_d<span class="p">,</span>
</span><span class="line">                   n.chains<span class="o">=</span><span class="m">3</span><span class="p">,</span> n.adapt<span class="o">=</span><span class="m">1000</span><span class="p">)</span>
</span><span class="line">
</span><span class="line"><span class="c1"># simulate posterior</span>
</span><span class="line">out <span class="o"><-</span> coda.samples<span class="p">(</span>mod2<span class="p">,</span> n.iter<span class="o">=</span><span class="m">30000</span><span class="p">,</span> thin<span class="o">=</span><span class="m">30</span><span class="p">,</span>
</span><span class="line">                    variable.names<span class="o">=</span>c<span class="p">(</span><span class="s">"alpha"</span><span class="p">,</span> <span class="s">"beta"</span><span class="p">,</span> <span class="s">"tauy"</span><span class="p">,</span> <span class="s">"taux"</span><span class="p">))</span>
</span><span class="line"><span class="c1"># store parameter estimates</span>
</span><span class="line">ggd <span class="o"><-</span> ggs<span class="p">(</span>out<span class="p">)</span>
</span><span class="line">a2 <span class="o"><-</span> ggd<span class="o">$</span>value<span class="p">[</span>which<span class="p">(</span>ggd<span class="o">$</span>Parameter <span class="o">==</span> <span class="s">"alpha"</span><span class="p">)]</span>
</span><span class="line">b2 <span class="o"><-</span> ggd<span class="o">$</span>value<span class="p">[</span>which<span class="p">(</span>ggd<span class="o">$</span>Parameter <span class="o">==</span> <span class="s">"beta"</span><span class="p">)]</span>
</span><span class="line">d2 <span class="o"><-</span> data.frame<span class="p">(</span>a2<span class="p">,</span> b2<span class="p">)</span>
</span>

Now let’s see how the two models perform.

1
2
3
4
5
6
7
8
9
<span class="line">ggplot<span class="p">(</span>d<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>obsx<span class="p">,</span> obsy<span class="p">))</span> <span class="o">+</span>
</span><span class="line">  geom_abline<span class="p">(</span>aes<span class="p">(</span>intercept<span class="o">=</span>a<span class="p">,</span> slope<span class="o">=</span>b<span class="p">),</span> data<span class="o">=</span>d<span class="p">,</span> color<span class="o">=</span><span class="s">"red"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.05</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  geom_abline<span class="p">(</span>aes<span class="p">(</span>intercept<span class="o">=</span>a2<span class="p">,</span> slope<span class="o">=</span>b2<span class="p">),</span> data<span class="o">=</span>d2<span class="p">,</span> color<span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">0.05</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  geom_abline<span class="p">(</span>aes<span class="p">(</span>intercept<span class="o">=</span>alpha<span class="p">,</span> slope<span class="o">=</span>beta<span class="p">),</span>
</span><span class="line">              data<span class="o">=</span>parms<span class="p">,</span> color<span class="o">=</span><span class="s">"green"</span><span class="p">,</span> size<span class="o">=</span><span class="m">1.5</span><span class="p">,</span> linetype<span class="o">=</span><span class="s">"dashed"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  theme_bw<span class="p">()</span> <span class="o">+</span>
</span><span class="line">  geom_point<span class="p">(</span>shape<span class="o">=</span><span class="m">1</span><span class="p">,</span> size<span class="o">=</span><span class="m">3</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  xlab<span class="p">(</span><span class="s">"X values"</span><span class="p">)</span> <span class="o">+</span> ylab<span class="p">(</span><span class="s">"Observed Y values"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">  ggtitle<span class="p">(</span><span class="s">"Model results with and without modeling error in X"</span><span class="p">)</span>
</span>

The dashed green line shows the model that generated the data, i.e. the “true” line. The red lines show the posterior for the naive model ignoring error in X, while the less-biased blue lines show the posterior for the model incorporating error in X.

To leave a comment for the author, please follow the link and comment on their blog: Ecology in silico.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)