Shiny variance inflation factor sandbox

[This article was first published on Ecology in silico, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In multiple regression, strong correlation among covariates increases the uncertainty or variance in estimated regression coefficients.
Variance inflation factors (VIFs) are one tool that has been used as an indicator of problematic covariate collinearity.
In teaching students about VIFs, it may be useful to have some interactive supplementary material so that they can manipulate factors affecting the uncertainty in slope terms in real-time.

Here’s a little R shiny app that could be used as a starting point for such a supplement.
Currently it only includes two covariates for simplicity, and gives the user control over the covariate $R^2$ value, the residual variance, and the variance of both covariates.

As usual, the file server.R defines what you want to actually do in R:

server.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
<span class="line"><span class="c1"># interactive variance inflation factor module</span>
</span><span class="line">library<span class="p">(</span>shiny<span class="p">)</span>
</span><span class="line">library<span class="p">(</span>car<span class="p">)</span>
</span><span class="line">library<span class="p">(</span>mvtnorm<span class="p">)</span>
</span><span class="line">library<span class="p">(</span>gridExtra<span class="p">)</span>
</span><span class="line">library<span class="p">(</span>ggplot2<span class="p">)</span>
</span><span class="line">
</span><span class="line">shinyServer<span class="p">(</span><span class="kr">function</span><span class="p">(</span>input<span class="p">,</span> output<span class="p">){</span>
</span><span class="line">  output<span class="o">$</span>plot <span class="o"><-</span> renderPlot<span class="p">({</span>
</span><span class="line">    r2 <span class="o"><-</span> input<span class="o">$</span>r2
</span><span class="line">    var_error <span class="o"><-</span> input<span class="o">$</span>resid_var
</span><span class="line">    var_x1 <span class="o"><-</span> input<span class="o">$</span>var_x1
</span><span class="line">    var_x2 <span class="o"><-</span> input<span class="o">$</span>var_x2
</span><span class="line">    beta <span class="o"><-</span> c<span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">    <span class="c1"># users enter R^2, this backcalculates covariance</span>
</span><span class="line">    cov_x1x2 <span class="o"><-</span> sqrt<span class="p">(</span>var_x1 <span class="o">*</span> var_x2 <span class="o">*</span> r2<span class="p">)</span>
</span><span class="line">    sigma <span class="o"><-</span> matrix<span class="p">(</span>c<span class="p">(</span>var_x1<span class="p">,</span> cov_x1x2<span class="p">,</span>
</span><span class="line">                      cov_x1x2<span class="p">,</span> var_x2<span class="p">),</span>
</span><span class="line">                    nrow<span class="o">=</span><span class="m">2</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">    X <span class="o"><-</span> array<span class="p">(</span><span class="m">1</span><span class="p">,</span> dim<span class="o">=</span>c<span class="p">(</span>input<span class="o">$</span>n<span class="p">,</span> <span class="m">3</span><span class="p">))</span>
</span><span class="line">    X<span class="p">[,</span> c<span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">)]</span> <span class="o"><-</span> rmvnorm<span class="p">(</span>n<span class="o">=</span>input<span class="o">$</span>n<span class="p">,</span> sigma<span class="o">=</span>sigma<span class="p">,</span> method<span class="o">=</span><span class="s">"chol"</span><span class="p">)</span>
</span><span class="line">    mu <span class="o"><-</span> X <span class="o">%*%</span> beta
</span><span class="line">    epsilon <span class="o"><-</span> rnorm<span class="p">(</span>input<span class="o">$</span>n<span class="p">,</span> <span class="m">0</span><span class="p">,</span> sqrt<span class="p">(</span>var_error<span class="p">))</span>
</span><span class="line">    y <span class="o"><-</span> mu <span class="o">+</span> epsilon
</span><span class="line">
</span><span class="line">    X1 <span class="o"><-</span> X<span class="p">[,</span> <span class="m">2</span><span class="p">]</span>
</span><span class="line">    X2 <span class="o"><-</span> X<span class="p">[,</span> <span class="m">3</span><span class="p">]</span>
</span><span class="line">
</span><span class="line">    model <span class="o"><-</span> lm<span class="p">(</span>y <span class="o">~</span> <span class="m">1</span> <span class="o">+</span> X1 <span class="o">+</span> X2<span class="p">)</span>
</span><span class="line">
</span><span class="line">    <span class="c1"># thanks to Ben Bolker for the next 3 lines</span>
</span><span class="line">    <span class="c1"># https://groups.google.com/forum/#!topic/ggplot2/4-l3dUT-h2I</span>
</span><span class="line">    l <span class="o"><-</span> list<span class="p">(</span>vif <span class="o">=</span> round<span class="p">(</span>vif<span class="p">(</span>model<span class="p">)[</span><span class="m">1</span><span class="p">],</span> digits<span class="o">=</span><span class="m">2</span><span class="p">))</span>
</span><span class="line">    eq <span class="o"><-</span> substitute<span class="p">(</span>italic<span class="p">(</span>VIF<span class="p">)</span> <span class="o">==</span> vif<span class="p">,</span> l<span class="p">)</span>
</span><span class="line">    eqstr <span class="o"><-</span> as.character<span class="p">(</span>as.expression<span class="p">(</span>eq<span class="p">))</span>
</span><span class="line">
</span><span class="line">    l2 <span class="o"><-</span> list<span class="p">(</span>vif <span class="o">=</span> round<span class="p">(</span>vif<span class="p">(</span>model<span class="p">)[</span><span class="m">2</span><span class="p">],</span> digits<span class="o">=</span><span class="m">2</span><span class="p">))</span>
</span><span class="line">    eq2 <span class="o"><-</span> substitute<span class="p">(</span>italic<span class="p">(</span>VIF<span class="p">)</span> <span class="o">==</span> vif<span class="p">,</span> l2<span class="p">)</span>
</span><span class="line">    eqstr2 <span class="o"><-</span> as.character<span class="p">(</span>as.expression<span class="p">(</span>eq2<span class="p">))</span>
</span><span class="line">
</span><span class="line">    <span class="c1"># plot 1: parameter recovery</span>
</span><span class="line">    df3 <span class="o"><-</span> data.frame<span class="p">(</span>truth <span class="o">=</span> beta<span class="p">,</span>
</span><span class="line">                      lci <span class="o">=</span> confint<span class="p">(</span>model<span class="p">)[,</span> <span class="m">1</span><span class="p">],</span>
</span><span class="line">                      uci <span class="o">=</span> confint<span class="p">(</span>model<span class="p">)[,</span> <span class="m">2</span><span class="p">],</span>
</span><span class="line">                      est <span class="o">=</span> coef<span class="p">(</span>model<span class="p">),</span> y<span class="o">=</span><span class="m">0</span><span class="o">:</span><span class="p">(</span>length<span class="p">(</span>beta<span class="p">)</span> <span class="o">-</span> <span class="m">1</span><span class="p">))</span>
</span><span class="line">
</span><span class="line">    cip <span class="o"><-</span> ggplot<span class="p">(</span>df3<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>truth<span class="p">,</span> y<span class="o">=</span>y<span class="p">))</span> <span class="o">+</span>
</span><span class="line">      geom_point<span class="p">(</span>col<span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> size<span class="o">=</span><span class="m">5</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.5</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      theme_bw<span class="p">()</span> <span class="o">+</span>
</span><span class="line">      geom_segment<span class="p">(</span>aes<span class="p">(</span>x<span class="o">=</span>lci<span class="p">,</span> y<span class="o">=</span>y<span class="p">,</span> xend<span class="o">=</span>uci<span class="p">,</span> yend<span class="o">=</span>y<span class="p">))</span> <span class="o">+</span>
</span><span class="line">      geom_point<span class="p">(</span>aes<span class="p">(</span>x<span class="o">=</span>est<span class="p">,</span> y<span class="o">=</span>y<span class="p">),</span> size<span class="o">=</span><span class="m">3</span><span class="p">,</span> pch<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      ggtitle<span class="p">(</span><span class="s">"Coefficient recovery"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      xlab<span class="p">(</span><span class="s">"Value"</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      ylab<span class="p">(</span>expression<span class="p">(</span>beta<span class="p">))</span> <span class="o">+</span>
</span><span class="line">      scale_y_continuous<span class="p">(</span>breaks <span class="o">=</span> c<span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">))</span> <span class="o">+</span>
</span><span class="line">      theme<span class="p">(</span>axis.title.y <span class="o">=</span> element_text<span class="p">(</span>angle <span class="o">=</span> <span class="m">0</span><span class="p">,</span> hjust <span class="o">=</span> <span class="m">0</span><span class="p">))</span>  <span class="o">+</span>
</span><span class="line">      annotate<span class="p">(</span>geom<span class="o">=</span><span class="s">"text"</span><span class="p">,</span> x<span class="o">=</span><span class="m">0</span><span class="p">,</span> y<span class="o">=</span><span class="m">2</span><span class="p">,</span> label<span class="o">=</span>eqstr<span class="p">,</span>
</span><span class="line">               parse<span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> vjust<span class="o">=</span><span class="m">1.5</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">-1</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      annotate<span class="p">(</span>geom<span class="o">=</span><span class="s">"text"</span><span class="p">,</span> x<span class="o">=</span><span class="m">0</span><span class="p">,</span> y<span class="o">=</span><span class="m">1</span><span class="p">,</span> label<span class="o">=</span>eqstr2<span class="p">,</span>
</span><span class="line">               parse<span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span> vjust<span class="o">=</span><span class="m">1.5</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">-1</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">    <span class="c1"># plot 2: X1 & X2 correlation plot</span>
</span><span class="line">    xcorp <span class="o"><-</span> ggplot<span class="p">(</span>data.frame<span class="p">(</span>X1<span class="p">,</span> X2<span class="p">),</span> aes<span class="p">(</span>x<span class="o">=</span>X1<span class="p">,</span> y<span class="o">=</span>X2<span class="p">))</span> <span class="o">+</span>
</span><span class="line">      geom_point<span class="p">(</span>pch<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span>
</span><span class="line">      theme_bw<span class="p">()</span> <span class="o">+</span>
</span><span class="line">      ggtitle<span class="p">(</span><span class="s">"Covariate scatterplot"</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">    grid.arrange<span class="p">(</span>cip<span class="p">,</span> xcorp<span class="p">,</span>
</span><span class="line">                 ncol<span class="o">=</span><span class="m">2</span><span class="p">)</span>
</span><span class="line">
</span><span class="line">  <span class="p">})</span>
</span><span class="line"><span class="p">})</span>
</span>

The file ui.R defines the user interface:

ui.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
<span class="line">library<span class="p">(</span>shiny<span class="p">)</span>
</span><span class="line">
</span><span class="line">shinyUI<span class="p">(</span>pageWithSidebar<span class="p">(</span>
</span><span class="line">  headerPanel<span class="p">(</span><span class="s">"Variance inflation factor sandbox"</span><span class="p">),</span>
</span><span class="line">  sidebarPanel<span class="p">(</span>
</span><span class="line">    sliderInput<span class="p">(</span><span class="s">"r2"</span><span class="p">,</span> <span class="s">"Covariate R-squared:"</span><span class="p">,</span>
</span><span class="line">                min<span class="o">=</span><span class="m">0</span><span class="p">,</span> max<span class="o">=</span><span class="m">.99</span><span class="p">,</span> value<span class="o">=</span><span class="m">0</span><span class="p">),</span>
</span><span class="line">    sliderInput<span class="p">(</span><span class="s">"resid_var"</span><span class="p">,</span> <span class="s">"Residual variance:"</span><span class="p">,</span>
</span><span class="line">                min<span class="o">=</span><span class="m">1</span><span class="p">,</span> max<span class="o">=</span><span class="m">10</span><span class="p">,</span> value<span class="o">=</span><span class="m">1</span><span class="p">),</span>
</span><span class="line">    sliderInput<span class="p">(</span><span class="s">"var_x1"</span><span class="p">,</span> <span class="s">"Variance of X1:"</span><span class="p">,</span>
</span><span class="line">                min<span class="o">=</span><span class="m">1</span><span class="p">,</span> max<span class="o">=</span><span class="m">10</span><span class="p">,</span> value<span class="o">=</span><span class="m">1</span><span class="p">),</span>
</span><span class="line">    sliderInput<span class="p">(</span><span class="s">"var_x2"</span><span class="p">,</span> <span class="s">"Variance of X2:"</span><span class="p">,</span>
</span><span class="line">                min<span class="o">=</span><span class="m">1</span><span class="p">,</span> max<span class="o">=</span><span class="m">10</span><span class="p">,</span> value<span class="o">=</span><span class="m">1</span><span class="p">)</span>
</span><span class="line">    <span class="p">),</span>
</span><span class="line">  mainPanel<span class="p">(</span>plotOutput<span class="p">(</span><span class="s">"plot"</span><span class="p">)</span>
</span><span class="line">            <span class="p">)</span>
</span><span class="line">  <span class="p">))</span>
</span>

Everything’s ready to fork or clone on Github.

To leave a comment for the author, please follow the link and comment on their blog: Ecology in silico.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)