Bias in Observational Studies – Sensitivity Analysis with R package episensr

[This article was first published on vet epi » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When it’s time to interpret the study results from your observational study, you have to estimate if the effect measure you obtained is the truth, if it’s due to bias (systematic error, the effect measure’s precision), or if it’s due to chance (random error, the effect measure’s validity) (Rothman and Greenland, 2008, pp115-134). Every study has some random error due to its limited sample size, and is susceptible to systematic errors as well, from selection bias to the presence of (un)known confounders or information bias (measurement error, including misclassification).

Bias analysis, or sensitivity analysis, tries to quantify the direction, magnitude, and uncertainty of the bias affecting an estimate of association (Greenland and Lash, 2008, pp345-380; Lash et al., 2009).

Very often bias is evaluated qualitatively without any quantitative assessment of its magnitude. This might be due to the very few tools available to epidemiologists. Hence this R package, episensr, which is built following the book by Lash et al. I will illustrate its use with 3 studies from this book.

First you have to install and load the package.

install.packages("episensr")
# or get it from Github repo with
# library(devtools)
# install_github("dhaine/episensr")

library(episensr)

Selection Bias

We will use a case-control study by Stang et al. on the relation between mobile phone use and uveal melanoma. The observed odds ratio for the association between regular mobile phone use vs. no mobile phone use with uveal melanoma incidence is 0.71 [95% CI 0.51-0.97]. But there was a substantial difference in participation rates between cases and controls (94% vs 55%, respectively) and so selection bias could have an impact on the association estimate. The 2X2 table for this study is the following:

Regular Use No Use
Cases 136 107
Controls 297 165

We use the function selection as shown below.

selection(matrix(c(136, 107, 297, 165),
                 dimnames = list(c("UM+", "UM-"), c("Mobile+", "Mobile-")),
                 nrow = 2, byrow = TRUE),
          selprob = c(.94, .85, .64, .25))

Observed Data:
---------------------------------------------------
Outcome   : UM+
Comparing : Mobile+ vs. Mobile- 

    Mobile+ Mobile-
UM+     136     107
UM-     297     165

Data Corrected for Selected Proportions:
---------------------------------------------------

     Mobile+  Mobile-
UM+ 144.6809 125.8824
UM- 464.0625 660.0000

                               95% conf. interval
Observed Relative Risk: 0.7984    0.6518   0.9780
   Observed Odds Ratio: 0.7061    0.5144   0.9693

                                          [,1]
Selection Bias Corrected Relative Risk: 1.4838
   Selection Bias Corrected Odds Ratio: 1.6346

                                                [,1]
     Selection probability among cases exposed: 0.94
   Selection probability among cases unexposed: 0.85
  Selection probability among noncases exposed: 0.64
Selection probability among noncases unexposed: 0.25

The 2X2 table is provided as a matrix and selection probabilities given with the argument selprob, a vector with the 4 probabilities (guided by the participation rates in cases and controls) in the following order: among cases exposed, among cases unexposed, among noncases exposed, and among noncases unexposed. The output shows the observed 2X2 table, the same table corrected for the selection proportions, the observed odds ratio (and relative risk) followed by the corrected ones, and the input parameters.

Uncontrolled Confounders

We will use date from a cross-sectional study by Tyndall et al. on the association between male circumcision and the risk of acquiring HIV, which might be confounded by religion. The code to account for unmeasured or unknown confounders is the following, where the 2X2 table is given as a matrix. We choose a risk ratio implementation, provide a vector defining the prevalence of the confounder, religion, among the exposed and the unexposed, and give the risk ratio associating the confounder with the disease.

confounders(matrix(c(105, 85, 527, 93),
                   dimnames = list(c("HIV+", "HIV-"), c("Circ+", "Circ-")),
                   nrow = 2, byrow = TRUE),
            implement = "RR",
            p = c(.8, .05),
            RR.cd = .63)

Observed Data:
--------------
Outcome   : HIV+
Comparing : Circ+ vs. Circ- 

     Circ+ Circ-
HIV+   105    85
HIV-   527    93

Data, Counfounder +:
--------------------

        Circ+ Circ-
HIV+  75.1705 2.728
HIV- 430.4295 6.172

Data, Counfounder -:
--------------------

       Circ+  Circ-
HIV+ 29.8295 82.272
HIV- 96.5705 86.828

Crude and Unmeasured Confounder Specific Measures of Exposure-Outcome Relationship:
-----------------------------------------------------------------------------------

                                    95% conf. interval
        Crude Relative Risk: 0.3479    0.2757    0.439
Relative Risk, Confounder +: 0.4851
Relative Risk, Confounder -: 0.4851 

Exposure-Outcome Relationship Adjusted for Confounder:
------------------------------------------------------

Standardized Morbidity Ratio     SMRrr: 0.4851    RR adjusted using SMR estimate: 0.7173
Mantel-Haenszel                   MHrr: 0.4851     RR adjusted using MH estimate: 0.7173 

Bias Parameters:
----------------

p(Confounder+|Exposure+): 0.8
p(Confounder+|Exposure-): 0.05
  RR(Confounder-Outcome): 0.63

The output gives the crude 2X2 table, the 2X2 tables by levels of the confounder, the crude relative risk and confounder specific measures of association between exposure and outcome, and the relationship adjusted for the unknown confounder, using a standardized morbidity ratio (SMR) or a Mantel-Haenszel (MH) estimate of the risk ratio. Finally, the bias parameters are shown.

Probabilistic Sensitivity Analysis for Exposure Misclassification

We use a study on the effect of smoking during pregnancy on breast cancer risk (Fink and Lash), where we assume nondifferential misclassification of the exposure, smoking, with probability density functions for sensitivities (Se) and specificities (Sp) among cases and noncases equal to uniform distributions with a minimum of 0.7 and a maximum of 0.95 for sensitivities (0.9 and 0.99 respectively for specificities). As usual, the 2X2 table is provided as a matrix. We choose to correct for exposure misclassification with the argument type = exposure. We ask for 10000 replications (default is 1000). The Se and Sp for cases (seca, spca) are given as a list with its first element referring to the type of distribution (choice between uniform, triangular and trapezoidal) and the second element giving the distribution parameters (min and max for uniform distribution). By avoiding to provide information on the noncases (seexp, spexp), we are referring to a nondifferential misclassification.

smoke.nd <- probsens(matrix(c(215, 1449, 668, 4296),
                            dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")),
                            nrow = 2, byrow = TRUE),
                     type = "exposure",
                     reps = 10000,
                     seca.parms = list("uniform", c(.7, .95)),
                     spca.parms = list("uniform", c(.9, .99)))

Observed Data:
--------------
Outcome   : BC+
Comparing : Smoke+ vs. Smoke- 

    Smoke+ Smoke-
BC+    215   1449
BC-    668   4296

Observed Measures of Exposure-Outcome Relationship:
-----------------------------------------------------------------------------------

                                95% conf. interval
 Observed Relative Risk: 0.9654    0.8524   1.0934
    Observed Odds Ratio: 0.9542    0.8092   1.1252

                                              Median 2.5th percentile
           Relative Risk -- systematic error: 0.9432           0.8816
              Odds Ratio -- systematic error: 0.9254           0.8477
Relative Risk -- systematic and random error: 0.9372           0.8181
   Odds Ratio -- systematic and random error: 0.9178           0.7671
                                              97.5th percentile
           Relative Risk -- systematic error:            0.9612
              Odds Ratio -- systematic error:            0.9488
Relative Risk -- systematic and random error:            1.0662
   Odds Ratio -- systematic and random error:            1.0884

Bias Parameters:
----------------

   Se|Cases: uniform ( 0.7 0.95 )
   Sp|Cases: uniform ( 0.9 0.99 )
Se|No-cases: ( )
Sp|No-cases: ( )

The output gives the 2X2 table, the observed measures of association, the corrected measures of association, and the input bias parameters.

We saved the probsens analysis in a new variable smoke.nd. We can see the element of the object probsens with the function str():

str(smoke.nd)

List of 4
 $ obs.data    : num [1:2, 1:2] 215 668 1449 4296
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "BC+" "BC-"
  .. ..$ : chr [1:2] "Smoke+" "Smoke-"
 $ obs.measures: num [1:2, 1:3] 0.965 0.954 0.852 0.809 1.093 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] " Observed Relative Risk:" "    Observed Odds Ratio:"
  .. ..$ : chr [1:3] "     " "95% conf." "interval"
 $ adj.measures: num [1:4, 1:3] 0.943 0.925 0.937 0.918 0.882 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:4] "           Relative Risk -- systematic error:" "              Odds Ratio -- systematic error:" "Relative Risk -- systematic and random error:" "   Odds Ratio -- systematic and random error:"
  .. ..$ : chr [1:3] "Median" "2.5th percentile" "97.5th percentile"
 $ sim.df      :'data.frame':	10000 obs. of  12 variables:
  ..$ seca   : num [1:10000] 0.878 0.75 0.928 0.796 0.841 ...
  ..$ seexp  : num [1:10000] 0.878 0.75 0.928 0.796 0.841 ...
  ..$ spca   : num [1:10000] 0.958 0.948 0.971 0.906 0.969 ...
  ..$ spexp  : num [1:10000] 0.958 0.948 0.971 0.906 0.969 ...
  ..$ A1     : num [1:10000] 173.5 183.1 185.5 82.6 201.6 ...
  ..$ B1     : num [1:10000] 1490 1481 1479 1581 1462 ...
  ..$ C1     : num [1:10000] 550 584 583 284 634 ...
  ..$ D1     : num [1:10000] 4414 4380 4381 4680 4330 ...
  ..$ corr.RR: num [1:10000] 0.951 0.944 0.957 0.891 0.955 ...
  ..$ corr.OR: num [1:10000] 0.935 0.927 0.943 0.86 0.941 ...
  ..$ tot.RR : num [1:10000] 0.917 0.855 0.895 0.882 0.883 ...
  ..$ tot.OR : num [1:10000] 0.892 0.813 0.863 0.848 0.848 ...

smoke.nd is a list of 4 elements where different information on the analysis done are saved. We have smoke.nd$obs.data where we have the observed 2X2 table, smoke.nd$obs.measures (the observed measures of association), smoke.nd$adj.measures (the adjusted measures of association), and smoke.nd$sim.df, a data frame with the simulated variables from each replication, like the Se and Sp, the 4 cells of the adjusted 2X2 table, and the adjusted measures. We can plot the Se prior distribution (not forgetting to discard the draws that led to negative adjustments).

hist(smoke.nd$sim.df[!is.na(smoke.nd$sim.df$corr.RR), ]$seca,
     breaks = seq(0.65, 1, 0.01),
     col = "lightgreen",
     main = NULL,
     xlab = "Sensitivity for Cases")
Histogram of 10000 draws from uniform distribution for Se among cases
Histogram of 10000 draws from uniform distribution for Se among cases

 

Additional functions can be found on the reference manual and more will be added in the coming months.

To leave a comment for the author, please follow the link and comment on their blog: vet epi » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)