Using Optmatch and RItools for Observational Studies

July 29, 2010

(This article was first published on Mark M. Fredrickson, and kindly contributed to R-bloggers)

I am a contributor to the optmatch and RItools packages for R. The two packages are separate but complementary. Both provide tools for adjusting observational data to exhibit “balance” on observed covariates. In a randomized controlled trial, the treatment and control groups should have identical distributions over all covariates, observed and unobserved. Matching provides a way to create smaller groups in an observational study that are similarly balanced, and balance can be quantified so that alternative matches can be compared. Once an acceptable match has been found, the analysis can proceed as if nature had provided a blocked, randomized study.


Both optmatch and RItools use a canonical example dataset describing nuclear power plants. From help(nuclearplants):

The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960’s and early 1970’s. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers’ subsidies may be hidden in the quoted capital costs.

With these data, we may wish to know whether certain variables lead to higher or lower construction costs. One variable of particular interest is pr, an indicator of whether a prior light water reactor existed at the same site. Such an existing installation might significantly increase or decrease costs. The rest of this document uses matching and balance testing to answer just that question.

I start by loading the packages and the data:

> library(optmatch)
> library(RItools)
> data(nuclearplants)

Before getting into the matching process, let's take a quick look at the balance of the data on all variables except cost and pr, the LWR indicator. Among other tests, xBalance provides an omnibus balance test across any number of variables. This test compares the null hypothesis that the data are balanced against the alternative of a lack of balance, where “balanced” means what we would expect in a randomized trial with the same sample size. The test statistic follows a chi-squared distribution, which xBalance will happily report:

> xBalance(pr ~ . - (cost + pr), 
  data = nuclearplants, 
  report = c("chisquare.test"))

---Overall Test---
        chisquare df p.value
unstrat      12.4  9   0.192
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

With a reported p-value of 0.19, the balance of this sample is not terrible (by conventional hypothesis-testing standards), but we might prefer something closer to 1. While there is no a priori p-value we should prefer, experience indicates that p-values in the neighborhood of 0.5 are achievable and mimic true randomized designs (though optimal balance levels are a subject of ongoing research).
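If the omnibus statistic is needed programmatically, for example to compare candidate matches in a loop, the xBalance result can be stored and inspected. A minimal sketch, assuming the returned object keeps the omnibus test in an `overall` component (check str() on the result, as the exact slot name is an assumption here):

```r
library(RItools)
data(nuclearplants)

bal <- xBalance(pr ~ . - (cost + pr),
                data = nuclearplants,
                report = c("chisquare.test"))

# Assumption: the omnibus chi-squared test is stored in bal$overall;
# run str(bal) to confirm on your installed version of RItools.
bal$overall
```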


A full discussion of matching procedures is beyond the scope of this document (see Rosenbaum (2010) for a more comprehensive treatment). In brief, matching attempts to group units with similar covariates, as if they had been blocked in a randomized experiment. The ideal match would pair units that are identical on every variable, observed and unobserved. In most datasets, no two units will be identical on all observed covariates. Instead, we can use a measure that summarizes all covariates and match on the summary. The propensity score, the probability of receiving treatment given the observed covariates, has been a popular summary measure (for more on the theory, see Rosenbaum and Rubin (1983)). I’ll use a logistic regression to estimate the propensity scores of my observations, using a subset of the available variables:

> model <- glm(pr ~ t1 + t2 + cap, 
  family = binomial(), data = nuclearplants)

With a propensity model, optmatch provides several functions for computing matched sets of observations. The fullmatch function takes a treatment-by-control matrix of distances between observations and returns a factor indicating the set membership, if any, of every observation. Computing the distance matrix is simple using the mdist function, which takes a linear model, a function, or a formula, producing distances based on propensity models, arbitrary user functions, or Mahalanobis distances between observations, respectively. We’ll use the propensity model; see the help page for mdist for the other alternatives.

> m1 <- fullmatch(mdist(model))
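As a point of comparison, mdist can also produce distances without a propensity model. A sketch, assuming mdist's formula method computes Mahalanobis distances on the right-hand-side variables and that optmatch's stratumStructure helper is available for inspecting the result:

```r
# Hedged sketch: assumes mdist() dispatches on a formula to compute
# Mahalanobis distances between treated (pr == 1) and control units.
library(optmatch)
data(nuclearplants)

mhd <- mdist(pr ~ t1 + t2 + cap, data = nuclearplants)

# fullmatch() accepts the resulting distance object just as before
m.mhd <- fullmatch(mhd)

# stratumStructure() tabulates treated:control ratios within the
# matched sets, a quick check on how the match turned out
stratumStructure(m.mhd)
```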

We can compare the first match with a second in which a caliper is placed on the date variable. The caliper constrains the matching algorithm, disallowing matches between observations with widely differing date values even if their overall propensity scores are similar. Calipers can lead to poorer matches on the other observed variables, but they give researchers a way to include subject-matter knowledge in the matching process; for example, a caliper on date would be sensible if the cost of construction decreased over time due to increased efficiency in construction practices.

> m2 <- fullmatch(mdist(model) + 
  caliper(0.25, pr ~ date, data = nuclearplants))

Balance Testing

With two possible matches, does either produce adequate balance? As noted previously, the RItools package provides a method of quantifying balance in a matched set. The method (discussed in detail in Hansen and Bowers (2008)) compares treatment and control units within each block on a difference of means for each variable; combining these differences yields a statistic that follows a chi-squared distribution. We can compare both matches at the same time, along with the raw data (see the strata argument):

> (allbalance <- xBalance(pr ~ . - (cost + pr), 
    data = nuclearplants, 
    report = c("chisquare.test", "std.diffs"), 
    strata = data.frame(original = factor("none"), m1, m2)))

      strata original                m1                m2         
      stat   std.diff          std.diff          std.diff         
date         -0.11468          -0.23368          0.06902          
t1           0.10630           -0.01666          0.30232          
t2           1.03269  *        0.27487           0.65635          
cap          0.34012           -0.03631          0.24860          
ne           -0.16312          -0.47647          -0.13433         
ct           -0.30797          -0.65565          -0.78858 *       
bw           0.04511           0.29570           -0.20169         
cum.n        -0.09760          -0.00887          -0.16724         
pt           0.41382           0.60274           0.00000          
---Overall Test---
         chisquare df p.value
original     12.39  9   0.192
m1            5.15  9   0.821
m2           10.07  9   0.345
Signif. codes:  0 ‘***’ 0.001 ‘** ’ 0.01 ‘*  ’ 0.05 ‘.  ’ 0.1 ‘   ’ 1 

Both matches provide good balance. With a p-value of 0.821 we might be tempted to prefer the unconstrained match; however, with a p-value of 0.345, the match with a caliper also provides reasonable assurance of balance. Since either provides plausible balance, researchers might instead choose to concentrate on substantively important covariates. When xBalance reports “std.diffs” (as above), we can plot the result to get a visual picture of the balance on each covariate:

> plot(allbalance)

Covariate balance


Since we now have data that approximate a randomized experiment, we can analyze them with the same techniques we would use for any blocked randomized experiment; for example, a one-way ANOVA using pr as the treatment factor and m1 as the blocking factor:

> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m1))
Analysis of Variance Table

Response: nuclearplants$cost
                 Df Sum Sq Mean Sq F value Pr(>F)
nuclearplants$pr  1   9037    9037  0.3394 0.5654
m1                5 222410   44482  1.6704 0.1785
Residuals        25 665726   26629               

At conventional significance levels, neither the treatment nor the blocking factor reaches statistical significance. We conclude that existing light water reactors do not have an effect on construction costs that we can distinguish from chance.


In the analysis, I chose one of two plausible matches, and it so happened that I selected the match with the larger p-value. Does this mean we should always select the match with the highest p-value, as it most closely approximates a randomly allocated treatment? I would caution against that conclusion. Within the set of matches that are plausibly balanced, it is difficult to argue that one match is truly better than another. While randomized treatments are perfectly balanced in expectation, small deviations should be expected in practice (with fewer deviations in larger experimental populations).

In short, don’t sweat the small stuff. Find a reasonable match and go with it. In fact, you may find that matches with lower p-values provide interesting substantive results. Here is an analysis of the second match, which included a caliper on the date of construction:

> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m2))
Analysis of Variance Table

Response: nuclearplants$cost
                 Df Sum Sq Mean Sq F value  Pr(>F)  
nuclearplants$pr  1   4185    4185  0.2035 0.65654  
m2                8 396199   49525  2.4080 0.05098 .
Residuals        21 431905   20567                  
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This match indicates a blocking effect on the margin of significance, which suggests that limiting matches by date may have something to do with the resulting costs. If we had blindly pursued higher p-value matches, we might not have observed this interesting result.
