**Mark M. Fredrickson**, and kindly contributed to R-bloggers)

I am a contributor to the optmatch and the RItools packages for R. These two packages are separate, but complimentary. Both packages provide tools for adjusting observational data to exhibit “balance” on observed covariates. In a randomized control trial, treatment and control groups should have identical distributions over all covariates, observed and unobserved. Matching provides a method to create smaller groups in an observational study that are similarly balanced. Balance can be quantified so that alternative matches can be compared. When an acceptable match has been found, analysis can then proceed as if nature provided a blocked, randomized study.

## Data

Both `optmatch`

and `RItools`

use a canonical dataset consisting of nuclear plants. From `help(nuclearplants)`

:

The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960’s and early 1970’s. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers’ subsidies may be hidden in the quoted capital costs.

With these data, we may wish to know if certain variables lead to higher or lower construction costs. One particular variable is `pr`

, an indicator if a previous lightwater reactor at the same location was present. Such an installation might significantly increase or decrease costs. The rest of this document uses matching and balance testing to provide an answer to just that question.

I start by loading the packages and the data:

```
```> library(optmatch)
> library(RItools)
> data(nuclearplants)

Before getting into the matching process, lets take a quick look at the balance of the data on all variables, except `cost`

and `pr`

, the LWR indicator. `xBalance`

, among other tests, provides an omnibus balance test across any number of variables. This test compares the null hypothesis of “the data are balanced” against the alternative hypothesis of a lack of balance, where balance is what we would expect in a randomized trial with the same sample size. The test follows a chi-squared distribution, which `xBalance`

will happily report:

```
```> xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
report = c("chisquare.test"))
---Overall Test---
chisquare df p.value
unstrat 12.4 9 0.192
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1

With a reported p-value of 0.19, the balance of this sample is not terrible (by conventional levels of hypothesis testing), but we might prefer something closer to 1. While there is no *a priori* p-value we should prefer, experience indicates that p-values in the neighborhood of .5 are achievable and mimic true randomized designs (though optimal balance levels are a subject of ongoing research).

## Matching

A full discussion of matching procedures is beyond the scope of this document (see Rosenbaum (2010) for a more comprehensive discussion). In brief, matching attempts to group units with similar covariates, as if they had been blocked in a randomized experiment. The optimal match would be two units identical on every variable, observed and unobserved. In most datasets, no two units will be identical on all observed covariates. Instead, we can use a measure that summarizes all covariates and match based on the summary. The propensity score, the probability of receiving treatment given the observed covariates, has been a popular summary measure (for more on the theory, see Rosenbaum and Rubin (1983)). I’ll use a logistic regression to estimate the propensity scores of my observations, using a subset of the available variables:

```
```> model <- glm(pr ~ t1 + t2 + cap,
family = binomial(), data = nuclearplants)

With a propensity model, `optmatch`

provides several functions for computing matched sets of observations. The `fullmatch`

function takes a treatment by control matrix containing distances between observations and returns a factor indicating the set membership, if any, of all observations. Computing the distance matrix is simple using the `mdist`

function. This function takes a linear model, a function, or a formula to produce distances based on propensity models, aribtrary user functions, or Mahalanobis distances between observations. We’ll use the propensity model. See the help page for `mdist`

for the other alternatives.

m1 <- fullmatch(mdist(model))

We can compare the first match with a second, in which a caliper is placed on the date variable. This will constrain the matching algorithm, disallowing matches on observations with widely differing date values, even if the over all propensity scores are similar. Calipers can lead to poorer matches on observed variables but provide a method by which researchers can include subject matter information in the matching process. For example, if the cost of construction decreased over time due to increased efficiency in construction practices.

```
```> m2 <- fullmatch(mdist(model) +
caliper(0.25, pr ~ date, data = nuclearplants))

## Balance Testing

With two possible matches, do either produce adequate balance? As noted previously, the `RItools`

package provides a method of quantifying balance in a matched set. The method (discussed in detail in Hansen and Bowers (2008)) compares treatment and control units within each block on a difference of means for each variable. Combining the these differences follows a chi-squared distribution. We can compare all the matches at the same time, along with the raw data (see the `strata`

argument).

```
```> (allbalance <- xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
report = c("chisquare.test", "std.diffs"),
strata = data.frame(original = factor("none"), m1, m2)))
strata original m1 m2
stat std.diff std.diff std.diff
vars
date -0.11468 -0.23368 0.06902
t1 0.10630 -0.01666 0.30232
t2 1.03269 * 0.27487 0.65635
cap 0.34012 -0.03631 0.24860
ne -0.16312 -0.47647 -0.13433
ct -0.30797 -0.65565 -0.78858 *
bw 0.04511 0.29570 -0.20169
cum.n -0.09760 -0.00887 -0.16724
pt 0.41382 0.60274 0.00000
---Overall Test---
chisquare df p.value
original 12.39 9 0.192
m1 5.15 9 0.821
m2 10.07 9 0.345
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1

Both matches provide good balance. With a value of 0.821 we might be tempted to prefer the unconstrained match; however, with a p-value of 0.345, the match with a caliper also provides reasonable assurances of balance. As either provides plausible balance, researchers might choose to concentrate on substantively important covariates. When `xBalance`

reports “std.diffs” (as above), we can plot the result to get a visual picture of balance on each covariate.

```
```> plot(allbalance)

## Analysis

Since we now have data that approximates a randomized experiment, we can use the same techniques to analyze this data as any blocked randomized experiment. For example, one-way ANOVA using `pr`

as the treatment factor and `m1`

as the blocking factor.

```
```> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m1))
Analysis of Variance Table
Response: nuclearplants$cost
Df Sum Sq Mean Sq F value Pr(>F)
nuclearplants$pr 1 9037 9037 0.3394 0.5654
m1 5 222410 44482 1.6704 0.1785
Residuals 25 665726 26629

Under conventional levels, we do not observe either the treatment or the blocking factor reach statistical significance. So we can conclude that existing lightwater reactors do not have an effect on construction costs that we can differentiate from chance.

## Conclusion

In the analysis, I chose one of two plausible matches. It so happened that I selected the match with the larger p-value. Does this indicate that we should select the match with the highest p-value, as it most closely approximates a randomly allocated treatment? I would caution against that conclusion. Within the set of matches that are plausibly balanced, it is difficult to argue that one match is truly better than another. While in expectation, randomized treatments are perfectly balanced, in pratice, small deviations should be expected (with fewer deviations in larger experimental populations).

In short, don’t sweat the small stuff. Find a reasonable match and go with it. In fact, you may find that matches with lower p-values provide interesting substantive results. Here is an analysis of the second match, which included a caliper on the date of construction:

```
```> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m2))
Analysis of Variance Table
Response: nuclearplants$cost
Df Sum Sq Mean Sq F value Pr(>F)
nuclearplants$pr 1 4185 4185 0.2035 0.65654
m2 8 396199 49525 2.4080 0.05098 .
Residuals 21 431905 20567
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This matching indicates a significant blocking effect, which suggests that limiting matches by date may have something to do with the resulting costs. If we had blindly pursued higher p-value matches, we might not have observed this interesting result.

**leave a comment**for the author, please follow the link and comment on their blog:

**Mark M. Fredrickson**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...