Using xBalance with MatchIt

August 1, 2010
By Mark M. Fredrickson

(This article was first published on Mark M. Fredrickson, and kindly contributed to R-bloggers)

In a previous post, I demonstrated how to create a propensity score match, test balance, and analyze the outcome variable using the optmatch and RItools packages. The same strategy works with other matching algorithms, such as the methods included in the MatchIt package.

I’ll use the same basic question and data as in my previous article. MatchIt wraps optmatch to provide its “full” and “optimal” matching methods, so I will use the “full” option for consistency with the previous article. The first step is loading the packages and the data:

> library(MatchIt)
> library(optmatch)
> library(RItools)
> data(nuclearplants)

For propensity score matching, MatchIt’s interface resembles optmatch’s, with one difference: matchit() compresses the process into a single step that takes the propensity formula and produces the match, whereas fullmatch() lets the user supply any number of distance matrices built beforehand (here, one computed by mdist() from a fitted logistic regression). As in the previous article, I match on a subset of the covariates.

> example.formula <- formula(pr ~ t1 + t2 + cap)
> match.opt <- fullmatch(
                      mdist(glm(example.formula, 
                                data = nuclearplants, 
                                family = binomial())))

> all.mit <- matchit(example.formula, 
                          data = nuclearplants, 
                          method = "full")

Among other items, the all.mit object contains a vector giving each unit’s matched set. Convert it to a factor so that it has the same form as the match returned by fullmatch():

> match.mit <- as.factor(all.mit$subclass)
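Before comparing the two matches, it can be useful to check the composition of each matched set. The function below, `matched_set_summary()`, is a hypothetical base R helper written for this illustration; it is not part of MatchIt, optmatch, or RItools. It is a sketch that assumes the treatment is coded 0/1 (as `pr` is in `nuclearplants`) and that unmatched units carry `NA` in the subclass vector. It counts control and treated units per set and flags whether each set has the shape a full matching should produce: one treated unit with one or more controls, or one control with one or more treated units.

```r
# Hypothetical helper: count control and treated units in each matched
# set. `strata` holds set labels (NA = unmatched, an assumption about how
# unmatched units are stored); `treat` is a 0/1 treatment indicator.
matched_set_summary <- function(strata, treat) {
  keep <- !is.na(strata)
  tab <- table(droplevels(as.factor(strata[keep])),
               factor(treat[keep], levels = c(0, 1)))
  out <- data.frame(set = rownames(tab),
                    n_control = as.vector(tab[, "0"]),
                    n_treated = as.vector(tab[, "1"]))
  # A full matching pairs one treated unit with one or more controls,
  # or one control with one or more treated units.
  out$full_shape <- out$n_control >= 1 & out$n_treated >= 1 &
    (out$n_control == 1 | out$n_treated == 1)
  out
}

# Toy example: set 1 has one treated unit and two controls, set 2 is a
# pair, and the sixth unit is unmatched.
matched_set_summary(factor(c(1, 1, 1, 2, 2, NA)), c(1, 0, 0, 0, 1, 0))
```

With the objects above, `matched_set_summary(match.mit, nuclearplants$pr)` should return one row per matched set, with `full_shape` equal to `TRUE` throughout if the match has the expected full-matching structure.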

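Unsurprisingly, since MatchIt calls optmatch internally, the two matches are identical: the same plants are grouped together, and only the labels of the matched sets differ.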

> lapply(split(nuclearplants, match.opt), rownames)


$m.1
[1] "N" "Z" "a"

$m.10
[1] "I" "G"

$m.2
[1] "A" "B" "D" "V" "F" "b"

$m.5
[1] "U" "c"

$m.6
 [1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"

$m.8
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"


> lapply(split(nuclearplants, match.mit), rownames)


$`1`
[1] "N" "Z" "a"

$`2`
[1] "I" "G"

$`3`
[1] "A" "B" "D" "V" "F" "b"

$`4`
[1] "U" "c"

$`5`
 [1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"

$`6`
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
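Comparing the printed lists by eye works for 32 plants, but the check is easy to automate. The `same_partition()` function below is a hypothetical helper, not part of any package used here. Two labelings describe the same grouping exactly when their cross-tabulation has a single non-zero cell in every row and every column, so the set names themselves (`m.1` versus `1`) do not matter. The sketch assumes both vectors list the units in the same order.

```r
# Hypothetical helper: TRUE when two labelings of the same units
# (in the same order) group the units identically, whatever the
# groups are called.
same_partition <- function(a, b) {
  a <- as.factor(a)
  b <- as.factor(b)
  stopifnot(length(a) == length(b))
  # Units left unmatched (NA) must coincide in both labelings
  if (any(is.na(a) != is.na(b))) return(FALSE)
  keep <- !is.na(a)
  # Cross-tabulate the labels, keeping only groups that occur
  tab <- table(droplevels(a[keep]), droplevels(b[keep]))
  # Same partition iff every row and every column has one non-zero cell
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy example: same grouping, different labels
same_partition(c("m.1", "m.1", "m.2", "m.2", "m.5"), c(1, 1, 2, 2, 3))
```

Here, `same_partition(match.opt, match.mit)` should return `TRUE`, assuming both vectors follow the row order of `nuclearplants`.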

Now that I have a factor giving each plant’s matched set, I can run xBalance() to assess the balance of the match on every covariate except the treatment indicator (pr) and the outcome (cost):

> xBalance(pr ~ . - (cost + pr), 
              data = nuclearplants, 
              strata = match.mit, 
              report = "chisquare.test")


---Overall Test---
      chisquare df p.value
strat       5.1  9    0.82
---
Signif. codes:  0 ‘***’ 0.001 ‘** ’ 0.01 ‘*  ’ 0.05 ‘.  ’ 0.1 ‘   ’ 1 

The reported p-value of 0.82 provides little evidence against the null hypothesis of balance, so we fail to reject it.
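The p-value is the upper-tail probability of a chi-square distribution with the reported degrees of freedom, here 9, matching the nine covariates being checked. The base R call below recomputes it from the printed statistic as a sanity check. Because xBalance() prints the statistic rounded to 5.1, the result agrees with the reported 0.82 only to within about 0.01.

```r
# Values copied from the xBalance() output above; the statistic is
# printed rounded, so the recomputed p-value is approximate.
chisq_stat <- 5.1
chisq_df <- 9

# Upper-tail probability of a chi-square distribution with chisq_df
# degrees of freedom: P(X >= 5.1)
p_value <- pchisq(chisq_stat, df = chisq_df, lower.tail = FALSE)
p_value
```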

This walkthrough used the “full” method for matchit(), but the same techniques will work with other matchit() methods, such as coarsened exact matching or nearest neighbor matching. If you are reasonably confident that you want optimal matching, consider using the optmatch package directly instead of through MatchIt.

In future posts I will demonstrate techniques for speeding up the matching process (a considerable benefit for large datasets) and for creating matches that incorporate more subject matter information than a simple logit model can.
