The 3 rules of do-calculus
Heads up: it’s more fun to read this blog post if you have seen directed acyclic graphs (DAGs) before, as this blog post won’t provide an introduction to DAGs.
When I started to read up on causal inference at the beginning of my PhD studies, I often got stuck on the assumption of exchangeability, i.e., Rubin’s ignorability assumption: $Y^a \perp\!\!\!\perp A$. I understood what the assumption means in theory and I understood how to use DAGs to identify confounders and colliders. Intuitively, I understood how the ignorability assumption and DAGs are connected, but I did not understand how they are theoretically connected. I mean, there are usually no counterfactuals in a DAG, so how can one use DAGs to reason about whether counterfactuals are independent of the treatment assignment? One solution is to use single world intervention graphs (SWIGs), but they never felt really natural to me. Pearl’s do-calculus instead offers a very nice combination of DAGs and ignorability, in my opinion. Hence, I think it is worth taking a closer look at the rules of do-calculus and how they combine the ignorability assumption and DAGs.
Before we can dive into Pearl’s do-calculus and look at some examples, we first need to introduce a bit of specific notation. First, let $X$, $Y$, $Z$, and $W$ be sets of unique variables. Second, let $G$ be a directed acyclic graph which is associated with a causal model, let $G_{\overline{X}}$ be a submodel of $G$ in which we remove all arrows going into $X$, and let $G_{\underline{X}}$ be a submodel of $G$ in which we remove all arrows going out of $X$. Third, let $do(X = x)$ define an operator for intervening on $X$. For example, $P(Y \mid do(X = x))$ indicates the value of $Y$ if we would change the value of $X$ to the value $x$. Lastly, let $(Y \perp\!\!\!\perp X)$ denote that $Y$ and $X$ are independent of each other.
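If you want to play around with these graph surgeries yourself, the dagitty package in R (my tool of choice for illustration here, not something Pearl uses in his book) lets you write down a DAG and check d-separation statements. Here is a minimal sketch in which $G_{\underline{X}}$ is constructed by simply writing down the graph again without the arrows going out of $X$; the graph itself is the confounding triangle we will meet again in Figure 1 (b):

```r
# A minimal sketch using the dagitty package; the graph is the
# confounding triangle Z -> X, Z -> Y, X -> Y (illustration only).
library(dagitty)

g <- dagitty("dag {
  Z -> X
  Z -> Y
  X -> Y
}")

# G underline X: the same graph with all arrows going out of X removed
g_under_x <- dagitty("dag {
  Z -> X
  Z -> Y
}")

dseparated(g_under_x, "X", "Y")        # FALSE: the fork X <- Z -> Y is open
dseparated(g_under_x, "X", "Y", "Z")   # TRUE: conditioning on Z blocks it
```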
In his book Causality, Pearl defines the three rules of do-calculus, which can be used to identify causal effects with the help of DAGs. The overall aim of do-calculus is to translate expressions including do-statements into expressions including only observed data. This allows us to identify and later estimate a causal effect using our observed data. Put in other words, using do-calculus, we can translate a causal expression into an expression including only associations, which we can then estimate from our observed data. This allows us to interpret association as causation if certain assumptions are fulfilled, something that was previously reserved for the devil of epidemiological research.
Now you’re ready for the three rules. Listen carefully.
Rule 1 (Insertion/deletion of observations)
$$P(y \mid do(x), z, w) = P(y \mid do(x), w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$$

In words, this tells us that we can remove a variable $z$ from our expression if $Y$ is independent of $Z$, given $X$ and potentially other variables $W$, in the DAG in which we remove all arrows going into $X$.
Rule 2 (Action/observation exchange)
$$P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$$

In words, this tells us that we can replace the action $do(z)$ with the variable $z$ observed in the data if $Y$ and $Z$ are independent, given $X$ and potentially other variables $W$, in the DAG in which we remove the arrows going into $X$ and out of $Z$. Note that this rule is a generalisation of the back-door criterion, which you might know from before. If we are only interested in one action, e.g., $do(x)$, we can simplify rule 2 as follows:

$$P(y \mid do(x), w) = P(y \mid x, w) \quad \text{if } (Y \perp\!\!\!\perp X \mid W)_{G_{\underline{X}}}$$

This now is pretty much an expression of the commonly known back-door criterion.
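Since this simplified rule 2 is essentially the back-door criterion, we can also let software search for sets $W$ that fulfil it. A small sketch, again using dagitty (my choice of tool, not the book’s), for the same confounding triangle as above:

```r
# Hedged sketch: let dagitty search for back-door adjustment sets W,
# i.e., sets that render Y and X independent in G with X's outgoing
# arrows removed.
library(dagitty)

g <- dagitty("dag {
  Z -> X
  Z -> Y
  X -> Y
}")

adjustmentSets(g, exposure = "X", outcome = "Y")
# should print { Z }
```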
Rule 3 (Insertion/deletion of actions)

$$P(y \mid do(x), do(z), w) = P(y \mid do(x), w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}, \overline{Z(W)}}}$$

where $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

Last but not least, rule 3 is probably the most complicated one. In words, rule 3 tells us that we can remove an action, e.g., $do(z)$, from our expression if $Y$ and $Z$ are independent, given $X$ and potentially other variables $W$, in the graph where we remove all arrows going into $X$ and all arrows going into those nodes of $Z$ that are not ancestors of $W$ in $G_{\overline{X}}$.
Let’s use these rules of do-calculus for identifying causal effects in some example graphs.
[Figure 1: The four example DAGs (a) to (d) discussed below.]
The first example in Figure 1 (a) might seem trivial, but I thought it might be a smooth start. In this graph there are no arrows connecting $X$ and $Y$ in $G_{\underline{X}}$, that is, if we remove all the arrows going out of $X$. Hence, $X$ and $Y$ are independent in $G_{\underline{X}}$, which means that we can apply rule 2 of do-calculus:

$$P(y \mid do(x)) = P(y \mid x)$$
Success! Using do-calculus we could replace all the do-statements with observed variables, which now allows us to estimate the causal effect of changing $X$ on $Y$ based on our observed data. This was quite an easy example. But before we continue with the next example, let’s take a closer look at $G_{\underline{X}}$ again. The reason why we are interested in looking at the graph in which we remove all arrows going out of $X$ is that we want to make sure that $X$ only affects $Y$ directly or through causes that are caused by $X$, i.e., we are interested in the total effect of $X$ on $Y$. Thus, if we remove all arrows going out of $X$ and we find that in this submodel there is no open path between $X$ and $Y$, we can be sure that in the whole model $G$, all open paths between $X$ and $Y$ must be causal paths, i.e., paths that we want to include in our estimation.
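As a small sanity check, here is a simulation of the trivial graph in Figure 1 (a); the data-generating values are of course made up for illustration:

```r
# Simulation of Figure 1 (a): X -> Y and nothing else going on
set.seed(2024)
n <- 1e5
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.2 + 0.3 * x)  # true causal risk difference: 0.3

# Because P(y | do(x)) = P(y | x), the naive contrast is already causal
mean(y[x == 1]) - mean(y[x == 0])  # approximately 0.3
```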
Figure 1 (b) includes a classical example of confounding, in which the variable $Z$ confounds the effect of $X$ on $Y$. If we remove all arrows going out of $X$, we find that $X$ is still associated with $Y$ through the fork $X \leftarrow Z \rightarrow Y$. Hence, we cannot directly disentangle the direct effect of $X$ on $Y$ and the association between $X$ and $Y$ that is due to the confounding of $Z$. However, as stated in rule 2, we can also condition on other variables to render $X$ and $Y$ independent in $G_{\underline{X}}$.
Ok, let’s go through this in more detail. The first step we need to do is to condition our analysis on the variable $Z$:

$$P(y \mid do(x)) = \sum_z P(y \mid do(x), z) P(z \mid do(x)) = \sum_z P(y \mid do(x), z) P(z)$$

Here, $P(z \mid do(x)) = P(z)$ follows from rule 3, as $Z$ and $X$ are independent in $G_{\overline{X}}$. Conditioning on $Z$ also renders $X$ and $Y$ independent in $G_{\underline{X}}$. After this, we can now replace $do(x)$ with $x$ using rule 2, as $X$ and $Y$ are independent when conditioning on $Z$:

$$P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$$
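To see the adjustment in action, here is a small simulation of the confounding graph in Figure 1 (b); all coefficients are invented for illustration:

```r
# Simulation of Figure 1 (b): Z confounds the effect of X on Y
set.seed(2024)
n <- 1e5
z <- rbinom(n, 1, 0.5)                      # confounder
x <- rbinom(n, 1, 0.2 + 0.5 * z)            # treatment depends on Z
y <- rbinom(n, 1, 0.1 + 0.3 * x + 0.4 * z)  # true effect of X: 0.3

# The naive contrast is confounded and overestimates the effect
mean(y[x == 1]) - mean(y[x == 0])

# Back-door adjustment: sum over z of P(y | x, z) * P(z)
p_do <- function(xval) {
  sum(sapply(c(0, 1), function(zv)
    mean(y[x == xval & z == zv]) * mean(z == zv)))
}
p_do(1) - p_do(0)  # approximately 0.3
```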
Figure 1 (c) again is a simpler example. In this graph $X$ and $Y$ are independent in $G_{\underline{X}}$ because $Z$ is a collider on the path $X \rightarrow Z \leftarrow Y$. Hence, we can just calculate $P(y \mid do(x)) = P(y \mid x)$ based on our observed data.
Figure 1 (d) is a tricky one, and in contrast to the graphs before, we cannot rely only on rule 2 in order to identify the causal effect of $X$ on $Y$. Using only the back-door criterion would not allow us to identify the causal effect of $X$ on $Y$ in this graph, but using do-calculus we actually can identify this effect. For this, let’s first take a look at the effect that we would like to estimate:

$$P(y \mid do(x)) = \sum_z P(y \mid do(x), z) P(z \mid do(x)) \tag{1}$$

Unfortunately, we cannot estimate the first part of the right-hand side directly using only observed data, but we can achieve this with the help of both rule 2 and rule 3:

$$P(y \mid do(x), z) = P(y \mid do(x), do(z)) = P(y \mid do(z)) = \sum_{x'} P(y \mid z, x') P(x') \tag{2}$$

The first step exchanges the observation $z$ for the action $do(z)$ using rule 2, the second step deletes $do(x)$ using rule 3, and the last step conditions on $X$ and applies rules 2 and 3 once more to arrive at an expression with no do-statements left.
Now we have an expression for the first part of the right-hand side that only includes observed variables. Let’s do the same for the second part of the right-hand side in Equation 1. Translating this part of the equation into an expression only including observed variables is actually a lot easier, as $Y$ is a collider on the back-door path between $X$ and $Z$, which renders $Z$ and $X$ independent in $G_{\underline{X}}$:

$$P(z \mid do(x)) = P(z \mid x) \tag{3}$$
Now we have all the pieces that we need in order to translate Equation 1 into an expression only including observed variables. Let’s substitute Equations 2 and 3 into Equation 1:

$$P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid z, x') P(x') \tag{4}$$

Please note that we used $x'$ in Equation 4 in order to differentiate between the $x$ in $do(x)$ and the values of $X$ observed in our dataset. The second part of Equation 4 means a summation over all observed values of $X$, independent of the value that is chosen for $do(x)$.
By the way, if you don’t want to buy Pearl’s Causality book but you’re still interested in reading more about do-calculus, you can find a short introduction to do-calculus by Pearl here. This paper also links to some other interesting applications of do-calculus, including, e.g., selection bias and transportability analysis.