concerning Causal Mediation Analysis

I am happy to join Imai, Keele, Tingley, and Yamamoto (henceforth Imai-et al.) in celebrating the full convergence of our respective analyses towards a unified understanding of causal mediation. I am referring to the analysis presented in (Pearl, 2001) (reproduced in (Pearl, 2013)) on the one hand, and the analyses and implementations of (Imai et al., 2010a,b,c) on the other. In fact, when I first read (Imai et al., 2010c), I had no doubt that, despite some dissimilarities in the presentation of the assumptions, the two works would coincide on all fronts: definitions, basic assumptions, identification and estimation algorithms. The reason for my confidence was that, in 2001, I approached the mediation problem from the symbiotic mathematical framework of Structural Causal Models (SCM) (Pearl, 2000, Chapter 7), which unifies the graphical, potential-outcomes and structural-equation frameworks and according to which the latter two are logically equivalent: a theorem in one is a theorem in the other. They differ only in the language in which assumptions are cast. This means that even researchers who accept no interpretation of causation other than the one dictated by orthodox potential outcomes can safely use the transparency and inferential power provided by the symbiotic framework and be assured of the validity of the results. Inspired by this assurance, I derived identification conditions in the algebra of counterfactuals and presented them in two languages, counterfactual (or potential outcomes) and graphical. Not surprisingly, the mediation formulas derived in Imai et al. (2010c) coincide precisely with those derived in Pearl (2001, Eqs. (8), (17), (26), (27)). This is to be expected, since the assumptions posited in Imai et al. (2010c) added two restrictions to those articulated in (Pearl, 2001):
1. Commence the analysis with two assumptions of sequential ignorability (B-1 and B-2 in Pearl (2013)). (The latter is automatically satisfied in randomized studies.)
2. Satisfy these two assumptions with the same set (W) of observed covariates.
Clearly, all identification results produced under these restrictions will be valid in the symbiotic system of SCM (Pearl, 2001), in which these restrictions were not imposed.
In (Pearl, 2013) I identify the set of circumstances in which these two added restrictions lead to missed opportunities, and the current commentary by Imai-et al. identifies conditions under which the added restrictions cause no practical loss of opportunities. The two studies complement each other and provide valuable information: they tell us when the inference systems of (Imai et al., 2010a,b,c) operate in perfect harmony with the symbiotic methodology presented in (Pearl, 2001).
Specifically, Imai-et al. show that the restrictions imposed by sequential ignorability play a role only in observational studies, not in studies where treatment is randomized. Additionally, the extra restriction of conditioning on the same set of covariates may not be too severe in certain observational studies. I concur with most of these observations and commend Imai-et al. for bringing them to readers' attention.
I cannot accept, however, Imai-et al.'s conclusion that "Including irrelevant covariates may complicate the modeling but does not compromise the identification of causal mediation effects under the as-if randomization assumption." Whether the covariates considered are relevant or irrelevant depends on whether the "as-if randomization assumption" holds after their inclusion, which makes the sentence above circular, if not contradictory. Researchers should choose "relevant covariates" so as to make the "as-if randomization assumption" hold, not the other way around. Although the "as-if randomization assumption" can be articulated succinctly in the language of "ignorability," its validity may depend on many other assumptions encoded in the model; hence no mortal can judge its plausibility without the aid of graphs. Fortunately, the graphical methods presented in my paper (Pearl, 2013) allow us to mechanize the choice of "relevant covariates," and I hope Imai-et al. can implement this procedure in their flexible software.
In the remainder of this note, I concentrate on an issue that is common to all players in causal mediation analysis: how to improve the understanding of causal mediation among the uninitiated.
Impediments to such understanding come from several research communities.
1. Potential outcomes enthusiasts reject mediation when the mediator is non-manipulable.
2. Traditional statisticians fear that, without extensive reading of the philosophical writings of Aristotle, Kant and Hume, they are not well equipped to tackle the subject of causation, especially when it involves claims based on untested assumptions.
3. Traditional mediation analysts do not understand the sudden intrusion of counterfactuals into their field, which thus far has been dominated by regression analysis.
4. Economists, who adore counterfactuals (though they find them difficult to define (Pearl, 2009a, p. 379)), are not convinced that mediation analysis could help policy makers.
I will address the third group, namely, the traditional mediation analysts usually connected with the school of Baron and Kenny (BK) (1986), since the difficulties faced by this school are endemic to other groups as well and constitute the key impediment to a wider acceptance of causal mediation. As traditionalists examine modern definitions of direct and indirect effects (e.g., Pearl, 2013, Eqs. (7)-(10)), the first thing that strikes them as odd is the absence of a conditioning operator in any of these definitions. Whereas in the linear SEM tradition "effects" are associated with conditional expectations or regression slopes defined by holding some variables constant, here we plug the value of the variables we wish to keep constant (or "control for") directly into the equation (or into the subscript of a counterfactual), but we never place that variable behind a conditioning bar. In other words, we write
$$E[Y_{t,m}] \quad \text{rather than} \quad E[Y \mid T = t, M = m].$$
Readers versed in the distinction between "seeing" and "doing" (Lindley, 2002; Pearl, 1993; Pearl, 2009a, pp. 421-428; Spirtes et al., 1993), or between "controlling for" and "setting," will recognize immediately that, in mediation, the proper operator is "doing," not "seeing," and that it is this difference that gives causal mediation analysis its claim to the title "causal." Most traditionalists, however, are not attuned to this distinction and, when presented with the modern definitions of direct and indirect effects, tend to voice skepticism: "Do we really need those counterfactuals?" or "Do we really need to treat a structural equation in this manner? Why not condition on M = m?" The urge to condition on variables held constant is in fact so intense that I hold it accountable for a century of blunders and confusions: from "probabilistic causality" (Pearl, 2011b; Suppes, 1970) to "evidential decision theory" (Jeffrey, 1965; Pearl, 2009a, pp. 108-109) and Simpson's paradox (Pearl, 2009a, pp. 173-180); from Fisher's error in handling mediation (Fisher, 1935; Rubin, 2005) to the mishandling of mediation by "Principal Stratification" (Pearl, 2011a; Rubin, 2004); from misinterpretations of structural equations (Freedman, 1987; Hendry, 1995; Holland, 1995; Pearl, 2009a, pp. 135-138; Sobel, 2008; Wermuth, 1992) to the structural-regressional confusion in econometric textbooks today (Chen and Pearl, 2013).
What caused this confusion, and how did it enter the world of mediation? The urge to condition stems from the absence of a probabilistic notation for the notion of "holding T constant," which has forced generations of statisticians to use a surrogate in the form of "conditioning on T," the only surrogate licensed to them by probability theory.
The history of mediation analysis offers a compelling narrative on why the conditioning habit took root, and why it should be uprooted.
Examine the basic mediation model (Fig. 1(a)), with M (partially) mediating between T and Y. Why are we tempted to "control" for M when we wish to estimate the direct effect of T on Y? The reason is that, if we succeed in preventing M from changing, then whatever changes we measure in Y would be attributable solely to variations in T, and we would then be justified in proclaiming the observed response the "direct effect of T on Y." Unfortunately, the language of probability theory does not possess the notation to express the idea of "preventing M from changing" or "physically holding M constant." The only operator probability allows us to use is "conditioning," which is what we do when we "control for M" in the conventional way. In other words, instead of physically holding M constant (say at M = m) and comparing Y for units under T = 1 to those under T = 0, we allow M to vary but ignore all units except those in which M achieves the value M = m. Students of causality know that these two operations are profoundly different and give totally different results, except in the case of no omitted variables. Yet to most traditionalists this would come as a total surprise and would elicit requests for explicit demonstration. Stunned by the cultural divide between the two camps, and having found no convincing demonstration in the literature, I believe it is appropriate to provide one in this commentary; it is absolutely pivotal to the understanding of causal mediation.
Assume that there is a latent variable L causing both M and Y and, to simplify the discussion, assume that the structural equations are Y = 0 · T + 0 · M + L and M = T + L, as shown in Fig. 1(b). Obviously, the direct effect of T on Y in this case is zero, but this is not what we would get if we "control for M" and compare subjects under T = 1 to those under T = 0 at the same level, M = 0. In the former group we would find Y = L = M − T = 0 − 1 = −1, whereas in the latter group we would find Y = L = M − T = 0 − 0 = 0. In other words, in order to keep the same score of M = 0 for the two groups, L had to change from L = −1 to L = 0. Thus, we are comparing apples and oranges (i.e., subjects with L = −1 to those with L = 0) and, not surprisingly, we obtain an erroneous estimate of (−1) for a direct effect that in reality is zero. Now let us examine what we obtain from the counterfactual expression E[Y_{1,m} − Y_{0,m}] for m = 0 (the same holds for m = 1). Substituting the structural equation for the counterfactuals, we get
$$E[Y_{1,0} - Y_{0,0}] = E[(0 \cdot 1 + 0 \cdot 0 + L) - (0 \cdot 0 + 0 \cdot 0 + L)] = E[L - L] = 0,$$
as expected. The reason we obtained the correct result is that we correctly simulated what we set out to do, namely, to physically hold M constant rather than "condition on M." In the former case L remains unchanged, because the physical operation of holding M constant and changing T does not affect L. In the latter, when we "condition" on a constant M, L must compensate for varying T to satisfy the equation M = T + L. In short, counterfactual conditioning reflects a physical intervention, while statistical conditioning reflects passive observation. To avoid confusion between the two, I have used the notation E[Y | do(T = t)] as distinguished from the ordinary conditional expectation E[Y | T = t] (Pearl, 2009a, Chapter 3).
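A brute-force check can make the contrast in the example above concrete. The Python sketch below enumerates a small population for the model of Fig. 1(b), assuming (for illustration only) that L takes the values −1, 0 and 1 with equal weight, independently of T; the helper names are hypothetical. "Controlling for M" is implemented by filtering units whose observed M equals m, while "holding M constant" overrides the equation for M and leaves L untouched.

```python
from itertools import product
from statistics import mean

# Assumed, illustrative support for the latent L; L is independent of T.
L_VALUES = (-1, 0, 1)
T_VALUES = (0, 1)


def f_m(t, l):
    """Structural equation M = T + L."""
    return t + l


def f_y(t, m, l):
    """Structural equation Y = 0*T + 0*M + L."""
    return 0 * t + 0 * m + l


def population():
    """Every equally weighted unit (t, l) with its naturally occurring M and Y."""
    units = []
    for t, l in product(T_VALUES, L_VALUES):
        m = f_m(t, l)
        units.append({"t": t, "l": l, "m": m, "y": f_y(t, m, l)})
    return units


def conditional_effect(m):
    """E[Y | T=1, M=m] - E[Y | T=0, M=m]: 'control for M' by keeping only units with M = m."""
    units = population()
    y1 = [u["y"] for u in units if u["t"] == 1 and u["m"] == m]
    y0 = [u["y"] for u in units if u["t"] == 0 and u["m"] == m]
    return mean(y1) - mean(y0)


def controlled_direct_effect(m):
    """E[Y_{1,m}] - E[Y_{0,m}]: set M to m by replacing its equation; L keeps its distribution."""
    y1 = [f_y(1, m, l) for l in L_VALUES]
    y0 = [f_y(0, m, l) for l in L_VALUES]
    return mean(y1) - mean(y0)


print(conditional_effect(0))        # -1
print(controlled_direct_effect(0))  # 0
```

Filtering on M = 0 keeps only units with L = −1 under T = 1 and only units with L = 0 under T = 0, which reproduces the spurious −1; the intervention never touches L and returns 0, for m = 1 as well.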
The habit of translating "hold M constant" into "condition on M" became deeply entrenched in the statistical culture (see Lindley, 2002; Pearl, 1993; Spirtes et al., 1993), not through deliberate negligence but because of the coarseness of its language (probability theory), which fails to provide an appropriate operator for "holding M constant." Absent such an operator, statisticians (including Fisher, 1935) were pressed to use the only operator available to them, conditioning, and a century of confusion came into being.
Traditional mediation analysts of the BK school were not unaware of the dangers lurking in conditioning (Kenny, 1981, 2010). However, lacking an appropriate operator for "fixing M," they settled on a compromise: they defined the direct effect as
$$DE = E[Y \mid T = 1, M = m] - E[Y \mid T = 0, M = m]$$
and accompanied this definition with a warning that it is valid only under the assumption of "no omitted variables." Causal analysis circumvents this compromise upon realizing that the operator needed for "fixing M," while undefinable in probability theory, is well defined in SEM (Balke and Pearl, 1995; Pearl, 1993), and it permits researchers to express their intent using do(M = m). The formal counterfactual treatment of direct and indirect effects owes its development to this notational provision and to the SEM semantics of counterfactuals.
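The "no omitted variables" caveat can be checked with a variant of the earlier example. In this Python sketch (a hypothetical model, with illustrative values), M and Y receive independent disturbances U_M and U_Y instead of sharing the latent L. With nothing omitted, the conditional contrast recovers the true direct effect of zero.

```python
from itertools import product
from statistics import mean

# Assumed, illustrative support for the independent disturbances U_M and U_Y.
U_VALUES = (-1, 0, 1)


def conditional_direct_effect(m):
    """E[Y | T=1, M=m] - E[Y | T=0, M=m] under M = T + U_M and Y = 0*T + 0*M + U_Y."""
    rows = []
    for t, u_m, u_y in product((0, 1), U_VALUES, U_VALUES):
        m_obs = t + u_m
        rows.append((t, m_obs, 0 * t + 0 * m_obs + u_y))
    y1 = [y for t, mm, y in rows if t == 1 and mm == m]
    y0 = [y for t, mm, y in rows if t == 0 and mm == m]
    return mean(y1) - mean(y0)


print(conditional_direct_effect(0))  # 0
```

Because U_Y is unrelated to the units selected by M = m, both groups have the same outcome distribution. Replacing U_M and U_Y by a single shared term, as in Fig. 1(b), brings back the spurious −1.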
I believe that, with this narrative in mind, traditional SEM analysts should not have any difficulty accepting the premises of causal mediation. First, these analysts already accept structural equations as the basis for modeling (most statisticians do not). Second, counterfactuals in our narrative enter naturally, as abbreviated structural equations (see Pearl, 2013, Eq. (4)). Third, traditional SEM analysts can easily appreciate the benefits of causal mediation analysis, since it endows them with two new capabilities:
1. Extending mediation analysis to nonlinear functions and highly interactive variables, continuous as well as discrete.
2. Distinguishing between the necessary and sufficient notions of mediation.
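A minimal sketch of both capabilities follows, under loudly stated assumptions: binary T and M, no confounding of any kind (so that the mediation formulas identify the effects from E[Y | T, M] and P(M | T)), and made-up numbers. The Python code computes the total (TE), natural direct (NDE) and natural indirect (NIE) effects for an outcome that responds only to the interaction of T and M. On the reading commonly used in this literature, TE − NDE measures the portion of the effect for which mediation is necessary and NIE the portion for which it is sufficient. The names p_m1, e_y and the functions are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical inputs (illustrative numbers, not data).
# p_m1[t] = P(M=1 | T=t)
p_m1 = {0: F(1, 5), 1: F(4, 5)}
# e_y[(t, m)] = E[Y | T=t, M=m]; Y responds only when T=1 and M=1 (pure interaction).
e_y = {(0, 0): F(0), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1)}


def p_m(m, t):
    """P(M=m | T=t) for binary M."""
    return p_m1[t] if m == 1 else 1 - p_m1[t]


def total_effect():
    """E[Y | T=1] - E[Y | T=0], averaging E[Y | T, M] over P(M | T)."""
    return sum(e_y[(1, m)] * p_m(m, 1) - e_y[(0, m)] * p_m(m, 0) for m in (0, 1))


def natural_direct_effect():
    """Switch T from 0 to 1 while M keeps the distribution it has under T=0."""
    return sum((e_y[(1, m)] - e_y[(0, m)]) * p_m(m, 0) for m in (0, 1))


def natural_indirect_effect():
    """Keep T at 0 while M shifts to the distribution it has under T=1."""
    return sum(e_y[(0, m)] * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))


te, nde, nie = total_effect(), natural_direct_effect(), natural_indirect_effect()
print(te, nde, nie, te - nde)  # 4/5 1/5 0 3/5
```

Exact fractions avoid floating-point noise in the comparison. Here NIE = 0 while TE − NDE = 3/5: mediation is necessary for most of the effect yet never sufficient on its own. In a linear model without interactions the two quantities coincide, so the distinction does not arise.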
I hope this exchange helps clarify the logic and scope of causal mediation analysis as well as the unifying power of the SCM methodology. I thank Imai-et al. for commenting on my paper and contributing to this clarification.