Recoverability and Testability of Missing Data: Introduction and Summary of Results

Managing missing data is a problem in every experimental science. Sensors do not always work reliably, respondents do not fill out every question in the questionnaire, and medical patients are often unable to recall episodes, treatments or outcomes. The literature on this problem is huge and has resulted in a powerful software industry that makes missing data packages available through computer programs such as LISREL, M-plus and EQS. The availability of such software has engendered a culture that shares vocabulary, beliefs and expectations and uses a common theoretical framework and default assumptions. Most practices are based on the seminal theoretical work of Rubin (Rubin, 1976; Little & Rubin, 2002), which formulated procedures and conditions under which the damage of missingness can be minimized. This theory has also resulted in a number of performance guarantees when data obey certain statistical conditions. However, the theoretical guarantees provided by this theory are rather coarse, as will be shown in the discussion that follows.


MISSING DATA: A CAUSAL INFERENCE PERSPECTIVE (Mohan, Pearl & Tian 2013)
• Pervasive in every experimental science.
• Huge literature, powerful software industry, deeply entrenched culture.
• Current practices are based on statistical characterization (Rubin, 1976) of a problem that is inherently causal.
• Needed: (1) theoretical guidance, (2) performance guarantees and (3) tests of assumptions.

WHAT CAN CAUSAL THEORY DO FOR MISSING DATA?
Q-1. What should the world be like, for a given statistical procedure to produce the expected result?
Q-2. Can we tell from the postulated world whether any method can produce a bias-free result? How?
Q-3. Can we tell from data if the world does not work as postulated?
• To answer these questions, we need models of the world, i.e., process models.

Figure 2 explicates the kind of guidance and guarantees that are needed in missing data research (Pearl, 2013). Question Q-1 refers to a researcher who has acquired a statistical package that handles missing data and would like to ask what the structure of his/her problem should be for the procedure to produce a consistent estimate. (In this note we deal only with the question of bias (consistency); we therefore assume that an infinitely large sample is available and that the user is concerned primarily with convergence to the right answer, rather than speed of convergence.) Another question (Q-2) that a user might ask is whether the problem at hand lends itself to solution by any method whatsoever. This is important because, if the answer is negative, then a biased result should be expected even with unlimited data, and no software, however smart, can overcome this theoretical impediment. On the other hand, if the answer is affirmative, the user might next wish to ask whether other software can exploit the specific features of the problem so as to produce a consistent estimate. The third question (Q-3) relates to testability: once the user postulates a structure for the problem, can the data tell us if the postulated structure is incorrect?
The first two questions address a problem we call "recoverability" and the third, a problem called "testability." To answer these questions reliably, the user must articulate features of the problem in some formal language, preferably in a model that captures both the inter-relationships among the variables of interest and the missingness process, i.e., an explanation of why some values are missing. While the theory of Rubin and Little (Little & Rubin, 2002) captures some of these relationships, the language they used is not sufficiently refined to capture details of the missingness process and to specify, for example, which variable is responsible for values missing in another. The characterization that emerged from this theory is likewise rather crude. Specifically, it divides problems into three categories: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR). In MAR, for instance, missingness can only be explained by variables that are fully observed, whereas those that are partially observed cannot be responsible for missingness in others, an unrealistic assumption in many cases. Performance guarantees and some testability results are available for MCAR and MAR, while the vast space of MNAR problems has remained relatively unexplored.

The purpose of this note is to partition the space of MNAR problems along two orthogonal dimensions: (1) recoverable vs. non-recoverable and (2) testable vs. non-testable. Moreover, the partition will be query dependent. For example, in some problems we can obtain a consistent estimate of queries such as P(Y|X) and P(Y) but not of P(X, Y). With the tools that we will develop, the user will be able to examine the features of his/her problem, determine whether a given query is recoverable (i.e., estimable without bias from any dataset with missing values that the model is capable of generating), and determine whether the assumptions that lead to such recoverability have testable implications.
[Figure 3 (slide): Estimate E(X) from partially observed data. The reasons for missingness can make a difference. Shown: the observed string X*, the true data X, and the missingness mechanism R_x under Model-1 and Model-2.]

Figure 3 demonstrates the problem of determining recoverability and how missingness mechanisms affect that determination. Assume we observed the string X*, where each m stands for the absence of a value, and our task is to estimate E(X), where X is a random variable whose actual values are shown in string X, below X*. We will first demonstrate that different assumptions about the missingness process lead to different conclusions about E(X). To articulate such assumptions we define a notion called "missingness mechanism," which stands for a binary variable R_x that acts like a switch. When the switch is activated (R_x = 1) we do not observe the value of X but rather the value X* = m. When the switch is not activated (R_x = 0) we observe the correct value of X, i.e., X* = X. Each variable of interest will be assumed to have a missingness mechanism that, in general, may be activated by X itself, as well as by other variables in the model.
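The switch behavior of R_x described above can be sketched in a few lines of Python. This is only an illustration: the function name `mask`, the symbol constant `MISSING`, and the data values are assumptions introduced here, not the strings shown in Figure 3.

```python
MISSING = "m"  # placeholder used in the text for an unobserved value

def mask(x_values, r_values):
    """Return the proxy X*: X where R_x == 0, 'm' where R_x == 1."""
    if len(x_values) != len(r_values):
        raise ValueError("X and R_x must have the same length")
    return [MISSING if r == 1 else x for x, r in zip(x_values, r_values)]

# Hypothetical data (an assumption, not the data of Figure 3).
x = [1, 0, 0, 1, 1]
r_x = [0, 1, 0, 0, 1]
x_star = mask(x, r_x)
print(x_star)  # [1, 'm', 0, 1, 'm']
```

Note that R_x is always visible to the analyst (one can see which entries are m), whereas X is visible only where R_x = 0.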
The simplest such mechanism is shown in Model-1, in which R_x is activated by a random coin and is shown to be unrelated to X. Formally, this independence is written X ⊥⊥ R_x and is implied by the absence of an arrow or any other graphical connection between X and R_x. Under this assumption, which falls under the category of MCAR, one can easily estimate the expected value of X. One need only examine the observed (unmasked) values of X* and take their average. In our finite example the sample mean will be 3/5, but as the number of samples increases the sample estimate will converge to E(X).

Let us now examine Model-2, in which a direct arrow is drawn from X to R_x. In other words, whether or not missingness is activated depends on the value of X. Such dependence, written X ⊥̸⊥ R_x, may represent, for example, a salary survey in which people with high (or low) income are reluctant to reveal their income. Naturally this model falls under the MNAR category. In this model, since we do not know the exact dependence of R_x on X, and since the arrow X → R_x permits any imaginable dependency to exist, we cannot estimate E(X) without bias, and we say that E(X) is non-recoverable. We can actually prove this assertion by showing two kinds of dependence between X and R_x, each yielding a different answer. In version-a of the model we assume an extreme case where X is missing only if its value is 1, and in version-b we assume that X is missing only if its value is 0.
As we see, substituting 1 for m in version-a results in a sample mean of 7/9, while substituting 0 for m (in version-b) results in a sample mean of 2/9. If two different versions, both permitted by the model, yield conflicting results, it must be that the query E(X) is non-recoverable; no algorithm in the world, however smart, can produce the correct answer E(X) = 0.4 without further assumptions, even assuming the model is correct. We can further see why recoverability is achievable in Model-1 and not in Model-2. Model-1 is more constrained than Model-2. While Model-2 permits any dependency between X and R_x to be realized, Model-1 insists on total independence, X ⊥⊥ R_x, which severely constrains the values that we can substitute for m as we try to reconstruct the distribution of X.
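The argument can be replayed on a toy string. The sketch below uses a hypothetical observed string (an assumption; it is not the string of Figure 3, so the fractions differ from those quoted above) and compares the MCAR estimate with the two extreme completions permitted by Model-2. The helper names are likewise introduced here for illustration.

```python
from fractions import Fraction

MISSING = "m"

def observed_mean(x_star):
    """Model-1 (MCAR) estimate: average of the unmasked values only."""
    seen = [v for v in x_star if v != MISSING]
    return Fraction(sum(seen), len(seen))

def mean_with_fill(x_star, fill):
    """Mean of X* after replacing every 'm' by the constant `fill`."""
    completed = [fill if v == MISSING else v for v in x_star]
    return Fraction(sum(completed), len(completed))

# Hypothetical observed string (assumption, not the data of Figure 3).
x_star = [1, "m", 0, 1, "m", 0, "m", 1]

mcar_estimate = observed_mean(x_star)   # Model-1: missingness ignores X
version_a = mean_with_fill(x_star, 1)   # Model-2a: every m hides a 1
version_b = mean_with_fill(x_star, 0)   # Model-2b: every m hides a 0

print(mcar_estimate, version_a, version_b)  # 3/5 3/4 3/8
```

The same observed string is compatible with means of 3/4 and 3/8 under two versions of Model-2, which is the sense in which E(X) is non-recoverable there.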

FROM RECOVERY TO TESTABILITY
• In Model 1, E(X) and P(X, R_x) are recoverable.
• In Model 2, E(X) and P(X, R_x) are not recoverable.
• Can we test the model? NO. Models 1 and 2 are indistinguishable: any data generated by Model 2 can also be generated by Model 1.
Figure 4 addresses the question that naturally surfaces in the mind of every reader: if we have two models, one permitting recoverability and the other not, why can't we test, from the data itself, which model is more likely to be true? It turns out that, to the disappointment of many, Model-1 and Model-2 are indistinguishable. This might come as a surprise to readers familiar with graphical models because, after all, Model-1 has a missing arrow which stands for the independence X ⊥⊥ R_x, and independence is a property of the distribution that can be tested in data. Unfortunately, we are dealing with missing data; although R_x is observed unambiguously in the data, X is not, since it is contaminated with m's.
To show explicitly that Model-1 can emulate any data produced by the less constrained Model-2, let us go back to Figure 3 and examine X*. It is always possible to replace the m's with ones and zeros stochastically, using the same distribution with which they appear in the non-missing values of X*. The result would be a string that satisfies the independence claim advertised by the missing arrow in Model-1. This demonstrates our claim that Model-1, despite being constrained by the missing arrow, is not falsifiable by any data in which some values of X are missing. Any such data can be construed as coming from Model-1.
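The emulation argument can also be checked at the level of distributions rather than strings. The following sketch starts from a hypothetical Model-2 joint P(X, R_x) (the numbers are assumptions chosen for illustration), builds a Model-1 joint by filling the masked cases with the distribution of the observed cases, and confirms that both produce the same law for the observable pair (X*, R_x) while disagreeing on E(X).

```python
from fractions import Fraction as F

# Hypothetical Model-2 joint P(X = x, R_x = r) with X -> R_x (assumed values).
model2 = {(0, 0): F(1, 2), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(3, 10)}

def observed_law(joint):
    """Distribution of the observable pair (X*, R_x); X* is 'm' whenever R_x = 1."""
    law = {}
    for (x, r), p in joint.items():
        key = (x, 0) if r == 0 else ("m", 1)
        law[key] = law.get(key, 0) + p
    return law

def emulate_with_model1(joint):
    """Build a joint with X independent of R_x that yields the same observed law."""
    p_r0 = sum(p for (x, r), p in joint.items() if r == 0)
    p_x_given_r0 = {x: joint[(x, 0)] / p_r0 for x in (0, 1)}
    p_r = {0: p_r0, 1: 1 - p_r0}
    return {(x, r): p_x_given_r0[x] * p_r[r] for x in (0, 1) for r in (0, 1)}

def mean_x(joint):
    return sum(p for (x, r), p in joint.items() if x == 1)

model1 = emulate_with_model1(model2)
print(observed_law(model1) == observed_law(model2))  # True
print(mean_x(model2), mean_x(model1))                # 2/5 1/6
```

Because the two joints agree on everything that can be observed, no test on the available data can favor one over the other, even though they imply different values of E(X).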

• Recoverability: Given a missingness model G and data D, when is a quantity Q estimable from D without bias?
• Non-recoverability: Theoretical impediment to any estimation strategy.
• Testability: Given a model G, when does it have testable implications (refutable by some partially observed data D')?
• What is known about Recoverability and Testability?

Figure 5 summarizes our discussion formally and emphasizes the fact that non-recoverability, if established in a given problem, constitutes an insurmountable impediment to any estimation strategy. Such a finding should alert the user either to resign himself/herself to the perils of biased results or to attempt to augment the model with auxiliary variables whose presence would alter the model structure and render the problem recoverable. Such a strategy has indeed been proposed by many researchers (Collins et al., 2001; Graham, 2003; Allison, 2003; Enders, 2010). However, the question of which auxiliary variables are likely to increase or decrease bias is not well understood (Thoemmes & Rose, 2013) and calls for graphical analysis.

RECOVERABILITY AND TESTABILITY
The last three lines of Figure 5 tabulate what is currently known in the missing data literature about recoverability and testability. We know that MAR and MCAR are recoverable for all probabilistic queries.

The following slide illustrates what we mean by charted territories. It contains nine missingness models and a query P(X, Y), and the task is to determine, from each given structure, whether P(X, Y) is recoverable. (Here, solid circles represent fully observed variables and hollow circles represent partially observed variables, which are always accompanied by their respective R variables in the graph.)
The technique that we have developed (Mohan et al., 2013) permits one to inspect the graphs and decide whether recoverability holds or not, for any given query. According to our criteria, one can categorically state that P(X, Y) is recoverable in models (b), (c), (e), (f), (h) and (i), and non-recoverable in models (a), (d) and (g). A heuristic explanation follows. In model (a), we see an arrow between Y and R_y, which presents the same ambiguity we encountered in Model-2 of Figure 3; hence it is non-recoverable.
In model (i) the connection between Y and R_y is intercepted by a fully observed variable Z. Conditional on Z, the pair X, Y becomes independent of the pair R_x, R_y, which essentially turns the problem into MAR. Specifically, in every stratum Z = z, the missingness of X, Y occurs completely at random.

The vast majority of probabilities generated by a model which we classify as MNAR would also be classified as MNAR according to Rubin (1976).
Model (b) has similar features except that no fully observed variable is present. Instead, the missingness in Y occurs totally at random and, conditional on Y, X is independent of its missingness mechanism R_x. This combination allows us to first recover P(Y) from tuples in which Y is observed and then recover P(X|Y) from tuples in which both X and Y are observed.
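The two-step scheme for model (b) amounts to the factorization P(X, Y) = P(Y | R_y = 0) P(X | Y, R_x = 0, R_y = 0). The sketch below checks it with exact arithmetic. It assumes a particular reading of the graph that is not visible in this text: R_y has no parents and Y is the only parent of R_x. All parameter values and helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters (assumptions, not taken from the source).
p_xy = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(2, 5)}
p_ry1 = F(1, 4)                            # P(R_y = 1): Y masked completely at random
p_rx1_given_y = {0: F(1, 5), 1: F(3, 5)}   # P(R_x = 1 | Y = y): assumed edge Y -> R_x

# Exact joint P(X, Y, R_x, R_y) implied by the assumed graph.
joint = {}
for x, y, rx, ry in product((0, 1), repeat=4):
    p_rx = p_rx1_given_y[y] if rx == 1 else 1 - p_rx1_given_y[y]
    p_ry = p_ry1 if ry == 1 else 1 - p_ry1
    joint[(x, y, rx, ry)] = p_xy[(x, y)] * p_rx * p_ry

def prob(pred):
    """Probability of the event described by pred(x, y, rx, ry)."""
    return sum(p for key, p in joint.items() if pred(*key))

def recover(x0, y0):
    """P(x0, y0) = P(y0 | R_y = 0) * P(x0 | y0, R_x = 0, R_y = 0)."""
    p_y = (prob(lambda x, y, rx, ry: y == y0 and ry == 0)
           / prob(lambda x, y, rx, ry: ry == 0))
    p_x_given_y = (prob(lambda x, y, rx, ry: (x, y, rx, ry) == (x0, y0, 0, 0))
                   / prob(lambda x, y, rx, ry: (y, rx, ry) == (y0, 0, 0)))
    return p_y * p_x_given_y

def complete_case(x0, y0):
    """Naive estimate P(x0, y0 | R_x = 0, R_y = 0) from fully observed rows only."""
    return (prob(lambda x, y, rx, ry: (x, y, rx, ry) == (x0, y0, 0, 0))
            / prob(lambda x, y, rx, ry: (rx, ry) == (0, 0)))

recovered = {xy: recover(*xy) for xy in p_xy}
naive = {xy: complete_case(*xy) for xy in p_xy}
print(recovered == p_xy)   # True
print(naive[(0, 0)])       # 2/5, whereas the true P(X=0, Y=0) is 3/10
```

Every probability used in `recover` refers only to events that are visible in the data (Y where R_y = 0, X where R_x = 0). The comparison with `complete_case` illustrates that simply discarding incomplete rows is biased under these assumed parameters.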
Almost identical behavior is exhibited by model (c), since the bidirected arrow R_x <--> R_y does not interfere with the recovery scheme just outlined. All independencies needed for this recovery scheme are authorized by the graph.
In model (e) we do not have a variable that is totally independent of its missingness mechanism. Instead, conditioning on Y will render X and R_x independent, while conditioning on X will render Y and R_y independent. Since both X and Y are partially unobserved, it is not clear whether this structure will lend itself to recoverability. The theory nevertheless confirms recoverability of P(X, Y) in this case and in many other so-called "entangled" cases (Mohan et al., 2013).
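One way to see why such an entangled structure can still be recoverable is the following derivation sketch. It assumes, as a reading of the description above rather than of the figure itself, that the only edges into the missingness nodes are Y → R_x and X → R_y, so that R_x ⊥⊥ (X, R_y) | Y and R_y ⊥⊥ (Y, R_x) | X, and that P(R_x = 0, R_y = 0 | X, Y) > 0.

```latex
\begin{align*}
P(X, Y) &= \frac{P(X, Y, R_x = 0, R_y = 0)}{P(R_x = 0, R_y = 0 \mid X, Y)} \\[4pt]
P(R_x = 0, R_y = 0 \mid X, Y)
  &= P(R_x = 0 \mid X, Y, R_y = 0)\, P(R_y = 0 \mid X, Y) \\
  &= P(R_x = 0 \mid Y, R_y = 0)\, P(R_y = 0 \mid X, R_x = 0) \\[4pt]
P(X, Y) &= \frac{P(X, Y \mid R_x = 0, R_y = 0)\, P(R_x = 0, R_y = 0)}
               {P(R_x = 0 \mid Y, R_y = 0)\, P(R_y = 0 \mid X, R_x = 0)}
\end{align*}
```

Each factor in the last line conditions on a variable only in rows where that variable is observed (Y where R_y = 0, X where R_x = 0), so under the stated assumptions every term can be estimated from the available data.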
Model (d) prohibits recoverability because recovering P(X, Y) amounts to recovering both P(Y) and P(X|Y). However, conditioning on Y renders X and R_x dependent, which prohibits recoverability. Note that, in this example, P(X) and P(Y) are recoverable.
Model (h) permits recovery. The graph informs us that X and Y are jointly independent of R_x and R_y. Indeed, if we marginalize over Z and R_z, the remaining problem becomes MCAR.
Model (f) can be shown to permit recoverability by going through a sequence of conditionalizations as we did in (b). First we notice that Y is MCAR. Next we notice that conditioning on Y renders X and R_x independent. This makes P(X, Y) recoverable. (We can continue and condition on X and Y, thus rendering Z and R_z independent, which leads to the recovery of P(X, Y, Z).)

Next examine model (g). Attempting the same scheme as in (f), we find the following obstacle. Recovering P(Y, Z) is trivial but, now, we observe an inducing path between X and R_x. Thus, conditioning on either Y or Z (or both) renders X and R_x dependent. The result is that there is no sequence of nodes that separates X from R_x. Hence P(X, Y, Z) is non-recoverable in (g).
estimator, namely, a universal estimator that is driven by data alone and does not change its strategy as we move from structure to structure. If data are MAR or MCAR, this requirement is satisfied, which explains the popular use of model-blind estimators such as MI or EM under the assumption of MAR; the specific features of the problem at hand need not be attended to, and the user is spared the effort of choosing and defending a specific graph structure. In areas (M) and (S), however, such effort is unavoidable; recoverability requires model-smart estimators. We will actually prove this requirement in a following slide.
The popular estimator known as ML suffers from the same deficiency unless it is applied to the full model, namely, the substantive part and the missingness part. We conjecture that ML would recover the joint probability in areas (S) and (M) if conducted on the full missingness model. This is rarely done in practice; users are rarely requested to specify the missingness part of the model. Consequently, ML users receive no warning when facing a problem in area (N). Given our criterion for (N), such a warning can easily be produced by mere inspection of the model structure. A more stringent warning should await MI users, since the danger may also be present in the minefield of areas (M) and (S). Even problems that are recoverable require knowledge of the model structure, without which biased results are likely to be produced.
The final result we report in this summary is that testability has been charted over the entire terrain of missing data problems. In other words, given a model structure, one can discern whether the model yields testable implications. Some of these tests are powerful enough to rule out MAR and place the problem in the MNAR category.

Speaking about testability, it is important to note a peculiar phenomenon that takes place in missing data analysis. Consider the example illustrated in Figure 8. It is easily shown that the problem is MAR, hence P(X, Y, Z, R_z) is recoverable. This is so because the fully observed variables X and Y "explain" the missingness in the third variable Z. Moreover, the recovered probability distribution embeds the conditional independence claim Z ⊥⊥ R_z | (X, Y), which would be testable had Z been fully observed. Unfortunately, the fact that Z is only partially observed prevents this independence claim from being testable in the available data. In other words, any data whatsoever with Z partially observed is compatible with the model above and, so, no such data can falsify the conditional independence Z ⊥⊥ R_z | (X, Y).
We shall now give a very simple syntactic criterion for determining whether a conditional independence A ⊥⊥ B | C is testable, with |A| = |B| = 1, where A, B and C are allowed to include not only variables but also missingness mechanisms. The syntactic rule states that a conditional independence is testable if it has one of the following forms:

(It is understood that, if X or Y or Z are fully observed, the corresponding missingness mechanism may be removed from the conditioning set. Clearly, any conditional independence comprised exclusively of fully observed variables is testable.) This rule, combined with the fact that all conditional independencies claimed by a graph can be reduced to pair-wise independencies, permits us to determine whether the model, as a whole, is testable.
To illustrate the power of this criterion, note that if we remove the X → R_z edge from the diagram in Figure 8, the model becomes testable, because X ⊥⊥ R_z | Y complies with (2) above.

This necessity is further strengthened in Figure 10, showing that, even if one settles for receiving a consistent estimate only when one exists, no universal algorithm can offer such a guarantee. Figure 10 shows a query that is recoverable in two indistinguishable models, yet each dictates a different procedure for recoverability, and a different estimand results from each procedure. If we run a procedure informed by M1 on M2 (or vice versa), a biased estimate will ensue. This is precisely the case in region (S) of the pie chart; models in this region permit recoverability but, at the same time, the estimates produced are sensitive to model structure.

MATHEMATICAL RESULT #4: (Recoverability from missing data is (almost) solved)
• The feasibility of recovering relations from missing data can be determined by graphical methods, provided the missingness mechanism is encoded (correctly) in a causal diagram.
• The same applies to testability of conditional independence claims.
• The results are complete with a possible exception of the uncharted area outside (M), (S), and (N).

Figure 5: Slide-63

Little (1988) and Lin & Bentler (2012) have devised tests for refuting MCAR, while Potthoff et al. (2006) devised one for MAR. The latter can be strengthened by methods based on graphical methods (Mo-