Sensitivity analysis of unmeasured confounding in causal inference based on exponential tilting and super learner

Causal inference under the potential outcome framework relies on the strongly ignorable treatment assumption. This assumption is usually questionable in observational studies, and unmeasured confounding is one of the fundamental challenges in causal inference. To this end, we propose a new sensitivity analysis method to evaluate the impact of unmeasured confounders by leveraging ideas of doubly robust estimators, the exponential tilt method, and the super learner algorithm. Compared to existing sensitivity analysis methods that parameterize the unmeasured confounder as a latent variable in the working models, the exponential tilting method does not impose any restrictions on the structure or models of the unmeasured confounders. In addition, in order to reduce the modeling bias of traditional parametric methods, we propose incorporating the super learner machine learning algorithm to perform nonparametric model estimation and the corresponding sensitivity analysis. Furthermore, most existing sensitivity analysis methods require multivariate sensitivity parameters, which makes their choice difficult and subjective in practice. In comparison, the new method has a univariate sensitivity parameter with a simple interpretation as a log-odds ratio for binary outcomes, which makes both its choice and the application of the new sensitivity analysis method straightforward for practitioners.


Introduction
In order to explore causal inference, Neyman [19] proposed the potential outcome framework in the context of completely randomized experiments and later Rubin [29] extended it to both observational and experimental studies. Consider a binary treatment T with T = 1 indicating the treatment group and T = 0 indicating the control group. The potential outcome Y(t) is defined as the outcome we would observe if a subject had been assigned the treatment T = t, t = 0, 1. The observed outcome will then be Y = TY(1) + (1 − T)Y(0). The causal effect, or average treatment effect (ATE), of the treatment is defined as τ = μ 1 − μ 0 , where μ t = E{Y(t)}, t = 0, 1. In observational studies, a standard assumption for causal inference is the assumption of strongly ignorable treatment assignment.
Let X be a collection of measured baseline covariates which are considered possible confounders. The strong ignorability assumption states that the treatment assignment T is conditionally independent of the potential outcomes Y(0) and Y(1) given X, i.e.
$$Y(t) \perp T \mid X, \quad t = 0, 1. \qquad (1)$$
In other words, within each level of X, T is assumed to be independent of the potential outcomes, so we have a quasi-randomized experiment. Researchers usually hope that sufficiently rich baseline information is collected so that the ignorability assumption (1) is reasonable. However, this assumption is usually questionable in observational studies, and unmeasured confounding is one of the fundamental challenges in causal inference. If unmeasured confounders exist, the strong ignorability assumption is violated, which may result in biased treatment effect estimation and undermine the validity and credibility of the corresponding causal inference. Unfortunately, the strong ignorability assumption cannot be validated with observed data, and there is frequently insufficient background knowledge to justify this assumption.
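The bias that arises when assumption (1) fails can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the latent variable `u`, the coefficients, and the function name `simulate` are hypothetical and are not taken from the paper, and measured covariates are omitted for simplicity.

```python
import math
import random


def expit(s):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-s))


def simulate(n=20000, seed=0):
    """Return (true ATE, naive treated-minus-control difference).

    u is a hypothetical unmeasured confounder that drives both the
    treatment assignment and the potential outcomes; all coefficients
    are illustrative assumptions.
    """
    rng = random.Random(seed)
    sum_y1 = sum_y0 = 0.0
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        y1 = 1 if rng.random() < expit(0.5 + u) else 0   # potential outcome Y(1)
        y0 = 1 if rng.random() < expit(-0.5 + u) else 0  # potential outcome Y(0)
        t = 1 if rng.random() < expit(1.5 * u) else 0    # assignment depends on u
        y = t * y1 + (1 - t) * y0                        # observed outcome
        sum_y1 += y1
        sum_y0 += y0
        (treated if t == 1 else control).append(y)
    true_ate = (sum_y1 - sum_y0) / n
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return true_ate, naive


true_ate, naive = simulate()
print(f"true ATE = {true_ate:.3f}, naive difference = {naive:.3f}")
```

Because `u` raises both the chance of treatment and the chance of a positive outcome, the naive comparison of group means overstates the effect in this setting.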
Without the strong ignorability assumption (1), the causal effect becomes unidentified, which has motivated alternative assumptions of various forms that aim to recover the identifiability. However, these alternative assumptions are also untestable with observed data like the strong ignorability assumption and may be as questionable as, or even more questionable than, the strong ignorability assumption. To address such uncertainty, it is important to conduct a sensitivity analysis that considers a variety of identifiability assumptions and compare results obtained under different assumptions.
The history of sensitivity analysis dates back to the work of Cornfield et al. [4], which explored a causal link between smoking and lung cancer. Rosenbaum and Rubin [24] proposed a sensitivity analysis method that explicitly adjusts for an unmeasured confounder as a latent variable. With U denoting the unmeasured confounder, their method assumes that X and U together satisfy assumption (1), i.e. Y(t) ⊥ T | X, U. In addition, their method depends on well-matched sets based on propensity scores by assuming that X is finitely discrete with both Y and U being binary. Later, Imbens [12] extended Rosenbaum and Rubin [24] to allow for non-binary outcomes. As mentioned in Imbens [12], one of the key points is that a parametric model is postulated. More specifically, the observed data consist of (Y, T, X), and a binary unmeasured confounder U is assumed for each individual. Suppose that the observed data and the unmeasured confounder were generated from a parametric model in which $U \sim \mathrm{Ber}(P_u)$ and the treatment and the outcome follow logistic models with coefficients $(\gamma_x, \gamma_u)$ and $(\beta_t, \beta_x, \beta_u)$, respectively, where Ber(π) is a Bernoulli distribution with success probability π and $\mathrm{expit}(s) = (1 + e^{-s})^{-1} = e^{s}/(1 + e^{s})$. If we could observe U, the parameters $(P_u, \gamma_x, \gamma_u, \beta_t, \beta_x, \beta_u)$ could be estimated via logistic regression models and τ could be reported as the causal effect. The two coefficients $\gamma_u$ and $\beta_u$ measure the strength of the association between the unmeasured confounder and the treatment and the association between the unmeasured confounder and the outcome, respectively. However, U is not observed, so $P_u$, $\gamma_u$ and $\beta_u$ are not identifiable from the observed data. What analysts can do is specify plausible values of $(P_u, \gamma_u, \beta_u)$ based on their subjective judgments about these parameters; the other parameters can then be estimated given $(P_u, \gamma_u, \beta_u)$. The final estimate of the causal effect can be expressed as $\hat{\tau}(P_u, \gamma_u, \beta_u)$. Veitch and Zaveri [36] pointed out that this approach has a major drawback: it relies on a parametric model for the full data generating process.
Using the assumed model is equivalent to assuming that if U had been observed, it would have been appropriate to use logistic regression to model the treatment assignment and outcome. Also, the sensitivity analysis result depends on the distribution of U. Furthermore, the choice of the sensitivity parameters $(P_u, \gamma_u, \beta_u)$ is subjective and challenging and could be even more difficult if U is not univariate. Lin et al. [15] proposed an approach to parameterize the unmeasured confounder as the bias of regression coefficients in an outcome regression model adjusting for both the measured and unmeasured confounders. There is also a line of work that formulates the sensitivity parameter as the bias of the outcome regression model caused by the unmeasured confounder; see, for example, Hogan et al. [11], Jung et al. [13], and Roy et al. [28]. Ding and VanderWeele [6] and VanderWeele and Ding [35] investigated how strong an unmeasured confounder has to be in order to qualitatively change the conclusion of a causal analysis. In recent years, another extension of traditional sensitivity analysis has come from a Bayesian perspective, which averages over a distribution of the sensitivity parameters rather than varying them [7,17,18]. For sensitivity analysis methods that require specifying parametric models for the confounding variable, the required assumptions may be difficult to satisfy in practice. In addition, these methods usually contain multivariate sensitivity parameters, which makes their choice difficult and subjective when performing sensitivity analysis in practice.
To this end, we propose a new sensitivity analysis method to evaluate the impact of the unmeasured confounder by leveraging ideas of doubly robust estimators [22], the exponential tilt method [30], and the super learner machine learning method [33]. Note that if there is any unmeasured confounder, f (Y(t)|X, T = 1) and f (Y(t)|X, T = 0) will be different, where f is a probability distribution function (either a density function for a continuous variable or a probability mass function for a discrete variable). Inspired by this, we propose to assess the sensitivity of the difference between the conditional distribution of the observed potential outcome given covariates and that of the counterfactual potential outcome via the exponential tilting method proposed by Scharfstein et al. [30]. Compared to most existing sensitivity analysis methods, the exponential tilting method does not directly impose any assumptions on the distribution of the unmeasured confounders. Therefore, the new method has the flexibility to allow the unmeasured confounder to be continuous, binary, or categorical, and be univariate or multivariate. In order to reduce the modeling bias of traditional parametric methods, we propose incorporating super learner machine learning algorithms to perform the nonparametric model estimation and the corresponding sensitivity analysis. Super learner algorithms aim to optimally combine many machine learning algorithms together to provide a better estimation than any individual candidate machine learning algorithm. Unlike most of the existing sensitivity analysis methods which usually contain multiple sensitivity parameters, the new method has a univariate sensitivity parameter, which directly measures the bias of the strong ignorability assumption (1) caused by the unmeasured confounding, and has a nice and simple interpretation of log-odds ratios for binary outcomes. 
Having a single sensitivity parameter also makes its choice, and hence the application of the new sensitivity analysis method, straightforward for practitioners.
The rest of this paper is structured as follows. In Section 2, we introduce our proposed estimation method in detail. In Sections 3 and 4, we present numerical examples based on both a simulation study and a real data application. A conclusion is given in Section 5.

Methodology
In this article, we mainly focus on a binary response variable because of its wide range of applications in causal inference [8,18,24]. The proposed method can be easily extended to a continuous response variable. Note that for a binary response variable, the potential outcome mean $\mu_t = E\{Y(t)\}$ can be interpreted as the success rate in the target population if everyone had been assigned treatment t. For simplicity of explanation, we introduce our new method by focusing on estimating $\mu_1 = E\{Y(1)\}$, the mean potential outcome if everyone in the target population had been treated. The mean $\mu_0 = E\{Y(0)\}$ and the corresponding ATE $\tau = \mu_1 - \mu_0$ can be estimated similarly.
When the subjects are in the treatment group, Y(1) is the observed actual outcome, whereas when the subjects are in the control group, Y(1) is a counterfactual outcome that cannot be observed. So the estimation of $\mu_1$ essentially boils down to the imputation of counterfactual outcomes. Let $m(x) = E\{Y(1) \mid X = x\}$ and $m_1(x) = E\{Y(1) \mid X = x, T = 1\}$. Then our target parameter can also be written as $\mu_1 = E\{Y(1)\} = E\{m(X)\}$. Notice that when there is no unmeasured confounder,
$$m(x) = E\{Y(1) \mid X = x\} = E\{Y(1) \mid X = x, T = 1\} = m_1(x), \qquad (2)$$
which can be estimated from the observed data.

Doubly robust estimator
Let $\{(X_i, Y_i, T_i)\}_{i=1}^{n}$ be an independent and identically distributed random sample of $(X, Y, T)$ with $X \in \mathbb{R}^d$, $Y \in \{0, 1\}$ and $T \in \{0, 1\}$, where X is a set of baseline covariates and T is a binary treatment indicator. Under the strong ignorability assumption (i.e. there is no unmeasured confounder), three of the most commonly used estimators for $\mu_1$ are
$$\hat{\mu}_1^{OR} = \frac{1}{n}\sum_{i=1}^{n} \hat{m}_1(X_i), \qquad (3)$$
$$\hat{\mu}_1^{IPW} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_i Y_i}{\hat{q}(X_i)}, \qquad (4)$$
$$\hat{\mu}_1^{DR} = \frac{1}{n}\sum_{i=1}^{n} \left\{ \frac{T_i Y_i}{\hat{q}(X_i)} - \frac{T_i - \hat{q}(X_i)}{\hat{q}(X_i)}\, \hat{m}_1(X_i) \right\}, \qquad (5)$$
where $\hat{m}_1(x)$ and $\hat{q}(x)$ are the estimates of $m_1(x)$ and $q(x)$, respectively. Estimator (3) only involves an outcome regression (OR) model, $m_1(x)$, so it is referred to as an OR estimator [32]. Estimator (4) is called an IPW estimator [16,21,23], since it includes a model for the propensity score (PS) [25], $q(x) = P(T = 1 \mid X = x)$, and uses the inverse probability weight (IPW). Estimator (5) is a doubly robust (DR) estimator of the potential outcome mean [16,22,26,27,31]. It involves both OR and PS models. Notice that the DR estimator is an augmented form of the IPW estimator:
$$\hat{\mu}_1^{DR} = \hat{\mu}_1^{IPW} - \frac{1}{n}\sum_{i=1}^{n} \frac{T_i - \hat{q}(X_i)}{\hat{q}(X_i)}\, \hat{m}_1(X_i).$$
The consistency of the OR, IPW, and DR estimators depends on the correct specification of the relevant models for $q(x)$ and $m_1(x)$. Unless the model for $m_1(x)$ is correctly specified, we cannot expect the OR estimator to be consistent, and unless the model for $q(x)$ is correctly specified, we cannot expect the IPW estimator to be consistent. It is worth noting that the validity of the DR estimator entails weaker conditions, since it only requires either $m_1(x)$ or $q(x)$ to be correctly specified. Because of this, we adopt the DR method for the proposed sensitivity analysis. If there exists any unmeasured confounder, then $m(x) \neq m_1(x)$ and $P(x, y) \neq q(x)$, where $P(x, y) = P(T = 1 \mid X = x, Y(1) = y)$, and the traditional estimators of $\mu_1$ in (3)-(5) will be biased. To incorporate the unmeasured confounder and correct the estimation bias, we propose the following modified doubly robust estimator for $\mu_1$, using the estimators of $m(x)$ and $P(x, y)$ instead of $m_1(x)$ and $q(x)$, respectively:
$$\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} \left\{ \frac{T_i Y_i}{\hat{P}(X_i, Y_i)} - \frac{T_i - \hat{P}(X_i, Y_i)}{\hat{P}(X_i, Y_i)}\, \hat{m}(X_i) \right\}. \qquad (6)$$
The difficulty of the estimation for $m(x)$ and $P(x, y)$ lies in the fact that the outcome Y(1) is only partially observed.
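To make the three estimators concrete, here is a minimal sketch that assumes their standard textbook forms: the OR estimator averages the fitted outcome regression, the IPW estimator averages the inverse-probability-weighted observed outcomes, and the DR estimator subtracts an augmentation term from IPW. The function names are hypothetical, and the fitted values `q_hat` and `m1_hat` are taken as given inputs rather than estimated here.

```python
def or_estimate(m1_hat):
    """Outcome regression estimate: average of fitted m_1(X_i)."""
    return sum(m1_hat) / len(m1_hat)


def ipw_estimate(t, y, q_hat):
    """Inverse probability weighted estimate of mu_1."""
    n = len(t)
    return sum(ti * yi / qi for ti, yi, qi in zip(t, y, q_hat)) / n


def dr_estimate(t, y, q_hat, m1_hat):
    """Doubly robust estimate: IPW term minus the augmentation term."""
    n = len(t)
    total = 0.0
    for ti, yi, qi, mi in zip(t, y, q_hat, m1_hat):
        total += ti * yi / qi - (ti - qi) / qi * mi
    return total / n


# Toy inputs (illustrative values only).
t = [1, 0, 1, 0]
y = [1, 0, 0, 1]
q_hat = [0.5, 0.5, 0.5, 0.5]
m1_hat = [0.5, 0.5, 0.5, 0.5]
print(or_estimate(m1_hat), ipw_estimate(t, y, q_hat), dr_estimate(t, y, q_hat, m1_hat))
```

For control units (T = 0) the DR summand reduces to the fitted outcome regression value, which is why the DR estimator can be read as imputing the unobserved Y(1) for controls while correcting the treated units by their weighted residuals.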

Exponential tilt method for the unmeasured confounder
When there are unmeasured confounders, the conditional distribution of the observed Y(1) is no longer the same as the conditional distribution of the unobserved Y(1) given the covariates, and thus the relationship in equation (2) breaks down. Our goal is to restore the relationship between the conditional distributions of the observed and the unobserved outcome so that we can further estimate $m(x)$ and $P(x, y)$ in (6). Let $g(y \mid x)$ be the conditional distribution of $Y(1) \mid X$ and $g_t(y \mid x)$ be the conditional distribution of $Y(1) \mid X, T = t$, t = 0, 1. Notice that when T = 1, Y(1) is observed, and when T = 0, Y(1) is missing. We propose leveraging the exponential tilt method from Scharfstein et al. [30] to build the connection between $g_0$ and $g_1$, which links the distribution of unobserved outcomes to the distribution of observed outcomes:
$$g_0(y \mid x) = \frac{e^{\alpha y}\, g_1(y \mid x)}{\int e^{\alpha s}\, g_1(s \mid x)\, ds}, \qquad (7)$$
where the integral is replaced by a sum for a discrete outcome. The denominator in (7) is a normalization constant that makes $g_0$ a legitimate density. Since the exponential tilt method (7) does not require specifying parametric models for latent confounding variables, the unmeasured confounder could be continuous, binary, or categorical, and could be univariate or multivariate. The parameter α is a univariate sensitivity parameter that is not identifiable from the data and governs the departure of the truth from the strong ignorability assumption that 'no unmeasured confounder exists'. When α = 0, $g_0(y \mid x) = g_1(y \mid x)$, which indicates there is no unmeasured confounding. When α ≠ 0, $g_0(y \mid x) \neq g_1(y \mid x)$ and hence some unmeasured confounders exist. It can be derived from (7) that for a binary outcome,
$$g_0(1 \mid x) = \frac{e^{\alpha}\, g_1(1 \mid x)}{1 - g_1(1 \mid x) + e^{\alpha}\, g_1(1 \mid x)}. \qquad (8)$$
Equation (8) also indicates that $\mathrm{logit}\{g_0(1 \mid x)\} - \mathrm{logit}\{g_1(1 \mid x)\} = \alpha$, where $\mathrm{logit}(s) = \log\{s/(1 - s)\}$.
Therefore, for binary response variables, $e^{\alpha}$ is the conditional odds ratio (and hence α is the log odds ratio) of the unobserved potential outcome versus the observed potential outcome being 1 after adjusting for the measured confounding variables X; the sensitivity parameter α measures the violation of the strong ignorability assumption (1) caused by the unmeasured confounding. Figure 1 demonstrates how the conditional log-odds ratio relates to the underlying probabilities of the unobserved and observed potential outcome being 1 as the sensitivity parameter α varies from −5 to 5. When there is no unmeasured confounder, $g_0(1 \mid x) = g_1(1 \mid x)$, and thus the log odds ratio is α = 0. If α > 0, the probability of the unobserved potential outcome Y(1) (given T = 0) being 1 is larger than that of the observed potential outcome (given T = 1) being 1 after adjusting for the measured confounding variables X. When α < 0, the relationship is reversed, i.e. the probability of the unobserved potential outcome Y(1) (given T = 0) being 1 is smaller than that of the observed potential outcome (given T = 1) being 1 conditional on any X.
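For a binary outcome, the tilt is equivalent to shifting the log-odds of the observed-arm probability by α. The following sketch illustrates that relationship; the function name `tilt_binary` and the baseline probability 0.3 are illustrative assumptions, not values from the paper.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def tilt_binary(p1, alpha):
    """Probability that the unobserved Y(1) equals 1 (T = 0 arm),
    given p1 = P(Y(1) = 1 | X, T = 1) and sensitivity parameter alpha."""
    w = math.exp(alpha) * p1
    return w / (1.0 - p1 + w)


p1 = 0.3  # illustrative observed-arm probability
for alpha in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p0 = tilt_binary(p1, alpha)
    shift = logit(p0) - logit(p1)  # recovers alpha up to rounding
    print(f"alpha={alpha:+.1f}  odds ratio={math.exp(alpha):.2f}  "
          f"p0={p0:.3f}  log-odds shift={shift:+.2f}")
```

Reading the output row by row shows how a given odds ratio translates into a change on the probability scale, which can help when judging whether a candidate α is plausible.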
For example, if Y = 1 indicates a certain disease being cured, then $e^{\alpha}$ is the conditional odds ratio of being cured between the control patients had they been treated and the treated patients after adjusting for the measured covariates. A value of α = 1 implies an odds ratio of $e^{1} = 2.72$, meaning that the odds of being cured for the control patients had they been treated are about 2.72 times those of the treated patients after adjusting for measured covariates. The choice of α in practice relies on subject-matter guidance, such as experts' experience and experiment-specific prior knowledge. In practice, an α value from −2 to 2 (with the corresponding odds ratio from $e^{-2} = 0.14$ to $e^{2} = 7.39$) or even −1 to 1 (with the odds ratio from 0.37 to 2.72) is usually a reasonable choice for a sensitivity analysis.
Figure 1. The contour plot of log-odds ratio.
As mentioned earlier, the exponential tilt method of (7) can also be applied to continuous outcomes easily. For example, if $Y(1) \mid X = x, T = 1 \sim N(\mu(x), \sigma^2)$, then the assumption of (7) implies $Y(1) \mid X = x, T = 0 \sim N(\mu(x) + \alpha\sigma^2, \sigma^2)$ for the unobserved outcome variable. Note that σ can be estimated based on the observed outcome. If the exponential tilt method is applied to the standardized data (i.e. σ = 1), then the sensitivity parameter α is exactly equal to the mean difference/shift between the unobserved potential outcome from the control arm and the observed potential outcome from the treatment arm.
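The mean-shift interpretation for continuous outcomes can be checked by completing the square. The derivation below is a sketch under a working assumption that is introduced here for illustration: the observed-arm outcome is normal with mean $\mu(x)$ and variance $\sigma^2$.

```latex
\begin{align*}
g_0(y \mid x) &\propto e^{\alpha y}\, g_1(y \mid x) \\
  &\propto \exp\left\{ \alpha y - \frac{\bigl(y - \mu(x)\bigr)^2}{2\sigma^2} \right\} \\
  &\propto \exp\left\{ -\frac{\bigl(y - \mu(x) - \alpha\sigma^2\bigr)^2}{2\sigma^2} \right\},
\end{align*}
% so g_0(. | x) is the N(mu(x) + alpha * sigma^2, sigma^2) density:
% the tilt shifts the mean by alpha * sigma^2 and leaves the variance unchanged.
```

With standardized data (σ = 1) the shift is α itself, which matches the interpretation of α as a mean difference between the unobserved and observed potential outcomes.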
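To apply the proposed method (6), we need to estimate $m(x)$ and $P(x, y)$. Note that $g_1(y \mid x)$ and $q(x)$ can be estimated from observed data. We introduce a super learner machine learning estimation method for estimating $g_1(y \mid x)$ and $q(x)$ in Section 2.3. Then $m(x)$ can be estimated by $\hat{m}(x) = \hat{g}(1 \mid x)$, where
$$\hat{g}(y \mid x) = \hat{q}(x)\, \hat{g}_1(y \mid x) + \{1 - \hat{q}(x)\}\, \hat{g}_0(y \mid x),$$
and $g_0(y \mid x)$ can be estimated based on $g_1(y \mid x)$ according to equation (7). To estimate $P(x, y)$, note that
$$P(x, y) = P(T = 1 \mid X = x, Y(1) = y) = \frac{q(x)\, g_1(y \mid x)}{q(x)\, g_1(y \mid x) + \{1 - q(x)\}\, g_0(y \mid x)}. \qquad (9)$$
Then, $P(x, y)$ can be estimated by plugging the estimates into equation (9) as
$$\hat{P}(x, y) = \frac{\hat{q}(x)\, \hat{g}_1(y \mid x)}{\hat{q}(x)\, \hat{g}_1(y \mid x) + \{1 - \hat{q}(x)\}\, \hat{g}_0(y \mid x)},$$
and the difference between the modified DR estimator (6) and the standard DR estimator (5) is the bias caused by unmeasured confounding.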
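For a binary outcome, the two plug-in quantities can be computed pointwise from the observed-arm probability, the propensity score, and α. The sketch below assumes that $P(x, y)$ denotes $P(T = 1 \mid X = x, Y(1) = y)$ and obtains it by Bayes' rule from the tilt relationship; the function name `tilted_quantities` and the numeric inputs are hypothetical.

```python
import math


def tilted_quantities(g1, q, alpha, y):
    """Return (m, p) at one covariate value for a binary outcome.

    g1    : P(Y(1) = 1 | X = x, T = 1), estimable from treated units
    q     : propensity score P(T = 1 | X = x)
    alpha : sensitivity parameter of the exponential tilt
    y     : outcome value (0 or 1) at which P(x, y) is evaluated
    """
    c = 1.0 - g1 + math.exp(alpha) * g1          # normalizing constant of the tilt
    g0 = math.exp(alpha) * g1 / c                # P(Y(1) = 1 | X = x, T = 0)
    m = q * g1 + (1.0 - q) * g0                  # mixture over the two arms
    ratio = math.exp(alpha * y) / c              # g0(y|x) / g1(y|x)
    p = q / (q + (1.0 - q) * ratio)              # Bayes' rule
    return m, p


m, p = tilted_quantities(g1=0.4, q=0.3, alpha=1.0, y=1)
print(f"m(x) = {m:.4f}, P(x, 1) = {p:.4f}")
```

Setting `alpha=0` returns `m = g1` and `p = q`, so the quantities collapse to the ones used by the standard DR estimator when there is no unmeasured confounding.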

Super learner machine learning estimators
One way to estimate g 1 (x) and q(x) is to specify some parametric models for them, for example, a linear or a logistic regression model for g 1 (x) for a continuous or binary response, respectively, and a logistic regression model for q(x). However, if there are some violations of the parametric assumptions, which is usually the case in practice, it would aggravate the bias in the causal inference even for a doubly robust estimator.
To this end, we propose employing a super learner machine learning algorithm to nonparametrically estimate $g_1(x)$ and $q(x)$. Super learner is a general loss-based learning algorithm that was proposed and analyzed theoretically by Van der Laan et al. [33]. The algorithm optimally combines a library of machine learners by minimizing the cross-validation error and aims to estimate the regression function flexibly without over-fitting the data [34]. To illustrate this learning process, let the observed data be $\{(Y_i, X_i)\}_{i=1}^{n}$, where Y is the outcome and X is a set of covariates with dimension d. The super learner algorithm aims to estimate $m(x) = E(Y \mid X = x)$ using a library of machine learners $m_1, \ldots, m_K$ weighted by $\lambda_1, \ldots, \lambda_K$, such that $\hat{m}(x) = \sum_{k=1}^{K} \lambda_k \hat{m}_k(x)$, where $\hat{m}_k$ is the estimator of $m(x)$ based on the kth machine learner. The selection of a library of machine learners will be discussed in Section 3. The weight vector $\lambda = (\lambda_1, \ldots, \lambda_K)$ can be chosen by the following cross-validation procedure. Randomly split the sample $\{(Y_i, X_i)\}_{i=1}^{n}$ into J equally sized subsets. For each $j \in \{1, \ldots, J\}$, the jth subset, denoted by $S_j$, is used as a validation set and the other subsets are used as the training set. Let $\hat{m}_k^{(-j)}$ be the estimator of $m(x)$ using the kth machine learner based on the training data without the jth subset $S_j$. Then we can find the weight vector by
$$\hat{\lambda} = \arg\min_{\lambda} \sum_{j=1}^{J} \sum_{i \in S_j} \Bigl\{ Y_i - \sum_{k=1}^{K} \lambda_k \hat{m}_k^{(-j)}(X_i) \Bigr\}^2.$$
Polley et al. [20] suggested bounding $\lambda_k$ and using the constraints $\sum_{k=1}^{K} \lambda_k = 1$, $\lambda_k \geq 0$, for all k. A non-negative binomial likelihood maximization or maximizing the AUC (area under the curve) of the ROC (receiver operating characteristic) curve can also be used as a cross-validation criterion for binary outcomes.
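The cross-validated weighting step can be sketched in a few lines. The toy example below makes several simplifying assumptions that are not from the paper: the library has only two learners (a constant-mean learner and a one-covariate least-squares line), the criterion is cross-validated squared error, and the single free weight on the simplex is found by a grid search. It is meant to show the mechanics, not to stand in for a full super learner implementation.

```python
import random


def fit_mean(xs, ys):
    """Learner 1: predicts the training mean everywhere."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


def fit_linear(xs, ys):
    """Learner 2: simple least-squares line in one covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def super_learner_weight(xs, ys, n_folds=5, grid=101, seed=0):
    """Weight on the linear learner (the mean learner gets 1 - weight),
    chosen to minimize the cross-validated squared error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::n_folds] for j in range(n_folds)]
    cv_pred = {}
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        learners = (fit_mean(tx, ty), fit_linear(tx, ty))
        for i in held_out:
            # Out-of-fold predictions from learners trained without fold j.
            cv_pred[i] = tuple(f(xs[i]) for f in learners)

    def cv_risk(lam):
        return sum((ys[i] - (1 - lam) * cv_pred[i][0] - lam * cv_pred[i][1]) ** 2
                   for i in idx)

    # The constraints lam >= 0 and weights summing to one are built into the grid.
    return min((k / (grid - 1) for k in range(grid)), key=cv_risk)


xs = [i / 50 for i in range(100)]
ys = [2 * x + 1 for x in xs]
print(super_learner_weight(xs, ys))
```

With more than two learners the grid search would be replaced by a constrained optimizer, but the structure (out-of-fold predictions first, weights second) stays the same.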

New sensitivity analysis method
Below we summarize our proposed sensitivity analysis by combining the ideas of the doubly robust estimator, the exponential tilt method, and the super learner machine learning method introduced in Sections 2.1 to 2.3, respectively.
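Step 1: Train the super learner method for $q(x)$ based on the data $(T_i, X_i)$ using T as the response variable and X as the independent variable to obtain the estimate $\hat{q}(x)$.
Step 2: Train the super learner method for $g_1(x)$ based on the subset of the data $\{(X_i, Y_i), T_i = 1, i = 1, \ldots, n\}$ using Y as the response variable and X as the independent variable to obtain the estimate $\hat{g}_1(x)$.
Step 3: For a given value of the sensitivity parameter α, obtain $\hat{g}_0$ from $\hat{g}_1$ via the exponential tilt (7), compute $\hat{m}(x)$ and $\hat{P}(x, y)$ as described in Section 2.2, and calculate the modified doubly robust estimate of $\mu_1$ in (6).
Step 4: Repeat steps 1-3 for a set of values of the sensitivity parameter α and compare results across different α values, say from −2 to 2.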
Note that the new sensitivity analysis method has only one sensitivity parameter α, which also enjoys a simple interpretation based on the odds ratio. Therefore, unlike most existing sensitivity analysis methods that contain multiple sensitivity parameters, the new method is much easier to implement, and its sensitivity parameter is much easier to choose.
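The overall procedure can be sketched as a sweep over α. The code below is a minimal illustration under stated assumptions: the fitted propensity scores `q_hat` and observed-arm probabilities `g1_hat` are taken as given (in the paper they would come from the super learner), the modified DR estimator is assumed to be the standard DR form with $m(x)$ and $P(x, y)$ substituted for $m_1(x)$ and $q(x)$, and the function names and toy data are hypothetical.

```python
import math


def modified_dr_mu1(t, y, q_hat, g1_hat, alpha):
    """Modified doubly robust estimate of mu_1 for a binary outcome."""
    total = 0.0
    for ti, yi, qi, gi in zip(t, y, q_hat, g1_hat):
        c = 1.0 - gi + math.exp(alpha) * gi      # tilt normalizing constant
        g0 = math.exp(alpha) * gi / c            # tilted control-arm probability
        m = qi * gi + (1.0 - qi) * g0            # estimate of m(x)
        if ti == 1:
            p = qi / (qi + (1.0 - qi) * math.exp(alpha * yi) / c)
            total += yi / p - (1.0 - p) / p * m
        else:
            # For controls the summand reduces to m(x); P(x, y) cancels.
            total += m
    return total / len(t)


def sensitivity_curve(t, y, q_hat, g1_hat, alphas):
    """Estimate mu_1 at each candidate alpha."""
    return [(a, modified_dr_mu1(t, y, q_hat, g1_hat, a)) for a in alphas]


# Toy inputs (illustrative values only).
t = [1, 0, 1, 0]
y = [1, 0, 0, 1]
q_hat = [0.5, 0.5, 0.5, 0.5]
g1_hat = [0.5, 0.5, 0.5, 0.5]
for a, mu in sensitivity_curve(t, y, q_hat, g1_hat, [-2, -1, 0, 1, 2]):
    print(f"alpha={a:+d}  mu1_hat={mu:.4f}")
```

Since the fitted nuisance functions do not depend on α, they only need to be computed once; the loop over α reuses them, and the α = 0 row reproduces the standard DR estimate.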

Examples
In this section, we conduct two simulation studies. Similar to Lin et al. [15], we use the first simulation study to demonstrate the effectiveness of the proposed super learner based doubly robust estimation method (6) for adjusting for unmeasured confounders in the estimation of $\mu_1$ for any given sensitivity parameter α. The second simulation study illustrates the performance of the proposed sensitivity analysis method in Section 2.4 by varying the sensitivity parameter. We compare our proposed nonparametric doubly robust estimator using the super learner algorithm (DR_np) with two parametric estimators: OR and IPW as shown in equations (3) and (4), respectively. We use parametric logistic regression to estimate the outcome and the propensity score models for the OR and IPW estimators. For the proposed DR_np method, the super learner is based on a library of learners consisting of generalized linear models (GLM), generalized additive models (GAM) [9], and recursive partitioning and regression trees (rpart) [2].

Estimating µ 1
Given three independent baseline covariates $X = (X_1, X_2, X_3)$ generated from a standard normal distribution, we generate the treatment assignment T and the observable binary outcome Y(1) in the treatment group (T = 1) from two logistic models. Note that the models used to generate both the treatment and the outcome include non-linear terms in X. When unmeasured confounding exists, the strong ignorability assumption (1) is violated and hence logit{P(Y(1) = 1|X, T = 1)} will be different from logit{P(Y(1) = 1|X, T = 0)}, with the difference modeled by the exponential tilt relationship (8). The sensitivity parameter α in the exponential tilt model can measure the departure of the truth from the assumption (1) regardless of the distribution and dimension of the unmeasured confounder. We examine α = (±2, ±1.5, ±1, ±0.5) for estimating $\mu_1$ (the corresponding odds ratios range from 0.14 to 7.39). The simulation was performed for the sample size n = 1000 with 1000 replicates. Figure 2 displays squared errors of the estimates for $\mu_1$ based on the exponential tilt model (8) for each α = (±2, ±1.5, ±1, ±0.5). It can be seen that our proposed method DR_np estimates $\mu_1$ well after adjusting for the confounding effect using the exponential tilt model, yielding estimates with the smallest median squared error and the smallest variation, with the OR method a close second. In addition, both DR_np and OR perform much better than IPW.

Sensitivity analysis
In the previous section, we have demonstrated that the proposed method can successfully adjust the unmeasured confounder for any given sensitivity parameter α. In practice, however, the α is unknown. Next we demonstrate the proposed sensitivity analysis method by checking how the estimate changes when varying the sensitivity parameter α for the exponential tilt model.
To illustrate how our new sensitivity analysis method can be applied to unmeasured confounding settings used by existing sensitivity analysis methods, we incorporate unmeasured confounding by explicitly including a latent variable U ∼ N(1, 1) as one of the covariates in the models used to generate the treatment T and the outcome Y(t), with coefficients $\beta_u^t$ and $\beta_u^y$ on U, respectively. The observed outcome is Y = TY(1) + (1 − T)Y(0), and the data set used for analysis is {Y, T, X}. We assume $\beta_u^t = \beta_u^y = \beta_u$ and consider four cases, $\beta_u$ = 0, 1, 2, 3, to represent different strengths of unmeasured confounding in the simulation. For each fixed $\beta_u$, we perform a sensitivity analysis by checking how the estimate of $\mu_1$ changes when α varies. The simulation is performed for the sample size n = 1000 with 1000 replicates.
Notice that case 1 with $\beta_u = 0$ indicates 'there is no unmeasured confounder', which corresponds to α = 0. However, for nonzero $\beta_u$, it is hard to derive the explicit relationship between $\beta_u$ and α. As mentioned earlier, a well-specified range of α usually depends on subject-matter knowledge. In this simulation, to better understand the impact of the choice of the sensitivity parameter, we calculate the estimated log-odds of the control group and compare it to the estimated log-odds of the treatment group. We use this comparison to assess the plausibility of the specified α values, and then we span the range of α based on this difference for each case. One should notice that this estimated difference between the log-odds is not equal to the true value of α; it is only used as guidance to specify a plausible range of α for the sensitivity analysis. Based on our simulation, this difference for the three cases where $\beta_u$ = (1, 2, 3) is approximately 0.57, −0.5 and −1.59, respectively. Thus for cases 1 and 2 ($\beta_u$ = 0 and 1), we examine α values from −2 to 2 with an increment of 0.5. For cases 3 and 4 ($\beta_u$ = 2 and 3), the sensitivity analyses are based on α = (−2.5, −2, −1.5, −1, −0.5, 0, 1, 1.5) and α = (−3, −2.5, −2, −1.5, −1, −0.5, 0, 0.5, 1), respectively. The sensitivity parameter α used in our proposed method measures the difference between the conditional log-odds of the unobserved outcome being 1 and the log-odds of the observed outcome being 1. In other words, α directly governs the difference between logit{P(Y(t) = 1|X, T = 0)} and logit{P(Y(t) = 1|X, T = 1)} caused by the unmeasured confounding U with sensitivity parameters $(\beta_u^t, \beta_u^y)$ and the distribution for U.
Therefore, compared to the sensitivity parameters (β t u , β y u ) and the distribution for U used by existing sensitivity analysis methods, the univariate sensitivity parameter α used by our new method directly accounts for all possible unmeasured confounding and is also much easier to choose in practice.
Figure 3 displays box plots of the estimates for $\mu_1$ obtained by OR, IPW and the proposed method DR_np after adjusting for the unmeasured confounding using the exponential tilt method with varying sensitivity parameter α for $\beta_u$ = 0, 1, 2, and 3, respectively. The dashed line indicates the true $\mu_1$. It can be seen that, at most specified α values, the proposed DR_np provides the best estimates for $\mu_1$. For case 2, $\beta_u = 1$ represents mild unmeasured confounding; while the best estimate appears at α = 0.5, the estimate at α = 0 is also quite close to the true $\mu_1$. This makes sense, since mild unmeasured confounding would not drastically change the causal estimates. In other words, ignoring mild unmeasured confounding would not result in a large bias in the estimate. However, as the magnitude of $\beta_u$ increases, the departure between the estimate and the true $\mu_1$ also tends to increase. Nevertheless, the departure of DR_np remains the smallest at most α values in each case, which demonstrates that our proposed estimator is more resistant to unmeasured confounders.

Effect of low EF on heart failure death
In this section, we discuss an application of the proposed sensitivity analysis to evaluate the causal relationship between heart failure death rate and low ejection fraction (EF ≤ 30). The ejection fraction (EF) measures how much blood the left ventricle pumps out with each contraction, and is usually expressed as a percentage with a normal range between 50% and 75%. As stated by the World Health Organization cardiovascular diseases (CVDs) fact sheet, CVD causes 31% of all global deaths and is the number one cause of death. We use the data set which was originally analyzed by Ahmad et al. [1]. The main objective of their study was to estimate death rates due to heart failure and to investigate its link with some major risk factors in the city of Faisalabad (the third most populous city of Pakistan). The data set contains medical records of 299 heart failure patients in Faisalabad from April to December 2015. Ten confounding variables are addressed in this study and summarized in Table 1. Our goal is to estimate the causal effect of low EF on heart failure deaths and to assess the sensitivity of the result to some unmeasured confounder. The outcome is defined based on whether heart failure death had occurred: Y = 1 if the patient died and Y = 0 otherwise. A summary of the death event variable Y is also provided in Table 1. Then, the causal effect of low EF is $\tau = \mu_1 - \mu_0$, where $\mu_1$ and $\mu_0$ are the mean potential outcomes under low EF (EF ≤ 30) and higher EF, respectively. We use our proposed modified doubly robust estimator to estimate $\mu_1$ and $\mu_0$, and include GLM, GAM, rpart and random forest [10] algorithms in the super learner library. With the adjustment for these 10 measured confounding variables, the estimated effect of low EF on heart failure death τ is 0.275. Since the factors that lead to heart failure deaths remain unclear, it is informative and worthwhile to evaluate the sensitivity of the result of this study.
For the potential outcome if everyone in this study had EF ≤ 30, we apply the exponential tilt method of (7) with sensitivity parameter α. Table 2 displays the point estimates of the low EF effect on heart failure death rate τ with the adjustment for the unmeasured confounder represented by α. The point estimates of τ change from approximately 0.08 to 0.37 as α varies from −4 to 4. None of the α values makes the point estimate of τ equal to the null value zero (which would indicate that low EF has no effect on heart failure death) or negative (which would indicate that low EF decreases heart failure death compared to higher EF). The results of this sensitivity study demonstrate that a potential unmeasured confounder is unlikely to change the sign of τ (except for extreme α values). Therefore, we can safely conclude that low EF increases the heart failure death rate when compared to higher EF, and this conclusion is robust against possible unmeasured confounders.

NSW on post-intervention annual income
The National Supported Work (NSW) was a labor training program conducted in the 1970s. This program provided work experience to selected participants. The study measured 10 baseline covariates summarized in Table 3. The continuous variable, 1978 earnings, is the post-intervention annual income. The effect of the NSW program on post-intervention annual income levels was originally studied by LaLonde [14] and has been analyzed by various studies ever since, such as Imbens [12], Dehejia and Wahba [5], and Carnegie et al. [3]. The data analyzed here are the same as in Dehejia and Wahba [5]. In this article, we focus on the rate of participants who had an increase in annual earnings after the program in 1978. So we define our outcome variable y to indicate whether there is an increase in annual income by comparing 1978 earnings to 1975 earnings. In other words, y = 1 indicates that the participant's 1978 earnings are more than his/her 1975 earnings, and y = 0 otherwise.
We use this data set to demonstrate how our proposed sensitivity procedure can address the impact of unmeasured confounding on the effect of NSW on the proportion of participants who had an increase in 1978 earnings based on our proposed modified DR estimator. Let T = 1 indicate enrollment in NSW and T = 0 otherwise. Let X be the vector of all measured confounders described in Table 3. Based on the assumption of no unmeasured confounding, the estimated causal effect of NSW (τ) is 0.092, which implies that if everyone had participated in NSW, the program would increase the proportion of participants with increased post-intervention income by 9.2 percentage points. However, as mentioned in Imbens [12], strong motivation to enroll in a job-training program may lead to more favorable outcomes. Thus motivation to join the program can reasonably be considered an unmeasured confounder and is worthwhile to address using the sensitivity analysis. For the potential outcomes if everyone had enrolled in NSW, we use the exponential tilt method of (7) to assume logit{P(Y(1) = 1|T = 0, X)} − logit{P(Y(1) = 1|T = 1, X)} = $\alpha_1$, where $\alpha_1$ represents the difference between the conditional log-odds of an increase in 1978 earnings for nonparticipants of NSW had they been enrolled in NSW and that of those who were actually enrolled in the NSW program. Similarly, for the potential outcome of whether there is an increase in post-intervention income if no one were enrolled, we assume logit{P(Y(0) = 1|T = 1, X)} − logit{P(Y(0) = 1|T = 0, X)} = $\alpha_0$, where $\alpha_0$ can be interpreted similarly to $\alpha_1$. For simplicity, we assume $\alpha_1 = \alpha_0 = \alpha$. Table 4 displays the point estimates of the NSW effect on the proportion of participants whose post-intervention income has increased (τ) with the adjustment for potential unmeasured confounding represented by α.
Over the range of α examined, the point estimates of τ vary from approximately −0.02 to 0.11. The sign of the point estimate is reversed when α = −4. Although unmeasured confounding of this magnitude could reverse the sign of our causal estimate, it should be pointed out that α = −4 represents extremely strong unmeasured confounding, corresponding to an odds ratio of exp(−4) ≈ 0.018. If the investigators have made a serious effort to collect the relevant confounders, unmeasured confounders that result in |α| > 2, or even |α| > 1, are unlikely, although those values were examined in this sensitivity analysis. Based on the above results, we conclude that unmeasured confounding of plausible magnitude would only mildly affect the estimated causal effect of the NSW program on the proportion of participants whose post-intervention annual earnings increased.
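Because α is a difference in log-odds, its magnitude is easiest to judge on the odds-ratio scale, exp(α). The short sketch below converts a grid of α values to odds ratios and shows the probability that results when a baseline probability is tilted by each α. The grid and the baseline value of 0.5 are illustrative assumptions, not values taken from Table 4.

```python
import math


def shifted_probability(p, alpha):
    """Probability obtained by multiplying the odds of p by exp(alpha)."""
    w = math.exp(alpha)
    return p * w / (1.0 - p + p * w)


# ASSUMPTION: illustrative grid of sensitivity parameters.
alphas = [-4, -2, -1, 0, 1, 2, 4]
odds_ratios = {a: math.exp(a) for a in alphas}

# ASSUMPTION: illustrative baseline probability for the observed arm.
baseline = 0.5
for a in alphas:
    print(
        f"alpha = {a:+d}: odds ratio = {odds_ratios[a]:.4f}, "
        f"tilted probability = {shifted_probability(baseline, a):.4f}"
    )
```

Reading α through the resulting odds ratios can help practitioners decide which part of the grid corresponds to confounding that is plausible in their application.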

Conclusion
We have proposed a new sensitivity analysis method for causal inference that adjusts for unmeasured confounding in the estimation of the mean outcome by combining the ideas of the doubly robust estimator, the exponential tilting method, and the super learner algorithm. In causal inference, when unmeasured confounders exist, the conditional distribution of the observed outcome given the covariates differs from that of the unobserved outcome. We use the exponential tilting assumption to link these two conditional distributions directly through a univariate sensitivity parameter, which quantifies the departure from the assumption of no unmeasured confounding. Compared to most existing sensitivity analysis methods in the literature, our method does not require modeling the unmeasured confounders as latent variables; hence the unmeasured confounders may be continuous, binary, or categorical, and univariate or multivariate. In addition, the sensitivity parameter can be interpreted as a log-odds ratio for a binary outcome, which makes the choice of its range relatively easy for practitioners. To reduce the modeling bias of traditional parametric methods, we propose a nonparametric doubly robust estimator that incorporates the super learner algorithm. The simulation studies demonstrate the effectiveness of the proposed method and its superiority to some other existing methods.