Targeted Learning in Estimating Heterogeneous Effects and Transporting Direct and Indirect Effects
Targeted learning offers a framework for applying state-of-the-art machine learning to compute estimates while providing reliable measures of uncertainty in nonparametric and semiparametric models. The data-adaptive estimation needed for accurate prediction, and hence for reducing bias, often costs us the ability to use the nonparametric bootstrap for inference. This is where targeted maximum likelihood estimators (TMLEs) succeed: under conditions we detail, they provide valid inference through a very inexpensive computation of the standard deviation of the estimated efficient influence curve. We apply the framework mainly to three new parameters of interest that are particularly relevant to causal inference and heterogeneous response to treatment. The first two are the variance and the cumulative distribution function of the stratum-specific treatment effect function (VTE and TE CDF). The third concerns transporting treatment effects from one site to another in the presence of an intermediate confounder as well as a mediator, known as stochastic direct and indirect effects (SDE and SIE). We mainly consider data-dependent SDE and SIE, in which the stochastic intervention on the mediator is defined by an estimate of the mediator and intermediate confounder mechanisms. We also consider SDE and SIE for both a restricted and an unrestricted model, each of which is relevant in practice. We prove efficiency and robustness properties for all the estimators used in this paper, provide software, and present extensive simulations that verify these properties and compare performance with other estimators.
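The inexpensive inference step mentioned above amounts to a Wald-type interval: if IC_i denotes the estimated influence curve evaluated at observation i, the standard error of the estimate is approximated by the sample standard deviation of the IC_i divided by the square root of n. The sketch below is a minimal illustration under that assumption, not the software accompanying this paper; the function name `ic_wald_interval` is hypothetical, and the influence curve values are assumed to come from an already fitted TMLE. The toy usage uses the sample mean, whose influence curve is simply Y_i minus the mean.

```python
import math
from statistics import NormalDist, fmean, stdev


def ic_wald_interval(psi_hat, ic_values, alpha=0.05):
    """Hypothetical helper: Wald-type CI from estimated influence curve values.

    psi_hat   : point estimate of the target parameter (e.g. from a TMLE)
    ic_values : estimated influence curve evaluated at each observation
    alpha     : 1 - nominal coverage of the interval
    """
    n = len(ic_values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error: sample standard deviation of the IC divided by sqrt(n).
    se = stdev(ic_values) / math.sqrt(n)
    # Two-sided normal quantile, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return psi_hat, se, (psi_hat - z * se, psi_hat + z * se)


# Toy usage: for the mean of Y, the influence curve is Y_i - mean(Y).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
psi = fmean(y)
est, se, (lo, hi) = ic_wald_interval(psi, [yi - psi for yi in y])
print(f"estimate={est:.3f}, se={se:.4f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

No resampling is involved: once the influence curve values are available, the interval costs a single pass over the data, which is why it remains cheap even when the initial fits come from expensive machine learning ensembles.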
This manuscript also contains a generalized method for deriving efficient influence functions, which are central to applying targeted learning to these and other parameters in large models. This includes the fixed transported stochastic direct and indirect effect parameters for both restricted and unrestricted models, where the stochastic intervention is defined by the true mechanisms for the mediator and intermediate confounder. The method grows out of a tutorial covering the necessary tools of measure theory, integration, functional analysis and efficiency theory, which enables statisticians to embrace estimation in the large models that are often realistic for practical scientific questions. In addition, this paper implements a new way to perform the targeting step of the TMLE procedure via the discovery of canonical least favorable submodels (CLFMs), which are one-dimensional submodels applicable to high-dimensional parameters. CLFMs, used in this paper to estimate many points on the TE CDF, are not only fast but also hold promise for mitigating practical positivity violations. Finally, we employ an easily implemented CV-TMLE procedure, applied to real data to estimate the VTE, and show that it retains the attractive properties of Zheng and van der Laan's original CV-TMLE formulation.