## Targeted Maximum Likelihood Estimation Techniques For Time To Event Data and The Implications Of Coarsening An Explanatory Variable Of Interest Via Dichotomization In The Context Of Causal Inference In Semi-parametric Models

- Author(s): Stitelman, Ori Michael
- Advisor(s): van der Laan, Mark J
- et al.

## Abstract

This dissertation addresses three issues in causal inference, united by the common theme of causal inference in semi-parametric models. The first two chapters further develop targeted maximum likelihood estimation (TMLE) methods for particular situations in survival analysis. Chapter 1 presents the collaborative targeted maximum likelihood estimator (C-TMLE) for the treatment specific survival curve. This estimator improves upon commonly used estimators in survival analysis and is particularly necessary for analyzing observational studies, data that exhibit dependent censoring, or both. Chapter 2 proposes two parameters for quantifying effect modification in time to event studies and then presents the TMLE for estimating them. Chapter 3 presents the implicit assumptions practitioners make when they dichotomize treatment/exposure variables to assess the causal effect of those variables.

Chapter 1 - Current methods used to analyze time to event data either rely on highly parametric assumptions, which result in biased estimates of parameters chosen purely out of convenience, or are highly unstable because they ignore the global constraints of the true model. By using Targeted Maximum Likelihood Estimation (TMLE) one may consistently estimate parameters that directly answer the statistical question of interest. Targeted Maximum Likelihood Estimators are substitution estimators, which rely on estimating the underlying distribution. However, unlike other substitution estimators, they estimate the underlying distribution specifically to reduce bias in the estimate of the parameter of interest. We present here an extension of TMLE for observational time to event data, the Collaborative Targeted Maximum Likelihood Estimator (C-TMLE) for the treatment specific survival curve. Through a simulation study we show that this method improves on commonly used methods in both robustness and efficiency. In fact, we show that in certain situations the C-TMLE produces estimates whose mean squared error is lower than the semi-parametric efficiency bound. We also demonstrate that a semi-parametric efficient substitution estimator (TMLE) outperforms a semi-parametric efficient non-substitution estimator (the Augmented Inverse Probability Weighted estimator) in sparse data situations. Lastly, we show that the bootstrap is able to produce valid 95 percent confidence intervals in sparse data situations, while influence curve based inference breaks down.
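The contrast between substitution and non-substitution estimators in sparse data can be illustrated with a toy calculation. The sketch below is not the dissertation's estimator: it uses a plain inverse probability weighted (IPW) estimator as a simpler stand-in for the Augmented IPW estimator, and a naive plug-in average as a stand-in for a substitution estimator such as TMLE. All data values, the function names `ipw_estimate` and `substitution_estimate`, and the "fitted" probabilities are hypothetical and chosen only to show how a non-substitution estimator may leave the parameter space while a substitution estimator cannot.

```python
def ipw_estimate(treated, outcomes, propensities):
    """Plain IPW estimate of P(Y_1 = 1): mean of A * Y / g(W).

    Nothing constrains the result to lie in [0, 1].
    """
    n = len(treated)
    return sum(a * y / g for a, y, g in zip(treated, outcomes, propensities)) / n


def substitution_estimate(predicted_probs):
    """Plug-in estimate of P(Y_1 = 1): mean of fitted probabilities under treatment.

    An average of values in [0, 1] is itself in [0, 1].
    """
    return sum(predicted_probs) / len(predicted_probs)


# Hypothetical sparse data: only two of ten subjects are treated, and both
# had a very small probability of receiving treatment.
treated = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
propensities = [0.05, 0.10, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50, 0.50]

# Hypothetical fitted values for P(Y = 1 | A = 1, W_i), one per subject.
predicted_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.3, 0.7, 0.2, 0.1, 0.3]

ipw = ipw_estimate(treated, outcomes, propensities)
plug_in = substitution_estimate(predicted_probs)

print(f"IPW estimate:          {ipw:.2f}")      # exceeds 1, so it is not a valid probability
print(f"Substitution estimate: {plug_in:.2f}")  # always a valid probability
```

In this toy setting the two small propensity scores produce very large weights, so the IPW estimate of a probability lands above 1, whereas the plug-in average is bounded by construction. This is one sense in which an estimator that ignores the global constraints of the model can be unstable.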

Chapter 2 - The Cox proportional hazards model and its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters that are specific to the model proposed. Despite their flaws, these methods are typically implemented when assessing effect modification in survival analyses. The targeted maximum likelihood estimation (TMLE) methodology is more robust than the methods typically implemented and allows practitioners to estimate parameters that directly answer the question of interest. In this chapter TMLE is used to estimate two newly proposed parameters that quantify effect modification in the time to event setting. These methods are then applied to the *Tshepo* study to assess whether either gender or baseline CD4 level modifies the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes using EFV while men tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals at high CD4 levels.

Chapter 3 - It is common in analyses designed to estimate the causal effect of a continuous exposure/treatment to dichotomize the variable of interest. By dichotomizing the variable and assessing the causal effect of the newly fabricated variable, practitioners are implicitly making assumptions, though typically these assumptions are ignored in the interpretation of the resulting estimates. In this chapter we formally address what assumptions are made by dichotomizing variables to assess the semi-parametrically adjusted associations of these constructed binary variables and an outcome. Two assumptions are presented, at least one of which must be met in order for the estimates of the causal effects to be unbiased estimates of the parameters of interest. These assumptions are titled the Mechanism Equivalence and Effect Equivalence assumptions. Furthermore, we quantify the bias induced when these assumptions are violated. Lastly, we present an analysis of a malaria study that exemplifies the danger of naively dichotomizing a continuous variable to assess a causal effect.
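The danger of dichotomization can be seen in a small deterministic example. The sketch below is an illustration under strong simplifying assumptions and does not reproduce the chapter's formal definitions of Mechanism Equivalence or Effect Equivalence: the outcome is assumed to be exactly linear in the exposure, with no noise and no confounding, and the populations, cutoff, slope, and the function name `dichotomized_contrast` are all hypothetical.

```python
from statistics import mean


def dichotomized_contrast(exposures, cutoff, slope):
    """Mean outcome in the 'high' group minus mean outcome in the 'low' group.

    The outcome is assumed to equal slope * exposure exactly, so the true
    dose-response relationship is the same wherever this function is applied.
    """
    high = [slope * a for a in exposures if a > cutoff]
    low = [slope * a for a in exposures if a <= cutoff]
    return mean(high) - mean(low)


slope = 2.0   # hypothetical effect of one unit of exposure on the outcome
cutoff = 5.0  # hypothetical threshold used to define "high" versus "low"

# Two hypothetical populations with the same dose-response curve but
# different distributions of the continuous exposure.
population_a = [1, 2, 3, 4, 6, 7, 8, 9]
population_b = [4, 4, 5, 5, 6, 6, 6, 6]

contrast_a = dichotomized_contrast(population_a, cutoff, slope)
contrast_b = dichotomized_contrast(population_b, cutoff, slope)

print(contrast_a)  # 10.0
print(contrast_b)  # 3.0
```

Although one unit of exposure has the same effect in both populations, the "effect" of the constructed binary variable is 10.0 in one and 3.0 in the other, because it depends on how the continuous exposure is distributed within the high and low groups. Interpreting either number as the causal effect of exposure therefore requires additional assumptions of the kind this chapter formalizes.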