eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Targeted Minimum Loss Based Estimation for Longitudinal Data

Abstract

Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest for many conditions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist, but they are either inefficient or suffer from the drawbacks of estimating equation methodology. This dissertation describes the development of a general methodology for estimating parameters that would typically be of interest both in SRCTs and in observational studies that are longitudinal in nature and have multiple time-point exposures or treatments. It is expected in such contexts that time-dependent confounding is either present (observational studies) or actually designed in as part of a study (SRCTs). The method, targeted minimum loss based estimation (TMLE), has been fully developed and implemented in point treatment settings and for various outcome types, including time-to-event outcomes, and binary and continuous outcomes. Here we develop and implement TMLE in the longitudinal setting, and pay special attention to dynamic treatments or exposures, as might be seen in SRCTs. Dynamic exposures are not limited to SRCTs, however. The idea of a rule-based intervention turns out to be a very fruitful one when one faces complex treatment or exposure patterns, or when one encounters challenges in defining an intervention that must depend on time-varying factors. As in the point treatment settings, the TMLE procedure is targeted toward a pre-specified parameter of the distribution of the observed data, and thereby achieves important bias reduction over non-targeted procedures in estimation of that parameter. 
As with the so-called Augmented Inverse Probability of Censoring Weight (A-IPCW) estimator, TMLE is double-robust and locally efficient. We develop some of the background involving the causal and statistical models and report the results of several simulation studies under various data-generating distributions and for two outcome types (binary, and continuous on [0,1]). In our results we include comparisons with a number of other estimators in current use.
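
For orientation only, the targeting step mentioned above can be sketched in the simpler point treatment setting with a binary treatment and binary outcome. This is not one of the longitudinal algorithms developed in the dissertation. The function names (`fit_logistic`, `tmle_ate`), the parametric logistic initial fits, the truncation bound, and the simulated data are all assumptions made for illustration. The sketch fits an initial outcome regression and a treatment mechanism, fluctuates the outcome regression along a "clever covariate", and reports a plug-in estimate of the average treatment effect with an influence-curve-based standard error.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=25):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(X @ beta + offset)
        grad = X.T @ (y - mu)                       # score vector
        hess = (X * (mu * (1 - mu))[:, None]).T @ X  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_ate(W, A, Y, bound=0.01):
    """Illustrative point-treatment TMLE for E[Y(1)] - E[Y(0)], binary A and Y."""
    n = len(Y)
    ones = np.ones(n)
    # Initial estimate of the outcome regression Q(A, W) = P(Y = 1 | A, W).
    bQ = fit_logistic(np.column_stack([ones, A, W]), Y)
    Q1 = expit(np.column_stack([ones, ones, W]) @ bQ)
    Q0 = expit(np.column_stack([ones, np.zeros(n), W]) @ bQ)
    QA = np.where(A == 1, Q1, Q0)
    # Estimate of the treatment mechanism g(W) = P(A = 1 | W), truncated (assumed bound).
    Xg = np.column_stack([ones, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1 - bound)
    # Clever covariate and one-dimensional logistic fluctuation with offset logit(Q).
    H1, H0 = 1.0 / g1, -1.0 / (1.0 - g1)
    HA = np.where(A == 1, H1, H0)
    eps = fit_logistic(HA[:, None], Y, offset=logit(QA))[0]
    Q1s = expit(logit(Q1) + eps * H1)
    Q0s = expit(logit(Q0) + eps * H0)
    QAs = np.where(A == 1, Q1s, Q0s)
    # Plug-in estimate and estimated efficient influence curve.
    psi = np.mean(Q1s - Q0s)
    ic = HA * (Y - QAs) + (Q1s - Q0s) - psi
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return psi, se, ic

# Simulated data (assumed data-generating distribution, for illustration only).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-0.5 + A + 0.5 * W))

psi, se, ic = tmle_ate(W, A, Y)
print(f"TMLE estimate: {psi:.3f} (SE {se:.3f})")
```

After the fluctuation step, the empirical mean of the estimated influence curve is zero up to numerical error, which is the sense in which the updated fit solves the efficient influence curve estimating equation for the target parameter.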

Chapter 1 develops the background and context in which this estimator appears, gives a brief history of other estimators used in SRCTs and describes some of the theory behind TMLE in the longitudinal setting. Two different TMLE algorithms are described in detail, and results of a simulation study for three separate causal parameters are presented.

Chapter 2 concerns the development of a new TMLE that solves the efficient influence curve estimating equation directly by numerical methods, rather than indirectly, which is the usual procedure. A new set of simulations is performed here that compares this TMLE with the preceding two (presented in chapter 1). Its performance is comparable to that of the two TMLEs described in chapter 1, but it is somewhat easier to implement.

Chapter 3 is a comparison of still another new TMLE (described in van der Laan and Gruber, 2012) with one of the three described above. This TMLE arguably shows the most promise generally, since its implementation does not require discretization of the intermediate factors of the likelihood, as do the three preceding TMLEs. Further, under the right conditions it exhibits superior performance in terms of MSE. We also explore a new, targeted criterion for selecting the initial estimators involved.

Chapter 4 describes a detailed analysis of the estimation of the effect of gestational weight gain on women's long-term BMI using the preferred TMLE described in chapter 3. During this analysis we encountered many issues concerning censoring of the exposure variable, which led to the redefinition of the parameter of interest and to the implementation of a different type of TMLE for the first time (described originally in van der Laan, 2008). We also encountered issues arising from sparsity in the data, and we propose and implement corresponding solutions. The analysis was performed using data from the National Longitudinal Survey of Youth, begun in 1979 and ending in 2008.
