Collaborative Targeted Maximum Likelihood Estimation
- Author(s): Gruber, Susan
- Advisor(s): van der Laan, Mark J
- et al.
Collaborative targeted maximum likelihood estimation is an extension of targeted maximum likelihood estimation (TMLE), which was first introduced by van der Laan and Rubin in 2006. TMLE is an efficient, double robust (DR), semi-parametric methodology for estimating a pathwise differentiable parameter of a statistical distribution given censored data. The TMLE procedure involves a parametric fluctuation of an initial estimate of the relevant factor of the density of the observed data, Q, and this fluctuation requires an estimate of the nuisance portion of the likelihood, the censoring mechanism, g. DR estimators are consistent when at least one of Q and g is estimated consistently. The best approach to nuisance parameter estimation is a current topic of debate in the literature, and methods typically rely on maximizing a likelihood for g. However, by establishing the collaborative double robustness of the efficient influence curve, van der Laan and Gruber (2010) provide a theoretical justification for moving away from the practice of external nuisance parameter estimation. That paper also presents the collaborative TMLE (C-TMLE) and provides an algorithm for constructing the estimator.
This dissertation explores collaborative double robustness to provide an understanding of the requirements for effective nuisance parameter estimation that are the foundation of C-TMLE, and presents the forward-selection C-TMLE algorithm. This approach to targeted maximum likelihood estimation is especially useful when the true censoring mechanism is unknown and adjusting for a large number of correlated confounders leads to highly variable estimates. It is also particularly effective when sparsity in the data renders a statistical parameter borderline identifiable. Variations on the C-TMLE estimator are presented, and their behavior is compared and contrasted with several TMLEs and other estimators in the literature using simulated and real data. There is an emphasis on finite sample performance under sparsity and model misspecification, and on the incorporation of data-adaptive techniques while still preserving influence-curve-based inference. A method for ensuring that TMLEs respect known global constraints on the model is presented, and a guide to the R package tmle, for TMLE estimation of binary point treatment effects, is provided in the Appendix.
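To make the fluctuation step concrete, the following is a minimal sketch of a standard (non-collaborative) TMLE for the additive effect of a binary point treatment A on a binary outcome Y given covariates W. It is an illustration under stated assumptions, not the dissertation's implementation and not the interface of the R package tmle: the function names (fit_logistic, tmle_ate), the use of main-terms logistic regressions for both Q and g, and the truncation bound on g are all choices made here for the example. It does not implement the forward-selection C-TMLE algorithm.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    n, p = X.shape
    off = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(X @ beta + off)
        grad = X.T @ (y - mu)
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def tmle_ate(W, A, Y, bound=0.025):
    """Hypothetical helper: TMLE of E[Q(1,W) - Q(0,W)] for binary A and Y.

    Returns the point estimate, an influence-curve-based standard error,
    and the estimated influence curve values.
    """
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)

    # Initial estimate of Q(A, W) = P(Y = 1 | A, W) (assumed main-terms model).
    beta_q = fit_logistic(np.column_stack([ones, A, W]), Y)
    Q1 = expit(np.column_stack([ones, ones, W]) @ beta_q)
    Q0 = expit(np.column_stack([ones, zeros, W]) @ beta_q)
    QA = np.where(A == 1, Q1, Q0)

    # Estimate of g(1 | W) = P(A = 1 | W), truncated away from 0 and 1.
    Xg = np.column_stack([ones, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1.0 - bound)

    # Clever covariate H(A, W) = A / g(1|W) - (1 - A) / g(0|W).
    H1, H0 = 1.0 / g1, -1.0 / (1.0 - g1)
    HA = np.where(A == 1, H1, H0)

    # Fluctuation: logit Q*(A, W) = logit Q(A, W) + eps * H(A, W),
    # with eps fitted by maximum likelihood using logit Q as an offset.
    eps = fit_logistic(HA[:, None], Y, offset=logit(QA))[0]
    Q1_star = expit(logit(Q1) + eps * H1)
    Q0_star = expit(logit(Q0) + eps * H0)
    QA_star = np.where(A == 1, Q1_star, Q0_star)

    # Substitution estimator and influence-curve-based standard error.
    psi = np.mean(Q1_star - Q0_star)
    ic = HA * (Y - QA_star) + (Q1_star - Q0_star) - psi
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return psi, se, ic
```

Because eps is fitted by maximum likelihood along the fluctuation, the empirical mean of the estimated efficient influence curve is zero up to numerical tolerance, which is the property that underlies the influence-curve-based inference mentioned above. In C-TMLE, by contrast, the estimator of g is not fixed in advance but is selected in response to how well the resulting targeted fit of Q performs.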