Computational Considerations for Targeted Learning
Targeted Learning is a principled methodology well positioned to leverage large datasets and large-scale computing facilities. However, many of its methods are computationally demanding and therefore require careful consideration in their implementation. This thesis comprises three case studies at the intersection of Targeted Learning and computation. Chapter 1 describes the Targeted Bootstrap, a novel bootstrap technique that samples from a TMLE-estimated distribution and therefore carries asymptotic performance guarantees, while avoiding the issues that arise from cross-validating on bootstrap samples. Chapter 2 considers the problem of estimating both a target parameter and a nuisance parameter on which it depends, when ideally both would be estimated with cross-validation. By carefully considering which parts of the sample are used for which estimation tasks, nested cross-validation can be avoided at substantial computational savings; this is achieved using the novel SplitSequential cross-validation approach. Chapter 3 describes the opttx package for learning optimal treatment rules. This package contains an implementation of the SplitSequential Super Learner, as well as a novel approach to learning an optimal rule for a categorical treatment variable. Further, performance-based variable importance measures are used to evaluate which covariates are most useful for making treatment decisions.
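As a rough, hypothetical illustration of why avoiding nested cross-validation matters computationally (this is a generic fit-count comparison, not the thesis's SplitSequential scheme): with V-fold cross-validation nested inside V-fold cross-validation, every candidate learner must be refit within each outer fold, multiplying the number of model fits by the inner fold count.

```python
def nested_cv_fits(v_outer: int, v_inner: int, n_learners: int) -> int:
    """Model fits for nested CV: each learner is refit v_inner times
    inside every one of the v_outer outer folds."""
    return v_outer * v_inner * n_learners

def single_layer_fits(v: int, n_learners: int) -> int:
    """Model fits for a single layer of V-fold CV."""
    return v * n_learners

# Example: 10 outer folds, 10 inner folds, 5 candidate learners.
nested = nested_cv_fits(10, 10, 5)      # 500 fits
single = single_layer_fits(10, 5)       # 50 fits
print(nested, single, nested // single)  # nested CV costs 10x more here
```

Under these illustrative fold counts, a scheme that avoids the inner cross-validation layer cuts the fitting work by a factor equal to the inner fold count.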