In epidemiology, researchers seek to answer questions about exposures and their effects (associations) on a variety of outcomes of interest. Most often, the data come from observational studies, meaning that the researcher did not control the exposure each subject received, as is done in clinical trials. Additionally, researchers collect information on other variables that could act as potential confounders of the exposure-outcome association. Estimation of adjusted associations under these conditions, if not reliant on arbitrary and thus biased parametric models, suffers from the curse of dimensionality. This dissertation describes semi-parametric statistical approaches to estimating the parameter of interest using targeted maximum likelihood estimation (TMLE) methodology, which optimally adapts estimates of the data-generating distribution for estimation of the association of interest. The process relies on machine learning techniques and is a modification of the likelihood-based algorithm where the parameter is defined by the so-called G-computation formula.
Chapter 2 addresses the estimation of direct effects, adjusting for possible indirect effects through intermediate variables. TMLE is used, with model selection by the SuperLearner algorithm, to obtain estimators of the direct effect. General methods for estimating the natural and controlled direct effects using TMLE, controlling for the intermediate variables, are implemented. These techniques are then used to examine the direct effect of maternal depression on cognitive and language development in 350 Mexican-American children in the CHAMACOS birth cohort study. Children of mothers with depressive symptoms scored significantly lower on expressive communication (-2.82 points on the Preschool Language Scale, p-value < 0.05) than children of non-depressed mothers, after controlling for the intermediate effects of home environment and breastfeeding duration. Depression did not show a significant direct effect on auditory comprehension, mental, or psychomotor scores.
Chapter 3 presents the use of TMLE and machine learning to estimate the effects of organophosphate (OP) pesticide exposure during the infant stages of child growth. Many papers have been published about the adverse effects of in utero pesticide exposure and its effects on fetal growth. All of the previous literature has used traditional analyses, whereas we implement a TMLE approach. The goal is to obtain estimates of the effects of exposure to OP pesticides not only in utero but also later, when the child is exposed directly, and of how this exposure affects physical growth at different ages: 6, 12, and 24 months, and 3.5 and 5 years. Pesticides are widely used in the Salinas Valley, CA, where the population under study resides. We identify several statistically significant negative effects of exposure to OP pesticides on child growth.
Chapter 4 presents a longitudinal analysis of the intervention effect through the use of machine learning techniques and G-computation, as well as TMLE. There are no available studies of the longitudinal effects of organophosphate (OP) pesticides on child growth as measured by child weight. This is a first attempt to estimate the effects of continuous exposure to OP pesticides in children living in the agricultural region of the Salinas Valley, CA. Without a control group, we estimated the effects of an intervention in which exposure was controlled and fixed at the lowest level possible, and compared the estimated child weights under this scenario with the actual weights at 3.5 years of age. We used an ad hoc, but still double robust, targeting step on the outcome distribution estimate in conjunction with simulation based on the G-computation formula. Our results show a negative effect of OP pesticide exposure on mean child weight at 3.5 years; however, none of the estimates reached statistical significance.
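The comparison described for Chapter 4, observed weights against weights predicted with exposure fixed at its lowest level, can be illustrated with a minimal G-computation sketch. This is not the dissertation's procedure: it assumes a single time point rather than longitudinal data, substitutes ordinary least squares for the machine learning fits, omits the targeting step, and runs on simulated data. The function name `g_computation_contrast`, the variables `W`, `A`, `Y`, and all numeric values are hypothetical.

```python
import numpy as np


def g_computation_contrast(W, A, Y, a_star):
    """Mean observed outcome minus mean predicted outcome under A = a_star.

    Plain substitution (G-computation) estimator with a linear outcome
    regression; an illustrative stand-in, not a TMLE.
    """
    # Fit the outcome regression E[Y | A, W] by ordinary least squares.
    X = np.column_stack([np.ones_like(A), A, W])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Predict each subject's outcome with exposure set to a_star,
    # keeping that subject's own covariate value.
    X_star = np.column_stack([np.ones_like(A), np.full_like(A, a_star), W])
    y_star = X_star @ beta

    # Contrast the observed mean with the mean under the intervention.
    return Y.mean() - y_star.mean()


# Simulated data (hypothetical): W confounds exposure A and outcome Y.
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = np.exp(0.5 * W + rng.normal(scale=0.5, size=n))  # positive exposure
Y = 15.0 - 0.3 * A + 1.0 * W + rng.normal(scale=1.0, size=n)

# Intervention: fix exposure at the lowest observed level.
a_star = A.min()
effect = g_computation_contrast(W, A, Y, a_star)
print(f"Observed mean minus mean under lowest exposure: {effect:.3f}")
```

A negative value indicates that observed outcomes are, on average, lower than those predicted under the lowest exposure level. In a longitudinal setting the same idea would be applied sequentially across time points, with the outcome regression replaced by data-adaptive fits.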
Chapter 5 concludes with a summary of the preceding chapters and a discussion of future research directions.