The popularity of machine learning in both academia and industry has experienced unparalleled growth. This has been driven by many factors, including the proliferation and availability of digitized data, the growth of available computational power, such as graphics processing units (GPUs), and the powerful machine learning software libraries that leverage them.
The overwhelming majority of existing research focuses on learning correlations in data rather than leveraging cause-effect relationships.
In parallel to the machine learning revolution, the study of cause-effect relationships, known as causality, has been thoroughly investigated but is often overlooked in current practice. These two disciplines are often treated as orthogonal approaches to data modeling. This dissertation focuses on the confluence of these two approaches in an attempt to advance current machine learning techniques with fundamental concepts from causality. Specifically, we identify several strategies for leveraging causal structure (in the form of a directed acyclic graph) to improve machine learning performance.
Our technical contributions address several fundamental and widespread machine learning problems. We first present a regularization method, called CASTLE (Causal Structure Learning), that simultaneously learns the causal graph in the input layers of a neural network, improving predictive performance on out-of-sample data. Building on a similar approach, we develop MIRACLE (Missing data Imputation Refinement and Causal Learning), a method for missing data imputation. Like CASTLE, MIRACLE simultaneously learns the underlying causal structure, refining its imputations in a unique ``bootstrapping'' manner. We then introduce DECAF (Debiasing Causal Fairness), a method that incorporates causal structure into Generative Adversarial Networks (GANs) to generate synthetic data that is fair for any downstream model. Next, we turn to the problem of unsupervised domain adaptation (UDA), where we leverage the invariance of causal structure to select models that best generalize to an unlabeled target domain. Lastly, we extend our model selection method to individualized treatment effect (ITE) models, which are commonly used in healthcare settings.
To demonstrate the utility of our models, we evaluate their performance on a variety of synthetic datasets, semi-synthetic datasets (for ITE models), and real-world datasets, including publicly available UCI datasets and healthcare datasets for heart failure, COVID-19, and prostate cancer, among others. We show that, compared to existing machine learning models that are agnostic to causality, our causally aware models improve regularization, missing data imputation, synthetic data quality, and UDA model selection.