This dissertation consists of three chapters that study treatment effect estimation and treatment choice learning under the potential outcome framework (Neyman, 1923; Rubin, 1974). The first two chapters study how to efficiently combine an experimental sample with an auxiliary observational sample when estimating treatment effects. In chapter 1, I derive a new semiparametric efficiency bound under the two-sample setup for estimating ATE and other functions of the average potential outcomes. The efficiency bound for estimating ATE with an experimental sample alone is derived in Hahn (1998) and has since become an important reference point for studies that aim at improving the ATE estimation. This chapter answers how an auxiliary sample containing only observable characteristics (covariates, or features) can lower this efficiency bound. The newly obtained bound has an intuitive expression and shows that the (maximum possible) amount of variance reduction depends positively on two factors: 1) the size of the auxiliary sample, and 2) how well the covariates predict the individual treatment effect. The latter naturally motivates having high dimensional covariates and the adoption of modern machine learning methods to avoid over-fitting.
In chapter 2, under the same setup, I propose a two-stage machine learning (ML) imputation estimator that achieves the efficiency bound derived in chapter 1, so that no other regular estimators for ATE can have lower asymptotic variance in the same setting. This estimator involves two steps. In the first step, conditional average potential outcome functions are estimated nonparametrically via ML, which are then used to impute the unobserved potential outcomes for every unit in both samples. In the second step, the imputed potential outcomes are aggregated together in a robust way to produce the final estimate. Adopting the cross-fitting technique proposed in Chernozhukov et al. (2018), our two-step estimator can use a wide range of supervised ML tools in its first step, while maintaining valid inference to construct confidence intervals and perform hypothesis tests. In fact, any method that estimates the relevant conditional mean functions consistently in square norm, with no rate requirement, will lead to efficiency through the proposed two-step procedure. I also show that cross-fitting is not necessary when the first step is implemented via LASSO or post-LASSO. Furthermore, our estimator is robust in the sense that it remains consistent and root n normal (no longer efficient) even if the first step estimators are inconsistent.
Chapter 3 (coauthored with Kirill Ponomarev) studies model selection in treatment choice learning. When treatment effects are heterogeneous, a decision maker, given either experiment or quasi-experiment data, can attempt to find a policy function that maps observable characteristics to treatment choices, aiming at maximizing utilitarian welfare. When doing so, one often has to pick a constrained class of functions as candidates for the policy function. The choice of this function class poses a model selection problem. Following Mbakop and Tabord-Meehan (2021) we propose a policy learning algorithm that incorporates data-driven model selection. Our method also leverages doubly robust estimation (Athey and Wager, 2021) so that it could retain the optimal root n rate in expected regret in general setups including quasi-experiments where propensity scores are unknown. We also refined some related results in the literature and derived a new finite sample lower bound on expected regret to show that the root n rate is indeed optimal.