Propensity score matching (PSM) is a statistical technique widely used across disciplines to draw causal inferences. In this dissertation, we explore a doubly robust matching method that improves on PSM in certain circumstances. We also extend matching techniques to the setting of complex surveys and investigate how to estimate the variance of the matching estimator in this setting. We apply these methods to data from the Population Assessment of Tobacco and Health (PATH) survey and assess their performance in a real study.
The dissertation comprises three studies. The first study uses data from the PATH survey to investigate whether the use of electronic cigarettes (e-cigarettes) aids long-term cigarette/nicotine cessation among adult U.S. smokers who want to quit cigarette smoking. Caliper nearest-neighbor PSM is used to match each e-cigarette user to one or more non-users. The weighted difference in cigarette/nicotine cessation rates between the two groups is calculated among the matched pairs, and the bootstrap is used to assess statistical significance. We find that e-cigarettes may not be an effective cessation aid for adult smokers in the U.S. and may instead contribute to continuing nicotine dependence.
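The matching step described above can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated, matches with replacement, and uses the treated units' weights when averaging. The function names, the `k` parameter, and the weighting scheme are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def caliper_nn_match(ps_treated, ps_control, caliper, k=1):
    """For each treated unit, return the indices of up to k nearest controls
    whose propensity score lies within `caliper` (matching with replacement)."""
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    matches = []
    for p in ps_treated:
        dist = np.abs(ps_control - p)
        nearest = np.argsort(dist, kind="stable")[:k]
        # Keep only candidates inside the caliper; the list may be empty.
        matches.append([int(j) for j in nearest if dist[j] <= caliper])
    return matches

def matched_difference(y_treated, y_control, w_treated, matches):
    """Weighted mean of (treated outcome - mean outcome of matched controls),
    taken over treated units that found at least one match."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    diffs, weights = [], []
    for i, m in enumerate(matches):
        if m:  # unmatched treated units are dropped
            diffs.append(y_treated[i] - y_control[m].mean())
            weights.append(w_treated[i])
    return float(np.average(diffs, weights=weights))
```

Treated units with no control inside the caliper are discarded in this sketch, so the estimate describes only the matched subsample.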
PSM depends strongly on the correctness of the propensity score (PS). The first study motivates us to explore whether the existing PSM method can be improved when the PS is incorrect. Hence, in the second study, we propose and study a doubly-matched (DMT) estimator, which matches simultaneously on both the PS and another ‘balancing score’, the prognostic score (PGS), to draw doubly robust (DR) causal inferences. Our simulation study demonstrates that the estimator is doubly robust; that is, if either the PS or the PGS is incorrect, the DMT estimator still performs well as long as the other is correct. Furthermore, under a simple random sampling design, the full bootstrap method of variance estimation, which resamples individuals, re-estimates the PS, and re-conducts PSM, works well, although it is sometimes conservative. Finally, we apply the DMT estimator to the question of interest in the first study and reach a conclusion consistent with that study: e-cigarettes may not be effective for assisting cigarette smoking cessation.
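The full bootstrap described above can be sketched generically. The sketch assumes the data sit in a NumPy array with one row per individual and that a user-supplied `estimator` callable (hypothetical here) re-estimates the PS and re-runs the matching on whatever rows it receives. It is an illustration under those assumptions, not the dissertation's implementation.

```python
import numpy as np

def full_bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of a matching estimator.

    `estimator` must redo the whole pipeline (PS estimation and matching)
    on the rows it is given, so each replicate reflects both sources of
    variability.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        reps[b] = estimator(data[idx])    # re-estimate PS and re-match inside
    return float(reps.std(ddof=1))
```

Passing the whole pipeline as one callable keeps the resampling loop from reusing a PS model fitted on the original sample, which is the difference between the full bootstrap and a bootstrap of the matched pairs only.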
In the third study, taking the PSM estimator as an example, we explore approaches to estimating the variance of the matching estimator that are appropriate for complex surveys with survey weights. Such surveys typically have a hierarchical sampling design, in which subjects are sampled within clusters, which are in turn sampled within strata. We prove the large-sample consistency of the jackknife and balanced repeated replication (BRR) estimates of variance when both the number of sampling strata and the number of subjects within sampled clusters go to infinity. Simulation studies demonstrate that BRR and Fay’s method, an adjusted BRR method commonly used in large population surveys, outperform other commonly used bootstrap methods. We also apply these variance estimates to PATH study data to investigate whether using counseling or self-help materials helps adult smokers reduce cigarette consumption in the long term, and we reach a negative conclusion. This study fills a gap in variance estimation for the PSM estimator, and for matching estimators more generally, in complex surveys.
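As an illustration of the replicate-based variance estimates mentioned above, the sketch below applies the standard BRR formula and its Fay adjustment to a set of replicate estimates. It assumes each replicate estimate has already been obtained by recomputing the matching estimator under one set of replicate weights. The function name and arguments are hypothetical, and whether the PS is re-estimated within each replicate is left to the caller.

```python
import numpy as np

def brr_variance(theta_full, theta_reps, fay_rho=0.0):
    """BRR variance estimate from replicate estimates.

    fay_rho = 0 gives ordinary BRR; 0 < fay_rho < 1 gives Fay's method,
    which rescales the sum of squared deviations by 1 / (1 - fay_rho)**2.
    """
    if not 0.0 <= fay_rho < 1.0:
        raise ValueError("fay_rho must lie in [0, 1)")
    reps = np.asarray(theta_reps, dtype=float)
    squared_dev = (reps - theta_full) ** 2
    return float(squared_dev.sum() / (reps.size * (1.0 - fay_rho) ** 2))
```

A call such as `brr_variance(theta_hat, replicate_estimates, fay_rho=0.3)` would return the Fay-adjusted variance for a survey whose replicate weights were built with a perturbation factor of 0.3; the value 0.3 is only an example and should be taken from the survey's documentation.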