Statistical Robustness - Distributed Linear Regression, Informative Censoring, Causal Inference, and Non-Proportional Hazards
- Luo, Jiyu
- Advisor(s): Xu, Ronghui
Abstract
Robustness broadly refers to the property of a statistical method remaining valid even when some of its model assumptions are violated. We investigate four types of statistical robustness under four different problem setups. First, we consider linear regression in the distributed setting, where data are stored on separate machines. When the errors are heavy-tailed and/or asymmetric, we develop a tail-robust distributed estimator that achieves a sub-Gaussian-type deviation bound without pooling all the data together and without assuming Gaussian errors. Moreover, the algorithm transfers only gradients in each step and is hence communication efficient. Second, we explore the two-group Cox proportional hazards (PH) model in a randomized study. When the non-informative censoring assumption no longer holds, the inverse probability of censoring weighting (IPCW) estimator helps correct the censoring bias by modeling the nuisance function for the conditional censoring survival. To protect against misspecification of this nuisance function, we propose an augmented IPCW (AIPCW) estimator that also models the conditional failure survival. The AIPCW estimator is model double robust (DR) in that it is consistent and asymptotically normal (CAN) even when one of the root-n nuisance estimators is wrong. The estimator is also CAN if both nuisance functions are consistently estimated with the product of their error rates being faster than root-n. This so-called rate DR property allows us to make use of machine learning (ML) methods, which directly address the non-collapsibility of the Cox model. Third, we extend the problem to observational data with the two-group survival following the marginal structural Cox model. In addition to the missingness due to censoring, we also need to deal with missingness arising from the partial observation of the potential outcomes.
By extending the AIPCW estimator to include the nuisance propensity score function, we develop an augmented IPW (AIPW) estimator that is again DR with respect to the models for the failure time and for the missingness mechanisms. Lastly, we consider the scenario in which the PH assumption fails and propose a causal estimand that is a weighted average of the time-varying log hazard ratio. We show that this estimand enjoys several desirable properties and can be estimated using the same AIPW estimator we proposed for the marginal structural Cox model. A method for plotting the time-varying log hazard ratio with observational data is also proposed.
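To make the first setting in the abstract more concrete, the following is a minimal sketch of gradient-only communication for robust linear regression. It is not the dissertation's algorithm: the choice of the Huber loss, the fixed truncation level `tau`, the plain averaged gradient descent, and all function names (`huber_gradient`, `distributed_huber_gd`, `simulate_shards`) are assumptions made purely for illustration.

```python
import numpy as np

def huber_gradient(X, y, beta, tau):
    """Average Huber-loss gradient computed on one machine's local data."""
    residuals = y - X @ beta
    # Huber score: residuals truncated at +/- tau, which limits the
    # influence of heavy-tailed errors on the gradient.
    psi = np.clip(residuals, -tau, tau)
    return -(X.T @ psi) / len(y)

def distributed_huber_gd(shards, tau, step=0.5, n_iter=200):
    """Gradient descent in which each machine sends only a d-vector per step.

    `shards` is a list of (X, y) pairs, one per machine (hypothetical layout).
    """
    d = shards[0][0].shape[1]
    n_total = sum(len(y) for _, y in shards)
    beta = np.zeros(d)
    for _ in range(n_iter):
        # Each machine contributes its local gradient, weighted by its
        # sample size; the raw data never leave the machine.
        grad = sum(len(y) * huber_gradient(X, y, beta, tau) for X, y in shards)
        beta = beta - step * grad / n_total
    return beta

def simulate_shards(n_machines=5, n_local=400, seed=0):
    """Simulated data with heavy-tailed (Student t, 2.5 df) errors."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, -2.0, 0.5])
    shards = []
    for _ in range(n_machines):
        X = rng.normal(size=(n_local, 3))
        noise = rng.standard_t(df=2.5, size=n_local)
        shards.append((X, X @ beta_true + noise))
    return shards, beta_true

shards, beta_true = simulate_shards()
beta_hat = distributed_huber_gd(shards, tau=3.0)
print("estimate:", np.round(beta_hat, 3), "truth:", beta_true)
```

In this sketch the per-step communication cost is one vector of length d per machine, regardless of the local sample size. In practice the truncation level would be tuned to the sample size and noise scale rather than fixed as it is here.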