eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Scalable Methods for Survival Analysis using Massive Observational Data

Abstract

Emerging observational health data, such as electronic health records and administrative claims, provide a rich resource for learning about treatment effects and risks. However, fitting statistical models to such large-scale, high-dimensional data poses substantial computational challenges. In this dissertation, I employ parallel computing techniques to address the computational bottlenecks associated with statistical models widely used in observational studies. First, I present a novel parallel scan algorithm to scale up the Cox proportional hazards model and the Fine-Gray model. This advance accelerates large-scale comparative effectiveness and safety studies involving millions of patients and thousands of patient characteristics by an order of magnitude. Second, I apply an efficient parallel segmented-scan algorithm to accelerate the computationally intensive components shared by the stratified Cox model, the Cox model with time-varying covariates, and the Cox model with time-varying coefficients. This enables efficient large-scale, high-dimensional Cox modeling with stratification or time-varying effects, delivering an order-of-magnitude speedup over traditional central processing unit (CPU)-based methods. Third, I introduce a memory-efficient approach for fitting pooled logistic regression models to data with massive sample sizes, making such analyses feasible even when computational resources are limited. I have implemented all of the above work in the open-source R package Cyclops.
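Why a scan helps can be illustrated with the Cox partial likelihood. Once subjects are sorted by descending time, each risk-set sum of exp(xβ) becomes a prefix sum. With strata, it becomes a prefix sum that restarts at each stratum boundary, which is a segmented scan. The following is a minimal NumPy sketch of that idea under stated assumptions, not the Cyclops implementation. `np.cumsum` is a sequential stand-in for a parallel (GPU) scan. Tied event times use the Breslow convention. Only the log partial likelihood is computed, with no gradients and no Fine-Gray or time-varying extensions. The function names `segmented_inclusive_scan` and `cox_log_partial_likelihood` are hypothetical.

```python
import numpy as np


def segmented_inclusive_scan(values, segment_ids):
    """Inclusive prefix sum that restarts whenever segment_ids changes.

    Sequential stand-in for a parallel segmented scan; assumes rows that
    share a segment id are stored contiguously.
    """
    values = np.asarray(values, dtype=float)
    segment_ids = np.asarray(segment_ids)
    n = len(values)
    totals = np.cumsum(values)
    starts = np.r_[True, segment_ids[1:] != segment_ids[:-1]]
    # Index of the segment start that each row belongs to.
    start_idx = np.maximum.accumulate(np.where(starts, np.arange(n), 0))
    # Subtract the running total accumulated before the segment began.
    return totals - (totals - values)[start_idx]


def cox_log_partial_likelihood(beta, X, time, event, strata=None):
    """Breslow log partial likelihood of a (optionally stratified) Cox model."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    strata = np.zeros(n, dtype=int) if strata is None else np.asarray(strata)
    # Sort by stratum, then by descending time, so the risk set
    # {j : same stratum, time_j >= time_i} is a prefix of each stratum block.
    order = np.lexsort((-time, strata))
    s, t, d = strata[order], time[order], event[order]
    eta = X[order] @ np.asarray(beta, dtype=float)
    risk_sums = segmented_inclusive_scan(np.exp(eta), s)
    # Breslow ties: all rows in a tied (stratum, time) group share the
    # risk-set sum found at the group's last row.
    new_group = np.r_[True, (s[1:] != s[:-1]) | (t[1:] != t[:-1])]
    group_id = np.cumsum(new_group) - 1
    group_last = np.r_[np.flatnonzero(new_group)[1:] - 1, n - 1]
    denom = risk_sums[group_last[group_id]]
    return float(np.sum(eta[d] - np.log(denom[d])))
```

A naive evaluation visits every risk set separately, which costs O(n²) work. After an O(n log n) sort, the scan formulation needs only O(n) work, and a parallel scan can in principle reduce its depth to O(log n) steps. Calling `cox_log_partial_likelihood(beta, X, time, event, strata)` with an integer stratum label per subject gives the stratified likelihood. Omitting `strata` gives the ordinary Cox model.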
