eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Applications of Shared Random Effects Models to Electronic Health Records Data

Abstract

Electronic health records (EHR) give rise to complicated data structures, due in part to outcome-driven patient and physician decisions that affect the number and spacing of clinical observations, the length of time under treatment, and the reason for treatment termination. When dependencies among patient observations, outcomes, and treatment termination exist but are ignored in analyses, the result can be biased parameter estimates and spurious conclusions. Additional complications arise in data obtained from EHR, where informative details, such as reasons for termination of care, frequently go unrecorded or are stored only in nonsystematic forms. The objective of this dissertation is to discuss common difficulties associated with longitudinal analyses of EHR data and to present potential solutions to these challenges using methods appropriate for applied clinical researchers.

Shared random effects models are a practical and effective method of modeling dependencies between observation times, outcomes, and terminal events. Three-part shared random effects models typically make use of a frailty model for the intensity of observation times of a medical-related event ("informative observation times"), a general mixed effects model for the longitudinal outcome ("repeated measures") that allows for flexibility in the temporal specification of the overall trajectory, and a Cox proportional hazards model for the timing of termination of care ("dependent terminal event"). This model formulation is typically applied when the dependent terminal event, for example death, is directly observed and distinguishable from independent censoring events. However, termination from care is a terminal event for which the exact date is often unobserved and may be indistinguishable from independent censoring based solely on EHR data, which makes direct application of shared random effects models infeasible.
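
As a rough illustration of the three-part structure described above, one common way to write such a model is sketched below. The notation is an assumption made here for illustration (covariates $\mathbf{x}_i$, shared random effect $u_i$, association parameters $\gamma_1$ and $\gamma_2$, baseline functions $r_0$ and $\lambda_0$) and is not necessarily the specification used in the dissertation.

```latex
\begin{align*}
r_i(t) &= r_0(t)\,\exp\{\mathbf{x}_i^\top \boldsymbol{\alpha} + u_i\}
  && \text{(intensity of observation times, frailty } u_i\text{)} \\
Y_i(t) &= \mathbf{x}_i(t)^\top \boldsymbol{\beta} + \gamma_1 u_i + \varepsilon_i(t)
  && \text{(longitudinal outcome, mixed effects model)} \\
\lambda_i(t) &= \lambda_0(t)\,\exp\{\mathbf{x}_i^\top \boldsymbol{\delta} + \gamma_2 u_i\}
  && \text{(hazard of the terminal event, Cox-type model)} \\
u_i &\sim N(0, \sigma_u^2), \qquad \varepsilon_i(t) \sim N(0, \sigma_\varepsilon^2)
\end{align*}
```

In this sketch the shared random effect $u_i$ links the three submodels: nonzero $\gamma_1$ or $\gamma_2$ means that patients who are observed more often also tend to differ in their outcomes or in their risk of termination.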

I propose the use of an inverse cumulative hazard function to estimate individual-level survival times between patients’ last recorded and next hypothetical observations, and I use these estimates to help classify dependent and independent termination. In a simulation study, I illustrate the effectiveness of this method in producing minimally biased estimates based on the three-part shared random effects model. I apply the same method to model depression symptom trajectories over time using EHR data from Behavioral Health Associates (BHA), a UCLA Health primary and behavioral health collaborative care system.
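
The general idea of inverting a cumulative hazard can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes a Weibull baseline hazard with a proportional hazards linear predictor, and the function names and parameters (`shape`, `scale`, `linear_predictor`) are hypothetical. Given survival up to the last recorded observation at time s, a termination time T satisfies H(T) = H(s) - log(U) for U uniform on (0, 1), so T is obtained by applying the inverse of H.

```python
import math


def cumulative_hazard(t, shape, scale, linear_predictor):
    """Weibull cumulative hazard H(t) = (t / scale)^shape * exp(lp). Assumed form."""
    return (t / scale) ** shape * math.exp(linear_predictor)


def inverse_cumulative_hazard(h, shape, scale, linear_predictor):
    """Solve H(t) = h for t under the same assumed Weibull form."""
    return scale * (h / math.exp(linear_predictor)) ** (1.0 / shape)


def draw_termination_time(last_obs_time, shape, scale, linear_predictor, u):
    """Termination time conditional on survival to last_obs_time.

    u is a Uniform(0, 1) draw supplied by the caller, so that the
    function itself is deterministic and easy to check.
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    # Conditional survival: S(T) / S(s) = u  <=>  H(T) = H(s) - log(u)
    target = cumulative_hazard(last_obs_time, shape, scale, linear_predictor) - math.log(u)
    return inverse_cumulative_hazard(target, shape, scale, linear_predictor)
```

A drawn time could then be compared with the time of the patient's next hypothetical observation; how that comparison feeds into classifying dependent versus independent termination is specified in the dissertation itself and is not reproduced here.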

Further, I examine the utility of a cure model for handling zero-inflated recurrent events data. As an alternate, probabilistic approach to unobserved terminal events, I propose an extension of an adaptable cure frailty model that represents the probability that a subject will become unsusceptible to future recurrent events after any given event. I change the terminology from "cure" to "treatment termination," so that the model gives the probability that a patient will terminate treatment after each clinical observation. An added benefit of this approach is the cure model’s ability to simultaneously account for the zero inflation common in EHR data (e.g., the overrepresentation of subjects with zero recurrent events).
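
To see why a per-observation termination probability also produces excess zeros, consider a deliberately simplified sketch. It assumes a constant termination probability `p_terminate` after every observation and ignores censoring, frailty, and covariates, all of which the full model would include; the function name is hypothetical.

```python
def prob_recurrent_events(k, p_terminate):
    """P(exactly k recurrent events before treatment termination).

    Simplifying assumption: after the initial observation and after each
    later one, the patient terminates with the same probability, so the
    count of recurrent events is geometric on k = 0, 1, 2, ...
    """
    if k < 0 or not 0.0 < p_terminate <= 1.0:
        raise ValueError("need k >= 0 and 0 < p_terminate <= 1")
    return (1.0 - p_terminate) ** k * p_terminate


# With a 30% chance of terminating after each observation, zero recurrent
# events is the single most likely outcome.
zero_share = prob_recurrent_events(0, 0.3)
```

Under this toy assumption the probability of zero recurrent events equals the termination probability itself, which is one way to read the link between treatment termination and zero inflation.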

I describe common issues inherent in EHR data and demonstrate a series of statistical methods that offer practical solutions to these challenges. I provide analytical tools for applied researchers to easily implement such methods in existing statistical software.
