eScholarship
Open Access Publications from the University of California

Department of Biostatistics

Open Access Policy Deposits

This series is automatically populated with publications deposited by UCLA Fielding School of Public Health Department of Biostatistics researchers in accordance with the University of California’s open access policies. For more information see Open Access Policy Deposits and the UC Publication Management System.

Predictors of Short-Term Outcomes after Syncope: A Systematic Review and Meta-Analysis

(2018)

Introduction: We performed a systematic review and meta-analysis to identify predictors of serious clinical outcomes after an acute-care evaluation for syncope.

Methods: We identified studies that assessed for predictors of short-term (≤30 days) serious clinical events after an emergency department (ED) visit for syncope. We performed a MEDLINE search (January 1, 1990 - July 1, 2017) and reviewed reference lists of retrieved articles. The primary outcome was the occurrence of a serious clinical event (composite of mortality, arrhythmia, ischemic or structural heart disease, major bleed, or neurovascular event) within 30 days. We estimated the sensitivity, specificity, and likelihood ratio of findings for the primary outcome. We created summary estimates of association on a variable-by-variable basis using a Bayesian random-effects model.

Results: We reviewed 2,773 unique articles; 17 met inclusion criteria. The clinical findings most predictive of a short-term, serious event were the following: 1) an elevated blood urea nitrogen level (positive likelihood ratio [LR+]: 2.86, 95% confidence interval [CI] [1.15, 5.42]); 2) history of congestive heart failure (LR+: 2.65, 95% CI [1.69, 3.91]); 3) initial low blood pressure in the ED (LR+: 2.62, 95% CI [1.12, 4.9]); 4) history of arrhythmia (LR+: 2.32, 95% CI [1.31, 3.62]); and 5) an abnormal troponin value (LR+: 2.49, 95% CI [1.36, 4.1]). Younger age was associated with lower risk (LR-: 0.44, 95% CI [0.25, 0.68]). An abnormal electrocardiogram was mildly predictive of increased risk (LR+: 1.79, 95% CI [1.14, 2.63]).

Conclusion: We identified specific risk factors that may aid clinical judgment and that should be considered in the development of future risk-prediction tools for serious clinical events after an ED visit for syncope.

  • 3 supplemental files
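
As a rough illustration of how the likelihood ratios above relate to sensitivity and specificity, the sketch below computes LR+ and LR- from a single hypothetical 2x2 table of a predictor versus 30-day serious events; the counts are invented, and the authors' Bayesian random-effects pooling across studies is not reproduced here.

```python
# Minimal sketch: positive and negative likelihood ratios for one predictor.
# The 2x2 counts below are hypothetical; the published estimates pooled many
# studies with a Bayesian random-effects model, which is not shown here.

def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from a 2x2 table of predictor vs. serious event."""
    sensitivity = tp / (tp + fn)          # P(predictor positive | event)
    specificity = tn / (tn + fp)          # P(predictor negative | no event)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

if __name__ == "__main__":
    # Hypothetical counts: 40 events and 960 non-events, with the predictor
    # present in 20 of the events (tp) and 120 of the non-events (fp).
    lr_pos, lr_neg = likelihood_ratios(tp=20, fp=120, fn=20, tn=840)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```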

Estimating the Cost of Care for Emergency Department Syncope Patients: Comparison of Three Models

(2017)

Introduction: We sought to compare three hospital cost estimation models against actual hospital cost data for patients undergoing evaluation for unexplained syncope. Developing such a model would allow researchers to assess the value of novel clinical algorithms for syncope management.

Methods: Complete health services data, including disposition, testing, and length of stay (LOS), were collected on 67 adult patients (age 60 years and older) who presented to the Emergency Department (ED) with syncope at a single hospital. Patients were excluded if a serious medical condition was identified. Three hospital cost estimation models were created to estimate facility costs: V1, unadjusted Medicare payments for observation and/or hospital admission; V2, modified Medicare payment prorated by LOS in calendar days; and V3, modified Medicare payment prorated by LOS in hours. Total hospital costs included unadjusted Medicare payments for diagnostic testing and estimated facility costs. These estimates were plotted against actual cost data from the hospital finance department. Correlation and regression analyses were performed.

Results: Of the three models, V3 consistently outperformed the others with regard to correlation and goodness of fit. The Pearson correlation coefficient for V3 was 0.88 (95% Confidence Interval 0.81, 0.92) with an R-square value of 0.77 and a linear regression coefficient of 0.87 (95% Confidence Interval 0.76, 0.99).

Conclusion: Using basic health services data, it is possible to accurately estimate hospital costs for older adults undergoing a hospital-based evaluation for unexplained syncope. This methodology could help assess the potential economic impact of implementing novel clinical algorithms for ED syncope. 

  • 2 supplemental files
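
To illustrate the kind of comparison described, the sketch below prorates an assumed facility payment by hourly LOS (in the spirit of V3) and checks correlation and regression fit against stand-in "actual" costs; the payment figures and simulated data are assumptions, not the study's Medicare or finance-department values.

```python
# Sketch of comparing an estimated-cost model (e.g., V3: payment prorated by
# LOS in hours) against actual hospital costs. All data and payment amounts
# are hypothetical stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-visit data: LOS in hours and diagnostic-testing costs.
los_hours = rng.uniform(4, 72, size=67)
testing_cost = rng.uniform(200, 2000, size=67)

BASE_PAYMENT = 6000.0       # assumed facility payment for a full admission
BASE_LOS_HOURS = 72.0       # assumed LOS over which the payment is prorated

estimated_cost = BASE_PAYMENT * (los_hours / BASE_LOS_HOURS) + testing_cost
actual_cost = estimated_cost * rng.normal(1.0, 0.15, size=67)  # stand-in data

r, _ = stats.pearsonr(estimated_cost, actual_cost)
slope, intercept, r_value, p_value, stderr = stats.linregress(
    estimated_cost, actual_cost)
print(f"Pearson r = {r:.2f}, regression coefficient = {slope:.2f}, "
      f"R-squared = {r_value**2:.2f}")
```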

A Risk Score to Predict Short-Term Outcomes Following Emergency Department Discharge

(2018)

Introduction: The emergency department (ED) is an inherently high-risk setting. Risk scores can help practitioners understand the risk of ED patients for developing poor outcomes after discharge. Our objective was to develop two risk scores that predict either general inpatient admission or death/intensive care unit (ICU) admission within seven days of ED discharge.

Methods: We conducted a retrospective cohort study of patients age > 65 years using clinical data from a regional, integrated health system for years 2009-2010 to create risk scores to predict two outcomes, a general inpatient admission or death/ICU admission. We used logistic regression to predict the two outcomes based on age, body mass index, vital signs, Charlson comorbidity index (CCI), ED length of stay (LOS), and prior inpatient admission.

Results: Of 104,025 ED visit discharges, 4,638 (4.5%) experienced a general inpatient admission and 531 (0.5%) experienced death or ICU admission within seven days of discharge. Risk factors with the greatest point value for either outcome were a high CCI score and a prolonged ED LOS. The C-statistics for the two models were 0.68 and 0.76.

Conclusion: Risk scores were successfully created from an integrated health system for both outcomes, general inpatient admission and death/ICU admission. Patients who accrued the highest number of points, and thus the greatest risk, presented to the ED with a high number of comorbidities and required prolonged ED evaluations.

  • 1 supplemental file
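
As a minimal sketch of how such a risk model can be fit and evaluated, the code below trains a logistic regression on simulated stand-ins for age, Charlson index, and ED LOS and reports a C-statistic (AUC); the data, coefficients, and variable choices are assumptions for illustration, not the health-system data used in the study.

```python
# Minimal sketch: logistic regression risk model with a C-statistic (AUC).
# Simulated predictors stand in for age, Charlson comorbidity index, and
# ED length of stay; they are not the integrated health-system data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([
    rng.normal(75, 7, n),        # age (years)
    rng.poisson(2, n),           # Charlson comorbidity index
    rng.exponential(4, n),       # ED length of stay (hours)
])
logit = -6 + 0.03 * X[:, 0] + 0.4 * X[:, 1] + 0.1 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("C-statistic:",
      round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```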

Comparing penalization methods for linear models on large observational health data.

(2024)

OBJECTIVE: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. MATERIALS AND METHODS: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a train-test split of 75%/25% and evaluate performance with discrimination and calibration. Statistical analysis for differences in performance uses Friedman's test and critical difference diagrams. RESULTS: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. CONCLUSION: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
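
A minimal sketch of this style of comparison, using scikit-learn's penalized logistic regression for the L1, L2, and elastic-net variants with a 75%/25% split; the adaptive, BAR, and IHT penalties evaluated in the paper are not available in scikit-learn and are omitted, and the simulated data are an assumption.

```python
# Sketch: comparing L1, L2, and elastic-net logistic regression on a 75%/25%
# split, scoring discrimination (AUC) and counting non-zero coefficients.
# Simulated data only; the study used large observational health databases
# and additional penalties (adaptive variants, BAR, IHT) not shown here.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=200, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

penalties = {
    "L1": dict(penalty="l1", solver="saga"),
    "L2": dict(penalty="l2", solver="saga"),
    "ElasticNet": dict(penalty="elasticnet", solver="saga", l1_ratio=0.5),
}
for name, kwargs in penalties.items():
    model = LogisticRegression(max_iter=5000, **kwargs).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    n_nonzero = (model.coef_ != 0).sum()
    print(f"{name}: AUC={auc:.3f}, non-zero coefficients={n_nonzero}")
```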

High Variability of Body Mass Index Is Independently Associated With Incident Heart Failure.

(2024)

BACKGROUND: Heart failure (HF) is a serious condition with increasing prevalence, high morbidity, and increased mortality. Obesity is an established risk factor for HF. Fluctuation in body mass index (BMI) has been associated with a higher risk of cardiovascular outcomes. We investigated the association between BMI variability and incident HF. METHODS AND RESULTS: In the UK Biobank, we established a prospective cohort after excluding participants with prevalent HF or cancer at enrollment. A total of 99 368 White participants with ≥3 BMI measures during >2 years preceding enrollment were included, with a median follow-up of 12.5 years. The within-participant variability of BMI was evaluated using standardized SD and coefficient of variation. The association of BMI variability with incident HF was assessed using the Fine and Gray competing-risk model, adjusting for confounding factors and participant-specific rate of BMI change. Higher BMI variability, measured by both SD and coefficient of variation, was significantly associated with a higher risk of incident HF (SD: hazard ratio [HR], 1.05 [95% CI, 1.03-1.08], P<0.0001; coefficient of variation: HR, 1.07 [95% CI, 1.04-1.10], P<0.0001). CONCLUSIONS: Longitudinal health records capture BMI fluctuation, which independently predicts HF incidence.
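
The within-participant variability measures referenced above are straightforward to compute from longitudinal records; the sketch below derives per-participant SD and coefficient of variation of BMI from a made-up table. The subsequent Fine and Gray competing-risk regression is not reproduced here.

```python
# Sketch: within-participant BMI variability (SD and coefficient of
# variation) from longitudinal records. The data frame is hypothetical;
# the study then entered such measures into a Fine and Gray competing-risk
# model, which is not shown.
import pandas as pd

bmi_records = pd.DataFrame({
    "participant_id": [1, 1, 1, 2, 2, 2, 2],
    "bmi":            [27.1, 28.4, 26.8, 31.0, 33.5, 30.2, 32.8],
})

variability = (
    bmi_records.groupby("participant_id")["bmi"]
    .agg(mean_bmi="mean", sd_bmi="std", n_measures="count")
)
variability["cv_bmi"] = variability["sd_bmi"] / variability["mean_bmi"]
print(variability)
```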

scCDC: a computational method for gene-specific contamination detection and correction in single-cell and single-nucleus RNA-seq data

(2024)

In droplet-based single-cell and single-nucleus RNA-seq assays, systematic contamination by ambient RNA molecules biases the quantification of gene expression levels. Existing methods correct the contamination for all genes globally, but their correction efficacy at varying contamination levels has not been specifically evaluated. Here, we show that DecontX and CellBender under-correct highly contaminating genes, while SoupX and scAR over-correct lowly/non-contaminating genes. We therefore develop scCDC, the first method to detect the contamination-causing genes and correct the expression levels of only those genes, some of which are cell-type markers. Compared with existing decontamination methods, scCDC excels at decontaminating highly contaminating genes while avoiding over-correction of other genes.
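
As a conceptual sketch only (not the scCDC algorithm), the code below flags genes whose share of ambient, empty-droplet counts greatly exceeds their share of in-cell counts, illustrating the general idea of gene-specific contamination detection; all counts and the threshold are invented for demonstration.

```python
# Conceptual sketch only: flag genes with unusually high ambient
# (empty-droplet) expression as candidate "contamination-causing" genes.
# This illustrates gene-specific detection in general and is NOT the scCDC
# method; the toy counts below are invented.
import numpy as np

rng = np.random.default_rng(0)
n_genes = 1000
gene_names = np.array([f"gene{i}" for i in range(n_genes)])

# Toy counts summed over empty droplets (ambient) and over real cells;
# the first five genes are made artificially ambient-enriched.
ambient_counts = rng.poisson(1.0, n_genes) + (np.arange(n_genes) < 5) * 500
cell_counts = rng.poisson(50.0, n_genes)

ambient_fraction = ambient_counts / ambient_counts.sum()
cell_fraction = cell_counts / cell_counts.sum()

# Flag genes whose ambient share greatly exceeds their in-cell share.
enrichment = ambient_fraction / np.maximum(cell_fraction, 1e-12)
flagged = gene_names[enrichment > 5]
print("Candidate contamination-causing genes:", flagged[:10])
```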

Applications of nature-inspired metaheuristic algorithms for tackling optimization problems across disciplines.

(2024)

Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. This paper demonstrates the usefulness of such algorithms for solving a variety of challenging optimization problems in statistics using a nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA). This algorithm was proposed by one of the authors, and its superior performance relative to many of its competitors has been demonstrated in earlier work and again in this paper. The main goal of this paper is to show that a typical nature-inspired metaheuristic algorithm, like CSO-MA, is efficient for tackling many different types of optimization problems in statistics. Our applications are new and include finding maximum likelihood estimates of parameters in a single-cell generalized trend model to study pseudotime in bioinformatics, estimating parameters in the commonly used Rasch model in education research, finding M-estimates for a Cox regression in a Markov renewal model, performing matrix completion tasks to impute missing data for a two-compartment model, and selecting variables optimally in an ecology problem in China. To further demonstrate the flexibility of metaheuristics, we also find an optimal design for a car refueling experiment in the auto industry using a logistic model with multiple interacting factors. In addition, we show that metaheuristics can sometimes outperform optimization algorithms commonly used in statistics.
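
For readers unfamiliar with this class of algorithms, the sketch below implements a bare-bones competitive swarm optimizer on a toy objective; it omits the mutated-agents component of CSO-MA and the statistical applications, so it should be read as a generic illustration under assumed tuning values rather than the authors' method.

```python
# Sketch of a generic competitive swarm optimizer (CSO) minimizing a test
# function. Bare-bones illustration only; the mutated-agents component of
# CSO-MA and the statistical applications are not implemented here.
import numpy as np

def cso_minimize(f, dim, n_particles=40, n_iters=200, lower=-5.0, upper=5.0,
                 phi=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    for _ in range(n_iters):
        fitness = np.apply_along_axis(f, 1, x)
        mean_pos = x.mean(axis=0)
        # Randomly pair particles; in each pair the loser learns from the
        # winner and from the swarm mean, while the winner passes unchanged.
        pairs = rng.permutation(n_particles).reshape(-1, 2)
        for i, j in pairs:
            winner, loser = (i, j) if fitness[i] < fitness[j] else (j, i)
            r1, r2, r3 = rng.random((3, dim))
            v[loser] = (r1 * v[loser]
                        + r2 * (x[winner] - x[loser])
                        + phi * r3 * (mean_pos - x[loser]))
            x[loser] = np.clip(x[loser] + v[loser], lower, upper)
    best = x[np.argmin(np.apply_along_axis(f, 1, x))]
    return best, f(best)

# Example: minimize the sphere function in 10 dimensions.
best_x, best_val = cso_minimize(lambda z: float(np.sum(z**2)), dim=10)
print("best objective:", round(best_val, 6))
```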

The genomic evolutionary dynamics and global circulation patterns of respiratory syncytial virus.

(2024)

Respiratory syncytial virus (RSV) is a leading cause of acute lower respiratory tract infection in young children and the second leading cause of infant death worldwide. While global circulation has been extensively studied for respiratory viruses such as seasonal influenza, and more recently also in great detail for SARS-CoV-2, a lack of global multi-annual sampling of complete RSV genomes limits our understanding of RSV molecular epidemiology. Here, we capitalise on the genomic surveillance by the INFORM-RSV study and apply phylodynamic approaches to uncover how selection and neutral epidemiological processes shape RSV diversity. Using complete viral genome sequences, we show similar patterns of site-specific diversifying selection among RSVA and RSVB and recover the imprint of non-neutral epidemic processes on their genealogies. Using a phylogeographic approach, we provide evidence for air travel governing the global patterns of RSVA and RSVB spread, which results in a considerable degree of phylogenetic mixing across countries. Our findings highlight the potential of systematic global RSV genomic surveillance for transforming our understanding of global RSV spread.

A genome-wide spectrum of tandem repeat expansions in 338,963 humans

(2024)

The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD is able to differentiate common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.

Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models.

(2024)

Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York state, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of the viral effective reproductive number.
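
To illustrate the Hamiltonian Monte Carlo mechanics that gradient availability enables, the sketch below implements a minimal leapfrog HMC sampler on a toy Gaussian target; it is not the EBDS phylodynamic implementation, and the target density, dimensions, and tuning values are assumptions for demonstration.

```python
# Minimal Hamiltonian Monte Carlo sketch on a toy Gaussian target. This only
# shows the leapfrog/accept-reject mechanics that make gradient-based
# samplers efficient; it is not the EBDS phylodynamic implementation.
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples=2000, step_size=0.1,
               n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)                 # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of Hamiltonian dynamics.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new
            p_new += step_size * grad_log_prob(x_new)
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis accept/reject on the joint (position, momentum) energy.
        log_accept = (log_prob(x_new) - 0.5 * p_new @ p_new
                      - log_prob(x) + 0.5 * p @ p)
        if np.log(rng.random()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: 5-dimensional standard normal.
draws = hmc_sample(lambda z: -0.5 * z @ z, lambda z: -z, x0=np.zeros(5))
print("posterior mean estimate:", draws[1000:].mean(axis=0).round(2))
```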