eScholarship
Open Access Publications from the University of California

Department of Biostatistics

Open Access Policy Deposits

This series is automatically populated with publications deposited by UCLA Fielding School of Public Health Department of Biostatistics researchers in accordance with the University of California’s open access policies. For more information see Open Access Policy Deposits and the UC Publication Management System.

Predictors of Short-Term Outcomes after Syncope: A Systematic Review and Meta-Analysis

(2018)

Introduction: We performed a systematic review and meta-analysis to identify predictors of serious clinical outcomes after an acute-care evaluation for syncope.

Methods: We identified studies that assessed for predictors of short-term (≤30 days) serious clinical events after an emergency department (ED) visit for syncope. We performed a MEDLINE search (January 1, 1990 - July 1, 2017) and reviewed reference lists of retrieved articles. The primary outcome was the occurrence of a serious clinical event (composite of mortality, arrhythmia, ischemic or structural heart disease, major bleed, or neurovascular event) within 30 days. We estimated the sensitivity, specificity, and likelihood ratio of findings for the primary outcome. We created summary estimates of association on a variable-by-variable basis using a Bayesian random-effects model.
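The sensitivity, specificity, and likelihood ratios named in the Methods can be illustrated for a single finding with a short sketch. It uses the standard 2x2-table definitions only; the function name and the counts are hypothetical, and it does not reproduce the Bayesian random-effects pooling the authors used across studies.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.

    tp/fn: patients WITH a serious event who had / lacked the finding.
    fp/tn: patients WITHOUT a serious event who had / lacked the finding.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ : how much a positive finding raises the odds of the outcome.
    lr_pos = sensitivity / (1 - specificity)
    # LR- : how much a negative finding lowers the odds of the outcome.
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts for one study and one finding (not from the paper).
summary = diagnostic_summary(tp=30, fp=70, fn=20, tn=380)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

An LR+ above 1 means the finding raises the odds of a serious event, and an LR- below 1 means its absence lowers them.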

Results: We reviewed 2,773 unique articles; 17 met inclusion criteria. The clinical findings most predictive of a short-term, serious event were the following: 1) an elevated blood urea nitrogen level (positive likelihood ratio [LR+]: 2.86, 95% confidence interval [CI] [1.15, 5.42]); 2) history of congestive heart failure (LR+: 2.65, 95%CI [1.69, 3.91]); 3) initial low blood pressure in the ED (LR+: 2.62, 95%CI [1.12, 4.9]); 4) history of arrhythmia (LR+: 2.32, 95%CI [1.31, 3.62]); and 5) an abnormal troponin value (LR+: 2.49, 95%CI [1.36, 4.1]). Younger age was associated with lower risk (LR-: 0.44, 95%CI [0.25, 0.68]). An abnormal electrocardiogram was mildly predictive of increased risk (LR+: 1.79, 95%CI [1.14, 2.63]).

Conclusion: We identified specific risk factors that may aid clinical judgment and that should be considered in the development of future risk-prediction tools for serious clinical events after an ED visit for syncope.

  • 3 supplemental files

Estimating the Cost of Care for Emergency Department Syncope Patients: Comparison of Three Models

(2017)

Introduction: We sought to compare three hospital cost estimation models for patients undergoing evaluation for unexplained syncope with hospital cost data. Developing such a model would allow researchers to assess the value of novel clinical algorithms for syncope management.

Methods: Complete health services data, including disposition, testing, and length of stay (LOS), were collected on 67 adult patients (age 60 years and older) who presented to the Emergency Department (ED) with syncope at a single hospital. Patients were excluded if a serious medical condition was identified. Three hospital cost estimation models were created to estimate facility costs: V1, unadjusted Medicare payments for observation and/or hospital admission; V2, modified Medicare payment, prorated by LOS in calendar days; and V3, modified Medicare payment, prorated by LOS in hours. Total hospital costs included unadjusted Medicare payments for diagnostic testing and estimated facility costs. These estimates were plotted against actual cost data from the hospital finance department. Correlation and regression analyses were performed.

Results: Of the three models, V3 consistently outperformed the others with regard to correlation and goodness of fit. The Pearson correlation coefficient for V3 was 0.88 (95% Confidence Interval 0.81, 0.92) with an R-square value of 0.77 and a linear regression coefficient of 0.87 (95% Confidence Interval 0.76, 0.99).
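The reported statistics are consistent with one another: in a simple linear regression, R-square equals the squared Pearson correlation, and 0.88 squared is about 0.77. The sketch below shows that relationship on made-up cost pairs. The data and the function name are hypothetical, and treating the estimate as the predictor and the actual cost as the response is an assumption, since the abstract does not state the direction of the regression.

```python
from math import sqrt


def pearson_and_fit(x, y):
    """Pearson r, least-squares slope and intercept, and R-square for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-square computed from residuals, independently of r.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    return r, slope, intercept, r_squared


# Hypothetical estimated vs. actual costs in dollars (not study data).
estimated = [1200.0, 2500.0, 3100.0, 4800.0, 7500.0]
actual = [1500.0, 2300.0, 3600.0, 4400.0, 7900.0]
r, slope, intercept, r_squared = pearson_and_fit(estimated, actual)
print(f"r={r:.3f}  slope={slope:.3f}  R^2={r_squared:.3f}  r^2={r * r:.3f}")
```

A slope near 1 would indicate that the estimates track actual costs dollar for dollar, which is the sense in which the reported coefficient of 0.87 can be read.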

Conclusion: Using basic health services data, it is possible to accurately estimate hospital costs for older adults undergoing a hospital-based evaluation for unexplained syncope. This methodology could help assess the potential economic impact of implementing novel clinical algorithms for ED syncope. 

  • 2 supplemental files

A Risk Score to Predict Short-Term Outcomes Following Emergency Department Discharge

(2018)

Introduction: The emergency department (ED) is an inherently high-risk setting. Risk scores can help practitioners understand the risk of ED patients for developing poor outcomes after discharge. Our objective was to develop two risk scores that predict either general inpatient admission or death/intensive care unit (ICU) admission within seven days of ED discharge.

Methods: We conducted a retrospective cohort study of patients age > 65 years using clinical data from a regional, integrated health system for years 2009-2010 to create risk scores to predict two outcomes, a general inpatient admission or death/ICU admission. We used logistic regression to predict the two outcomes based on age, body mass index, vital signs, Charlson comorbidity index (CCI), ED length of stay (LOS), and prior inpatient admission.

Results: Of 104,025 ED visit discharges, 4,638 (4.5%) experienced a general inpatient admission and 531 (0.5%) death or ICU admission within seven days of discharge. Risk factors with the greatest point value for either outcome were high CCI score and a prolonged ED LOS. The C-statistic was 0.68 and 0.76 for the two models.
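The C-statistic reported for each model is the probability that a randomly chosen patient who had the outcome received a higher predicted risk than a randomly chosen patient who did not. A minimal sketch of that definition follows; the function name, scores, and outcomes are hypothetical, and the pairwise loop is meant to illustrate the definition rather than to scale to a cohort of this size.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic for binary outcomes coded 0/1.

    Counts case/non-case pairs in which the case has the higher score;
    tied scores count as half a concordant pair.
    """
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and observed outcomes (not study data).
example_scores = [0.10, 0.40, 0.35, 0.80]
example_outcomes = [0, 0, 1, 1]
example_c = c_statistic(example_scores, example_outcomes)
print(example_c)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so the reported 0.68 and 0.76 indicate moderate discrimination.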

Conclusion: Using data from an integrated health system, risk scores were successfully created for both outcomes: general inpatient admission and death/ICU admission. Patients who accrue the most points, and therefore carry the greatest risk, present to the ED with a high number of comorbidities and require prolonged ED evaluations.

  • 1 supplemental file

Routes of importation and spatial dynamics of SARS-CoV-2 variants during localized interventions in Chile.

(2024)

Human mobility is strongly associated with the spread of SARS-CoV-2 via air travel on an international scale and with population mixing and the number of people moving between locations on a local scale. However, these conclusions are drawn mostly from observations in the context of the global north where international and domestic connectivity is heavily influenced by the air travel network; scenarios where land-based mobility can also dominate viral spread remain understudied. Furthermore, research on the effects of nonpharmaceutical interventions (NPIs) has mostly focused on national- or regional-scale implementations, leaving gaps in our understanding of the potential benefits of implementing NPIs at higher granularity. Here, we use Chile as a model to explore the role of human mobility on disease spread within the global south; the country implemented a systematic genomic surveillance program and NPIs at a very high spatial granularity. We combine viral genomic data, anonymized human mobility data from mobile phones and official records of international travelers entering the country to characterize the routes of importation of different variants, the relative contributions of airport and land border importations, and the real-time impact of the country's mobility network on the diffusion of SARS-CoV-2. The introduction of variants which are dominant in neighboring countries (and not detected through airport genomic surveillance) is predicted by land border crossings and not by air travelers, and the strength of connectivity between comunas (Chile's lowest administrative divisions) predicts the time of arrival of imported lineages to new locations. A higher stringency of local NPIs was also associated with fewer domestic viral importations. Our analysis sheds light on the drivers of emerging respiratory infectious disease spread outside of air travel and on the consequences of disrupting regular movement patterns at lower spatial scales.

Emergence of the B.1.214.2 SARS-CoV-2 lineage with an Omicron-like spike insertion and a unique upper airway immune signature.

(2024)

We investigate the emergence, mutation profile, and dissemination of SARS-CoV-2 lineage B.1.214.2, first identified in Belgium in January 2021. This variant, featuring a 3-amino acid insertion in the spike protein similar to the Omicron variant, was speculated to enhance transmissibility or immune evasion. Initially detected in international travelers, it substantially transmitted in Central Africa, Belgium, Switzerland, and France, peaking in April 2021. Our travel-aware phylogeographic analysis, incorporating travel history, estimated the origin to the Republic of the Congo, with primary European entry through France and Belgium, and multiple smaller introductions during the epidemic. We correlate its spread with human travel patterns and air passenger data. Further, upon reviewing national reports of SARS-CoV-2 outbreaks in Belgian nursing homes, we found this strain caused moderately severe outcomes (8.7% case fatality ratio). A distinct nasopharyngeal immune response was observed in elderly patients, characterized by 80% unique signatures, higher B- and T-cell activation, increased type I IFN signaling, and reduced NK, Th17, and complement system activation, compared to similar outbreaks. This unique immune response may explain the variant's epidemiological behavior and underscores the need for nasal vaccine strategies against emerging variants.

A review of feature selection strategies utilizing graph data structures and Knowledge Graphs.

(2024)

Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection (FS) within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in FS for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in FS techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG FS, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic FS algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.

Random-Effects Substitution Models for Phylogenetics via Scalable Gradient Approximations

(2024)

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.

Comparing penalization methods for linear models on large observational health data.

(2024)

OBJECTIVE: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. MATERIALS AND METHODS: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a train-test split of 75%/25% and evaluate performance with discrimination and calibration. Statistical analysis for difference in performance uses Friedman's test and critical difference diagrams. RESULTS: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. CONCLUSION: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
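The penalties compared in this study differ in how they measure the size of the coefficient vector. The sketch below computes the basic L0, L1, L2, and ElasticNet penalty terms for one coefficient vector. The function names, the regularization strength `lam`, the mixing weight `mix`, and the coefficients are all hypothetical, and the ElasticNet form shown is one common parameterization rather than necessarily the one used in the paper. The adaptive variants, BAR, and IHT are fitting procedures and are not reproduced here.

```python
def l0_penalty(beta):
    """Number of nonzero coefficients (model size)."""
    return sum(1 for b in beta if b != 0.0)


def l1_penalty(beta):
    """Sum of absolute coefficients (lasso penalty)."""
    return sum(abs(b) for b in beta)


def l2_penalty(beta):
    """Sum of squared coefficients (ridge penalty)."""
    return sum(b * b for b in beta)


def elastic_net_penalty(beta, lam, mix):
    """Weighted blend of L1 and L2 penalties.

    mix=1 gives a pure L1 penalty and mix=0 a pure L2 penalty.
    This parameterization is an assumption made for illustration.
    """
    return lam * (mix * l1_penalty(beta) + (1.0 - mix) * l2_penalty(beta))


# Hypothetical coefficient vector (not from the study).
beta = [0.5, -1.5, 0.0, 2.0]
print("L0:", l0_penalty(beta))
print("L1:", l1_penalty(beta))
print("L2:", l2_penalty(beta))
print("ElasticNet:", elastic_net_penalty(beta, lam=0.1, mix=0.5))
```

Because the L0 term counts features directly, methods that approximate it tend to produce the smaller models the abstract describes, while L1 and ElasticNet shrink coefficients more gradually.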

High Variability of Body Mass Index Is Independently Associated With Incident Heart Failure.

(2024)

BACKGROUND: Heart failure (HF) is a serious condition with increasing prevalence, high morbidity, and increased mortality. Obesity is an established risk factor for HF. Fluctuation in body mass index (BMI) has shown a higher risk of cardiovascular outcomes. We investigated the association between BMI variability and incident HF. METHODS AND RESULTS: In the UK Biobank, we established a prospective cohort after excluding participants with prevalent HF or cancer at enrollment. A total of 99 368 White participants with ≥3 BMI measures during >2 years preceding enrollment were included, with a median follow-up of 12.5 years. The within-participant variability of BMI was evaluated using standardized SD and coefficient of variation. The association of BMI variability with incident HF was assessed using Fine and Gray's competing risk model, adjusting for confounding factors and participant-specific rate of BMI change. Higher BMI variability measured in both SD and coefficient of variation was significantly associated with higher risk in HF incidence (SD: hazard ratio [HR], 1.05 [95% CI, 1.03-1.08], P<0.0001; coefficient of variation: HR, 1.07 [95% CI, 1.04-1.10], P<0.0001). CONCLUSIONS: Longitudinal health records capture BMI fluctuation, which independently predicts HF incidence.
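Within-participant variability can be illustrated with the two summary measures the abstract names. The sketch below computes the sample SD and the coefficient of variation (SD divided by the mean) for one participant's BMI series. The function name and the measurements are hypothetical, and the additional standardization of the SD mentioned in the abstract is not specified there, so it is not reproduced.

```python
from statistics import mean, stdev


def bmi_variability(bmi_values):
    """Sample SD and coefficient of variation of one participant's BMI series."""
    # The cohort required at least three BMI measures per participant.
    if len(bmi_values) < 3:
        raise ValueError("need at least three BMI measurements")
    sd = stdev(bmi_values)
    cv = sd / mean(bmi_values)
    return sd, cv


# Hypothetical BMI measurements in kg/m^2 (not study data).
sd, cv = bmi_variability([24.0, 26.0, 25.0])
print(f"SD={sd:.2f}  CV={cv:.3f}")
```

The coefficient of variation rescales the SD by the participant's mean BMI, so it allows fluctuation to be compared between participants with different average body size.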

scCDC: a computational method for gene-specific contamination detection and correction in single-cell and single-nucleus RNA-seq data

(2024)

In droplet-based single-cell and single-nucleus RNA-seq assays, systematic contamination by ambient RNA molecules biases the quantification of gene expression levels. Existing methods correct the contamination for all genes globally. However, a specific evaluation of correction efficacy across varying contamination levels is lacking. Here, we show that DecontX and CellBender under-correct highly contaminating genes, while SoupX and scAR over-correct lowly/non-contaminating genes. We then develop scCDC as the first method to detect the contamination-causing genes and only correct expression levels of these genes, some of which are cell-type markers. Compared with existing decontamination methods, scCDC excels in decontaminating highly contaminating genes while avoiding over-correction of other genes.