eScholarship
Open Access Publications from the University of California

Department of Statistics

Open Access Policy Deposits

This series is automatically populated with publications deposited by UC Irvine Donald Bren School of Information and Computer Sciences Department of Statistics researchers in accordance with the University of California’s open access policies. For more information see Open Access Policy Deposits and the UC Publication Management System.

Comment: The Future of the Textbook

(2013)

Commentary on The Future of the Textbook.

Reproducibility in the Classroom

(2024)

Difficulties in reproducing results from scientific studies have lately been referred to as a "reproducibility crisis". Scientific practice depends heavily on scientific training. What gets taught in the classroom is often practiced in labs, fields, and data analysis. The importance of reproducibility in the classroom has gained momentum in statistics education in recent years. In this manuscript, we review the existing literature on reproducibility education. We delve into the relationship between computing tools and reproducibility by visiting historical developments in this area. We share examples for teaching reproducibility and reproducible teaching while discussing the pedagogical opportunities created by these examples as well as challenges that instructors should be aware of. We detail the use of teaching reproducibility and reproducible teaching practices in an introductory data science course. Lastly, we provide recommendations on reproducibility education for instructors, administrators, and other members of the scientific community.

Individual longitudinal changes in DNA-methylome identify signatures of early-life adversity and correlate with later outcome.

(2024)

Adverse early-life experiences (ELA) affect a majority of the world's children. Whereas the enduring impact of ELA on cognitive and emotional health is established, there are no tools to predict vulnerability to ELA consequences in an individual child. Epigenetic markers including peripheral-cell DNA-methylation profiles may encode ELA and provide predictive outcome markers, yet the interindividual variance of the human genome and rapid changes in DNA methylation in childhood pose significant challenges. Hoping to mitigate these challenges, we examined the relation of several ELA dimensions to DNA methylation changes and outcome using a within-subject longitudinal design and a high methylation-change threshold. DNA methylation was analyzed in buccal swab/saliva samples collected twice (neonatally and at 12 months) in 110 infants. We identified CpGs differentially methylated across time for each child and determined whether they associated with ELA indicators and executive function at age 5. We assessed sex differences and derived a sex-dependent impact score based on sites that most contributed to methylation changes. Changes in methylation between two samples of an individual child reflected age-related trends and correlated with executive function years later. Among tested ELA dimensions and life factors including income-to-needs ratios, maternal sensitivity, body mass index and infant sex, unpredictability of parental and household signals was the strongest predictor of executive function. In girls, high early-life unpredictability interacted with methylation changes to presage executive function. Thus, longitudinal, within-subject changes in methylation profiles may provide a signature of ELA and a potential predictive marker of individual outcome.

Childhood unpredictability is associated with increased risk for long- and short-term depression and anhedonia symptoms following combat deployment.

(2024)

High unpredictability has emerged as a dimension of early-life adversity that may contribute to a host of deleterious consequences later in life. Early-life unpredictability affects development of limbic and reward circuits in both rodents and humans, with a potential to increase sensitivity to stressors and mood symptoms later in life. Here, we examined the extent to which unpredictability during childhood was associated with changes in mood symptoms (anhedonia and general depression) after two adult life stressors, combat deployment and civilian reintegration, which were assessed ten years apart. We also examined how perceived stress and social support mediated and/or moderated links between childhood unpredictability and mood symptoms. To test these hypotheses, we leveraged the Marine Resiliency Study, a prospective longitudinal study of the effects of combat deployment on mental health in Active-Duty Marines and Navy Corpsmen. Participants (N = 273) were assessed for depression and anhedonia before (pre-deployment) and 3-6 months after (acute post-deployment) a combat deployment. Additional assessments of depression and childhood unpredictability were collected 10 years post-deployment (chronic post-deployment). Higher childhood unpredictability was associated with higher anhedonia and general depression at both acute and chronic post-deployment timepoints (βs ≥ 0.16, ps ≤ .007). The relationship between childhood unpredictability and subsequent depression at acute post-deployment was partially mediated by lower social support (b = 0.07, 95% CI [0.03, 0.15]) while depression at chronic post-deployment was fully mediated by a combination of lower social support (b = 0.14, 95% CI [0.07, 0.23]) and higher perceived stress (b = 0.09, 95% CI [0.05, 0.15]). These findings implicate childhood unpredictability as a potential risk factor for depression in adulthood and suggest that increasing the structure and predictability of childhood routines and developing social support interventions after life stressors could be helpful for preventing adult depression.

How Trustworthy Is Your Tree? Bayesian Phylogenetic Effective Sample Size Through the Lens of Monte Carlo Error.

(2024)

Bayesian inference is a popular and widely used approach to infer phylogenies (evolutionary trees). However, despite decades of widespread application, it remains difficult to judge how well a given Bayesian Markov chain Monte Carlo (MCMC) run explores the space of phylogenetic trees. In this paper, we investigate the Monte Carlo error of phylogenies, focusing on high-dimensional summaries of the posterior distribution, including variability in estimated edge/branch (known in phylogenetics as split) probabilities and tree probabilities, and variability in the estimated summary tree. Specifically, we ask if there is any measure of effective sample size (ESS) applicable to phylogenetic trees which is capable of capturing the Monte Carlo error of these three summary measures. We find that there are some ESS measures capable of capturing the error inherent in using MCMC samples to approximate the posterior distributions on phylogenies. We term these tree ESS measures, and identify a set of three which are useful in practice for assessing the Monte Carlo error. Lastly, we present visualization tools that can improve comparisons between multiple independent MCMC runs by accounting for the Monte Carlo error present in each chain. Our results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.
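
To make the link between ESS and Monte Carlo error concrete, the following is a minimal sketch for a single split. It is not one of the tree ESS measures from the paper; it assumes a generic autocorrelation-based ESS applied to a 0/1 indicator of whether a split appears in each sampled tree. The function names and the toy chain are hypothetical.

```python
import math

def autocorrelation(x, lag):
    """Biased sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(x):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation (a common, simple truncation rule)."""
    n = len(x)
    total = 0.0
    for lag in range(1, n):
        rho = autocorrelation(x, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)

def split_probability_with_mcse(indicator):
    """Estimated split probability and its Monte Carlo standard error,
    using sqrt(p * (1 - p) / ESS)."""
    p = sum(indicator) / len(indicator)
    ess = effective_sample_size(indicator)
    return p, math.sqrt(p * (1 - p) / ess)

# Toy "sticky" chain: the split is present for 50 samples, absent for 50, repeated.
sticky = ([1] * 50 + [0] * 50) * 10
p, mcse = split_probability_with_mcse(sticky)
ess = effective_sample_size(sticky)
print(f"p = {p:.2f}, ESS = {ess:.1f} of {len(sticky)} samples, MCSE = {mcse:.3f}")
```

In this toy chain the ESS is far smaller than the 1000 stored samples, so the Monte Carlo error of the split probability is much larger than the raw sample count would suggest.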

Where’s Waldo, Ohio? Using Cognitive Models to Improve the Aggregation of Spatial Knowledge

(2024)

We apply cognitive modeling to improve the wisdom of the crowd in a spatial knowledge task. Participants provided point estimates for where 48 US cities are located and then, using the point estimate as a center point, chose a radius large enough that they believed the resulting circle was certain to contain the city’s location. Simple and radius-weighted arithmetic averages of the individuals’ point estimates produced more accurate group answers than the majority of individuals. These statistical aggregates, however, assume there are no differences in individual expertise nor in the difficulty of locating different cities. Accordingly, we develop a set of cognitive models to infer group estimates that make various assumptions about individual expertise and differences in city difficulty. The model-based estimates generally outperform the statistical averages. The models are especially accurate if they allow for individual differences in expertise that can vary city by city. We replicate this finding by applying the same cognitive models to data reported by Mayer and Heck (2023) in which participants provided point estimates for the locations of European cities.
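
The two statistical aggregates mentioned above can be sketched as follows. The abstract does not state the weighting scheme, so this sketch assumes each point estimate is weighted by the inverse of its radius (a smaller radius is treated as higher confidence). The function names and data are hypothetical.

```python
def simple_average(points):
    """Unweighted mean of (x, y) point estimates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def radius_weighted_average(points, radii):
    """Mean of (x, y) point estimates weighted by 1 / radius.
    The inverse-radius weighting is an assumption of this sketch."""
    weights = [1.0 / r for r in radii]
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    return (x, y)

# Two participants: the first is more confident (smaller radius).
points = [(0.0, 0.0), (10.0, 0.0)]
radii = [1.0, 4.0]
print(simple_average(points))
print(radius_weighted_average(points, radii))
```

The weighted estimate is pulled toward the participant who chose the smaller radius, which is the intuition behind using the radius as a proxy for expertise.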

The HDI + ROPE Decision Rule Is Logically Incoherent But We Can Fix It

(2024)

The Bayesian highest-density interval plus region of practical equivalence (HDI + ROPE) decision rule is an increasingly common approach to testing null parameter values. The decision procedure involves a comparison between a posterior highest-density interval (HDI) and a prespecified region of practical equivalence. One then accepts or rejects the null parameter value depending on the overlap (or lack thereof) between these intervals. Here, we demonstrate, both theoretically and through examples, that this procedure is logically incoherent. Because the HDI is not transformation invariant, the ultimate inferential decision depends on statistically arbitrary and scientifically irrelevant properties of the statistical model. The incoherence arises from a common confusion between probability density and probability proper. The HDI + ROPE procedure relies on characterizing posterior densities as opposed to being based directly on probability. We conclude with recommendations for alternative Bayesian testing procedures that do not exhibit this pathology and provide a "quick fix" in the form of quantile intervals. This article is the work of the authors and is reformatted from the original, which was published under a CC-BY Attribution 4.0 International license and is available at https://psyarxiv.com/5p2qt/. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
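
The lack of transformation invariance can be illustrated with a small sample-based sketch. This is not the authors' demonstration; it assumes a standard normal "posterior", the monotone transformation exp, and simple order-statistic estimators for both interval types. All names are hypothetical.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

def quantile_interval(samples, mass=0.95):
    """Equal-tailed interval built from order statistics."""
    s = sorted(samples)
    n = len(s)
    lo = math.floor((1 - mass) / 2 * (n - 1))
    hi = math.ceil((1 + mass) / 2 * (n - 1))
    return s[lo], s[hi]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # samples on the original scale
y = [math.exp(v) for v in x]                     # same samples, transformed scale

hdi_x, hdi_y = hdi(x), hdi(y)
q_x, q_y = quantile_interval(x), quantile_interval(y)

# Map the transformed-scale intervals back to the original scale.
print("HDI on x:               ", hdi_x)
print("log of HDI on exp(x):   ", tuple(math.log(v) for v in hdi_y))
print("quantile interval on x: ", q_x)
print("log of quantile on exp(x):", tuple(math.log(v) for v in q_y))
```

The quantile interval maps back to the same endpoints, while the HDI computed on the transformed scale covers a different region of the parameter. A decision that compares the HDI with a fixed region can therefore change with the parameterization.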

A Novel Approach to Integrate Human Biomonitoring Data with Model Predicted Dietary Exposures: A Crop Protection Chemical Case Study Using Lambda-Cyhalothrin.

(2024)

The appropriate use of human biomonitoring data to model population chemical exposures is challenging, especially for rapidly metabolized chemicals, such as agricultural chemicals. The objective of this study is to demonstrate a novel approach integrating model-predicted dietary exposures and biomonitoring data to potentially inform regulatory risk assessments. We use lambda-cyhalothrin as a case study: for the same representative U.S. population in the National Health and Nutrition Examination Survey (NHANES), exposures predicted by an integrated exposure and pharmacokinetic model are calibrated to measurements of the urinary metabolite 3-phenoxybenzoic acid (3PBA), using an approximate Bayesian computing (ABC) methodology. We demonstrate that the correlation between modeled urinary 3PBA and the NHANES 3PBA measurements more than doubled as ABC thresholding narrowed the acceptable tolerance range for predicted versus observed urinary measurements. The median predicted urinary concentrations were closer to the median measured value using ABC than using current regulatory Monte Carlo methods.
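
The idea of ABC thresholding can be sketched with a generic rejection sampler. This is a toy illustration, not the exposure and pharmacokinetic model from the study: the uniform prior, the Gaussian simulator, the observed value, and the tolerances are all assumptions chosen for the example.

```python
import random

def abc_rejection(observed, simulate, sample_prior, tolerance, n_draws, rng):
    """Keep (parameter, prediction) pairs whose prediction falls within
    `tolerance` of the observed value."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        predicted = simulate(theta, rng)
        if abs(predicted - observed) <= tolerance:
            accepted.append((theta, predicted))
    return accepted

# Hypothetical toy model: the prediction is the parameter plus noise.
def sample_prior(rng):
    return rng.uniform(0.0, 10.0)

def simulate(theta, rng):
    return theta + rng.gauss(0.0, 0.5)

observed = 3.0
loose = abc_rejection(observed, simulate, sample_prior, 2.0, 5000, random.Random(1))
tight = abc_rejection(observed, simulate, sample_prior, 0.2, 5000, random.Random(1))

print(len(loose), "draws accepted with tolerance 2.0")
print(len(tight), "draws accepted with tolerance 0.2")
```

Narrowing the tolerance keeps fewer draws, but the retained predictions sit closer to the observation, which is the trade-off behind the thresholding described in the abstract.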

Deep latent variable joint cognitive modeling of neural signals and human behavior

(2024)

As the field of computational cognitive neuroscience continues to expand and generate new theories, there is a growing need for more advanced methods to test hypotheses about brain-behavior relationships. Recent progress in Bayesian cognitive modeling has enabled the combination of neural and behavioral models into a single unifying framework. However, these approaches require manual feature extraction and lack the capability to discover previously unknown neural features in more complex data, which limits the expressiveness of the models. To address these challenges, we propose a Neurocognitive Variational Autoencoder (NCVA) to conjoin high-dimensional EEG with a cognitive model in both generative and predictive modeling analyses. Importantly, our NCVA enables both the prediction of EEG signals given behavioral data and the estimation of cognitive model parameters from EEG signals. This novel approach can allow for a more comprehensive understanding of the triplet relationship between behavior, brain activity, and cognitive processes.