This dissertation considers three topics related to extracting and merging evidence from heterogeneous sources. The problem is approached from several angles, ranging from the design of experiments to machine learning. In each area we add to the existing literature by developing novel methodology and software.
Adaptive trial designs can considerably improve upon traditional designs by modifying aspects of the ongoing trial, such as stopping early, adding or dropping doses, or changing the sample size.
We propose a two-stage Bayesian adaptive design for a Phase IIb study aimed at selecting the lowest effective dose for Phase III. In this setting, efficacy has been demonstrated for a high dose in a Phase IIa proof-of-concept study, but the existence of a lower, still effective dose is to be investigated before the scheduled Phase III starts.
In the first stage patients are randomized to placebo, maximal
tolerated dose, and one or more additional doses within the dose
range. Based on an interim analysis, the study is either stopped for
futility or success, or enters the second stage, where newly recruited
patients are allocated to placebo, a fairly high dose, and one additional dose chosen based on the interim data. At the interim analysis, criteria based on the predictive probability of success are used to decide whether to stop or continue the trial and, in the latter case, which dose to select for the second stage.
Finally, a dose is selected as the lowest effective dose for Phase III, either at the end of the first stage or at the end of the second.
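To fix ideas, the sketch below shows in R how such a predictive probability of success can be computed at an interim analysis. It assumes a single-arm binary endpoint compared with a fixed reference rate under a conjugate Beta prior; the endpoint type, the prior, the sample sizes, and the threshold are illustrative assumptions and not the design proposed here.

\begin{verbatim}
## Sketch: predictive probability of success (PPoS) at interim.
## Success = final posterior probability Pr(p > p0) > threshold.
## All numerical settings below are illustrative assumptions.
ppos <- function(y1, n1, n2, p0 = 0.3, a = 1, b = 1,
                 threshold = 0.975, nsim = 1e4) {
  ## posterior after stage 1: Beta(a + y1, b + n1 - y1)
  p_draw  <- rbeta(nsim, a + y1, b + n1 - y1)   # plausible response rates
  y2      <- rbinom(nsim, n2, p_draw)           # predictive stage-2 data
  ## for each simulated completion of the trial, check the final criterion
  success <- pbeta(p0, a + y1 + y2, b + n1 + n2 - y1 - y2,
                   lower.tail = FALSE) > threshold
  mean(success)                                 # Monte Carlo PPoS estimate
}

ppos(y1 = 14, n1 = 30, n2 = 60)  # e.g. 14 responders out of 30 at interim
\end{verbatim}

In the trial setting described above, analogous quantities would be computed per dose arm against placebo and compared with futility and success cut-offs.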
The operating characteristics of the procedure are evaluated via simulation, and results are presented for several scenarios, comparing the performance of the proposed procedure with that of a non-adaptive design.
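As an illustration of how the operating characteristics of such an interim rule can be estimated, the sketch below simulates many first stages under an assumed true response rate and applies futility and success cut-offs to the predictive probability. It reuses the ppos function from the previous sketch, and all numerical values are again illustrative rather than those of the proposed design.

\begin{verbatim}
## Sketch: operating characteristics of the interim rule by simulation.
## Reuses ppos() from the previous sketch; cut-offs are illustrative.
oc <- function(p_true, n1 = 30, n2 = 60, fut = 0.10, suc = 0.90,
               ntrial = 2000) {
  y1 <- rbinom(ntrial, n1, p_true)                 # simulated interim data
  pp <- vapply(y1, ppos, numeric(1), n1 = n1, n2 = n2)
  c(stop_futility = mean(pp < fut),                # early stop for futility
    stop_success  = mean(pp > suc),                # early stop for success
    continue      = mean(pp >= fut & pp <= suc))   # proceed to stage 2
}

oc(p_true = 0.45)   # operating characteristics under one scenario
\end{verbatim}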
The development of novel therapies in multiple sclerosis (MS) is one area where a range of surrogate
outcomes are used in various stages of clinical research. While the aim of treatments in MS is to prevent
disability, a clinical trial evaluating a drug's effect on disability progression would require a large
sample of patients with many years of follow-up. The early stage of MS is characterized by relapses. To
reduce study size and duration, clinical relapses are accepted as primary endpoints in phase III trials. For
phase II studies, the primary outcomes are typically lesion counts based on Magnetic Resonance Imaging
(MRI), as these are considerably more sensitive than clinical measures for detecting MS activity.
Recently, Sormani and colleagues \cite{sormani2010surrogate} provided a systematic review and
used weighted regression analyses to examine the role of either MRI lesions or relapses as trial-level
surrogate outcomes for disability. We build on this work by developing a Bayesian three-level model
that accommodates the two surrogates and the disability endpoint and properly takes into account that
treatment effects are estimated with error. Specifically, a combination of treatment effects based on
MRI lesion count outcomes and clinical relapses, both expressed on the log risk ratio scale, was used to
develop a study-level surrogate outcome model for the corresponding treatment effects based on
disability progression. While the primary aim in developing this model was to support decision-making
in drug development, the proposed model may also be considered for future validation.
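As a rough illustration of the modelling idea, the sketch below fits a much-simplified, two-level analogue of the proposed model in base R: estimated log risk ratios for disability are regressed on the two surrogate treatment effects, with known sampling standard errors on the outcome side plus a between-trial variance component, using a random-walk Metropolis sampler under flat priors. The full three-level model additionally treats the surrogate effects themselves as estimated with error; all variable names here are illustrative.

\begin{verbatim}
## Sketch: simplified two-level surrogate meta-regression.
## theta_hat[i] ~ N(alpha + b1*mri[i] + b2*rel[i], se[i]^2 + tau^2)
## theta_hat: estimated log risk ratios for disability progression
## mri, rel : surrogate treatment effects (assumed error-free here)
## se       : known standard errors of theta_hat
fit_surrogate <- function(theta_hat, se, mri, rel,
                          niter = 20000, step = 0.05) {
  loglik <- function(par) {             # par = (alpha, b1, b2, log tau)
    mu <- par[1] + par[2] * mri + par[3] * rel
    v  <- se^2 + exp(par[4])^2          # sampling + between-trial variance
    sum(dnorm(theta_hat, mu, sqrt(v), log = TRUE))
  }
  draws <- matrix(NA_real_, niter, 4,
                  dimnames = list(NULL, c("alpha", "b_mri",
                                          "b_rel", "log_tau")))
  cur <- c(0, 0, 0, log(0.1)); ll <- loglik(cur)
  for (i in seq_len(niter)) {
    prop <- cur + rnorm(4, 0, step)     # random-walk proposal
    llp  <- loglik(prop)                # flat priors: ratio = likelihood
    if (log(runif(1)) < llp - ll) { cur <- prop; ll <- llp }
    draws[i, ] <- cur
  }
  draws[-seq_len(niter / 2), ]          # discard burn-in
}
\end{verbatim}

Posterior summaries of b_mri and b_rel then indicate how much of the treatment effect on disability is captured by each surrogate.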
In genomics and epidemiology we often deal with a large number of features for each observation. Many well-known approaches to inference in such settings use the topology of the feature space, induced by an appropriate metric, to group observations and summarize their main characteristics, in order to reduce noise and predict an outcome of interest. In the present work we generalize this approach in the context of Loss-Based Estimation. We propose an alternative method for constructing a nonparametric multidimensional regression function, based on the simple idea of clustering data points in the feature space and then fitting a constant to the outcome within each cluster; HOPACH-PAM is used for the partitioning. The approach yields a small number of distinct, easily interpretable regions. This is illustrated through simulations, in which the method clearly outperforms CART. Pre-screening and feature selection methods are also developed to improve performance and reduce noise. Software is available in the R package HOPSLAM (HOpach-Pam Supervised Learning AlgorithM), making the methodology easily accessible.
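To make the cluster-then-average idea concrete, here is a minimal R sketch using cluster::pam as a stand-in for HOPACH-PAM; the actual implementation in HOPSLAM differs, and the function names and toy data below are illustrative.

\begin{verbatim}
## Sketch: cluster the feature space, then fit a constant per region.
## Uses cluster::pam in place of HOPACH-PAM; names are illustrative.
library(cluster)

hopslam_fit <- function(x, y, k) {
  fit <- pam(x, k)                       # partition the feature space
  list(medoids = fit$medoids,            # one representative per region
       means   = tapply(y, fit$clustering, mean))  # constant per region
}

hopslam_predict <- function(model, xnew) {
  ## assign each new point to the nearest medoid; return its region's mean
  k  <- nrow(model$medoids)
  d  <- as.matrix(dist(rbind(model$medoids, xnew)))
  id <- apply(d[-seq_len(k), seq_len(k), drop = FALSE], 1, which.min)
  unname(model$means[id])
}

## toy example: a piecewise-constant signal in a 5-dimensional space
set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5)
y <- ifelse(x[, 1] > 0, 2, -2) + rnorm(200, sd = 0.3)
m <- hopslam_fit(x[1:150, ], y[1:150], k = 4)
mean((hopslam_predict(m, x[151:200, ]) - y[151:200])^2)  # test MSE
\end{verbatim}

Unlike plain PAM with a fixed k, HOPACH-PAM builds a hierarchical tree of partitions and can choose the partition data-adaptively, which is closer to what the package implements.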