Open Access Publications from the University of California


A department of the University of California San Francisco School of Medicine. Our educational mission is to train students, fellows, and faculty in methods for studying disease etiology and prevention in general populations, for evaluating diagnostic tests and treatment efficacy in clinical settings, and for using evidence-based approaches in clinical practice. Our scientific mission is to do outstanding clinical and population-based research in these areas, often in collaboration with other departments and institutions, and to guide use of the findings in clinical practice and public health policies.

Department of Epidemiology and Biostatistics

There are 1728 publications in this collection, published between 1981 and 2021.
Open Access Policy Deposits (1719)

Racial differences in tuberculosis infection in United States communities: the coronary artery risk development in young adults study.

Previously reported associations between race/ethnicity and tuberculosis infection have lacked sufficient adjustment for socioeconomic factors. We analyzed race/ethnicity and self-reported tuberculosis infection data from the Coronary Artery Risk Development in Young Adults (CARDIA) study, a well-characterized cohort of 5115 black and white participants, and found that after adjusting for sociodemographic and clinical factors, black participants were more likely to report tuberculosis infection and/or disease (odds ratio, 2.0; 95% confidence interval, 1.5-2.9).

DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly.

Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.
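The idea of a combined correction for GC content and mappability, together with a blacklist and a read-counting noise limit, can be illustrated with a minimal sketch. This is not the QDNAseq implementation: it swaps in a simple per-stratum median normalization, and the function names, the `step` parameter, and the input layout (one value per genomic bin) are assumptions made for illustration. The noise floor assumes that read counts per bin are Poisson distributed.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def correct_counts(counts, gc, mappability, blacklisted, step=0.1):
    """Divide each bin's read count by the median count of the bins that
    share its (GC, mappability) stratum. Blacklisted bins yield None.

    Simplified stand-in for a joint GC/mappability correction (hypothetical
    helper, not the published pipeline).
    """
    def stratum(i):
        # Bins whose GC and mappability fall in the same step-wide interval
        # are treated as comparable.
        return (int(gc[i] / step), int(mappability[i] / step))

    by_stratum = defaultdict(list)
    for i, count in enumerate(counts):
        if not blacklisted[i]:
            by_stratum[stratum(i)].append(count)
    medians = {key: median(values) for key, values in by_stratum.items()}

    corrected = []
    for i, count in enumerate(counts):
        reference = medians.get(stratum(i))
        if blacklisted[i] or not reference:
            corrected.append(None)  # excluded or not correctable
        else:
            corrected.append(count / reference)
    return corrected


def poisson_noise_floor(mean_reads_per_bin):
    """Relative standard deviation expected from read counting alone,
    assuming Poisson-distributed counts: 1 / sqrt(mean)."""
    return 1.0 / sqrt(mean_reads_per_bin)
```

With this normalization a corrected value near 1.0 means the bin has the read depth typical of bins with similar sequence properties, and the spread of corrected values in a copy-neutral region can be compared against `poisson_noise_floor` for the same bin size.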

Meaningful end points and outcomes in men on active surveillance for early-stage prostate cancer.

Purpose of review

Active surveillance is a management strategy for early-stage prostate cancer designed to balance early detection of aggressive disease and overtreatment of indolent disease. We evaluate recently reported outcomes and discuss the potentially most important endpoints for such an approach.

Recent findings

The past 2 years have seen the publication of two trials of watchful waiting versus immediate treatment and updates of multiple active surveillance cohorts for men with early-stage prostate cancer. The watchful waiting trials demonstrated a small potential mortality benefit to immediate treatment when applied to all risk levels (6% absolute difference at 15 years), emphasizing the importance of a risk-adapted strategy. In reported active surveillance cohorts, prostate cancer death and metastasis remain rare events. Intermediate outcomes such as progression to treatment and upgrading/upstaging on final disease appear consistent among cohorts, but must be interpreted with caution when compared with historical controls of immediate treatment because of potential selection bias.


Summary

The safety of active surveillance has been reinforced by recent reports. Accumulation of additional data on men with intermediate risk cancer and development and validation of new biomarkers of risk will allow refined and, likely, expanded use of this approach.

Recent Work (1)

Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression


Background

Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates.

Methods

We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women’s Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use.

Results

Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before.

Conclusions

In studies of fibrosis progression, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.
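The multiple imputation workflow mentioned above can be sketched in general terms: draw several plausible infection ages per subject from a fitted risk distribution, run the progression analysis on each completed dataset, and pool the results with Rubin's rules. The sketch below is illustrative only and is not the authors' code. The function names and the `infection_probs` input (a mapping from candidate age to probability of infection at that age) are hypothetical, and the analysis step itself is left to the reader.

```python
import random
from statistics import mean, variance


def draw_infection_ages(infection_probs, m, seed=0):
    """Draw m imputed infection ages for one subject.

    infection_probs maps candidate age -> probability of infection at that
    age (hypothetical input; in practice it would come from a fitted model
    of past infection risk).
    """
    rng = random.Random(seed)
    ages = list(infection_probs)
    weights = [infection_probs[age] for age in ages]
    return rng.choices(ages, weights=weights, k=m)


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances using
    Rubin's rules. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what carries the uncertainty about when infection occurred into the final standard error. Treating the reported age of first injection drug use as the known time of infection amounts to setting that term to zero.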