A department of the University of California San Francisco School of Medicine; our educational mission is to train students, fellows and faculty in methods for studying disease etiology and prevention in general populations, for evaluating diagnostic tests and treatment efficacy in clinical settings, and for using evidence-based approaches in clinical practice. Our scientific mission is to do outstanding clinical and population-based research in these areas, often in collaboration with other departments and institutions, and to guide use of the findings in clinical practice and public health policies.
Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression
Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates.
We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women’s Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use.
Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before.
In studies of fibrosis progression, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.
Racial differences in tuberculosis infection in United States communities: the coronary artery risk development in young adults study.
Previously reported associations between race/ethnicity and tuberculosis infection have lacked sufficient adjustment for socioeconomic factors. We analyzed race/ethnicity and self-reported tuberculosis infection data from the Coronary Artery Risk Development in Young Adults (CARDIA) study, a well-characterized cohort of 5115 black and white participants, and found that after adjusting for sociodemographic and clinical factors, black participants were more likely to report tuberculosis infection and/or disease (odds ratio, 2.0; 95% confidence interval, 1.5-2.9).
DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly.
Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.
Congenital heart disease (CHD) is present in 1% of live births, yet identification of causal mutations remains challenging. We hypothesized that genetic determinants for CHDs may lie in the protein interactomes of transcription factors whose mutations cause CHDs. Defining the interactomes of two transcription factors haplo-insufficient in CHD, GATA4 and TBX5, within human cardiac progenitors, and integrating the results with nearly 9,000 exomes from proband-parent trios revealed an enrichment of de novo missense variants associated with CHD within the interactomes. Scoring variants of interactome members based on residue, gene, and proband features identified likely CHD-causing genes, including the epigenetic reader GLYR1. GLYR1 and GATA4 widely co-occupied and co-activated cardiac developmental genes, and the identified GLYR1 missense variant disrupted interaction with GATA4, impairing in vitro and in vivo function in mice. This integrative proteomic and genetic approach provides a framework for prioritizing and interrogating genetic variants in heart disease.