eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Exploring methods to identify individuals infected with hepatitis C virus in the United States: an application of ensemble learning with national survey data

Abstract

Over 70 million people worldwide are living with chronic hepatitis C virus (HCV) infection. Untreated, HCV infection can progress to cirrhosis, advanced liver disease, and hepatocellular carcinoma. Our improved understanding of HCV transmission, coupled with significant advances in treatment, has the potential to dramatically reduce the incidence and prevalence of HCV-related diseases. Based on these advances, the World Health Organization (WHO) has established a goal to eliminate HCV infection by 2030; however, a significant impediment to this goal is the lack of infection awareness, which perpetuates the spread of the virus. By improving the detection of HCV infection, we can connect patients to treatment, which reduces prevalence, and curtail transmission, which reduces the future incidence of infection.

This dissertation reviews the literature on known risk factors for HCV infection in the United States (US) and uses a large, contemporary, publicly available national dataset, the National Health and Nutrition Examination Survey (NHANES), to look for additional risk factors and to build an algorithm to identify individuals with a high probability of HCV infection. NHANES participants are randomly selected from the non-institutionalized and housed US population and screened for HCV RNA, regardless of insurance status or known risk factors, providing meaningful insights into the characteristics associated with HCV infection.

Chapter 2 presents the results of an umbrella review of circumstances associated with an increased prevalence of HCV infection in the US. Risk factors were categorized as behavioral/lifestyle factors, risks associated with a medical condition, risks related to an occupation, or vulnerable populations. These findings can be used to improve outreach, education, and prevention programs, as many of the identified risk factors are present in marginalized groups that may not have access to regular healthcare or may be missed by existing HCV diagnosis and prevention efforts. Chapter 3 explores the use of ensemble learning methods to identify the features captured by NHANES that have the greatest impact on successful HCV infection prediction. NHANES data include HCV RNA measurements for all participants of the medical examination portion of the survey. With this information, the ensemble learning method Super Learner was used to identify complex patterns of characteristics associated with HCV infection and to rank those characteristics by their impact on successful HCV infection prediction in the US (ranked variable importance). Using a subset of the NHANES data that would likely be available and accurate in electronic medical records, Chapter 4 examines the development of an HCV prediction algorithm that could be used to prioritize candidates for HCV screening. Overall, these findings contribute to the national effort to increase the detection of HCV infection and accelerate progress towards HCV elimination.
