Computational methodologies for studying polyclonal immune responses with PhIP-Seq
eScholarship
Open Access Publications from the University of California

UC San Francisco Electronic Theses and Dissertations

No data is associated with this publication.
Abstract

The humoral adaptive immune system is designed to identify and eliminate foreign bodies and aberrant cells, primarily through the production of antibodies that specifically bind to their targets. Although it plays a major role in both health and disease, this system is highly complex and has been difficult to study at scale. Recently, phage immunoprecipitation sequencing (PhIP-Seq) has emerged as a powerful tool for studying humoral immunity. Thanks to advancements in oligonucleotide synthesis and next-generation sequencing, PhIP-Seq uses large libraries of long (50+ amino-acid) peptides that span massive search spaces, such as the entire isoform-inclusive human proteome, to profile the humoral immune response and antibody repertoire. However, given its recent development and barriers to adoption, there has been a dearth of computational methodologies and data infrastructure. To that end, we developed PhageDB, an integrative end-to-end platform for PhIP-Seq data storage, processing, and analysis. The platform backbone is a MySQL database with redundant backup, accessible through a PHP-based web interface, with underlying scripts written in Python. The platform features various novel analysis methodologies that we developed and that have been applied to study the humoral immune response in numerous disease contexts, particularly autoimmune disorders, infectious diseases, and cancer. In particular, we profiled non-small cell lung cancer (NSCLC), one of the most common and deadliest cancers in the world, developing a machine learning-based classifier with samples from primarily early-stage NSCLC patients and healthy controls. The classifier performed well in cross-validation (average ROC-AUC = 0.94) and remained robust when applied to a separate blinded and independently analyzed validation cohort of 134 NSCLC patients and 96 healthy controls (ROC-AUC = 0.84).
Targets identified by the model have previously been linked to cancer, and model parameterization suggests that the signal can be limited to thousands of input features but requires a sufficiently large training set. Together, these findings suggest the existence of a measurable and accessible autoreactive humoral profile associated with early-stage lung cancer and demonstrate the potential for serum-based early detection of cancer.

Main Content

This item is under embargo until June 23, 2025.