Article
Peer Reviewed

An integrative method for scoring candidate genes from association studies: application to warfarin dosing

UC San Francisco Previously Published Works (2010)

Background

A key challenge in pharmacogenomics is the identification of genes whose variants contribute to drug response phenotypes, which can include severe adverse effects. Pharmacogenomics GWAS attempt to elucidate genotypes predictive of drug response. However, the size of these studies has severely limited their power and potential application. We propose a novel knowledge integration and SNP aggregation approach for identifying genes impacting drug response. Our SNP aggregation method characterizes the degree to which uncommon alleles of a gene are associated with drug response. We first use pre-existing knowledge sources to rank pharmacogenes by their likelihood to affect drug response. We then define a summary score for each gene based on allele frequencies and train linear and logistic regression classifiers to predict drug response phenotypes.
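The abstract does not state the formula behind the gene-level summary score, so the following is only an illustrative sketch of the general idea of aggregating SNPs so that uncommon alleles count more. The function name `gene_score` and the `-log10(MAF)` weighting are assumptions made for this example, not the authors' method; a score of this kind could then serve as one input feature to a regression classifier.

```python
import math

def gene_score(minor_allele_counts, minor_allele_freqs):
    """Sum one individual's minor-allele counts across a gene's SNPs,
    weighting each SNP by -log10(MAF) so rarer alleles contribute more.
    The weighting is an assumption for illustration only."""
    total = 0.0
    for count, maf in zip(minor_allele_counts, minor_allele_freqs):
        if not 0.0 < maf <= 0.5:
            raise ValueError("minor allele frequency must lie in (0, 0.5]")
        total += count * -math.log10(maf)
    return total

# Hypothetical gene with three SNPs of decreasing minor allele frequency.
freqs = [0.5, 0.1, 0.01]
common_carrier = gene_score([2, 0, 0], freqs)  # two copies of a common allele
rare_carrier = gene_score([0, 0, 1], freqs)    # one copy of a rare allele
```

Under this weighting, a single copy of the rare allele scores higher than two copies of the common one, which is the behavior the aggregation idea calls for.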

Results

We applied our method to a published warfarin GWAS data set comprising 181 individuals. We find that our method can increase the power of the GWAS to identify both VKORC1 and CYP2C9 as warfarin pharmacogenes, where the original analysis had only identified VKORC1. Additionally, we find that our method can be used to discriminate between low-dose (AUROC=0.886) and high-dose (AUROC=0.764) responders.
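For readers less familiar with the AUROC figures quoted above, here is a minimal sketch of how the metric can be computed: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The scores and labels below are made-up toy values, not data from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.
    Ties between a positive and a negative score count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: one positive (0.35) is ranked below one negative (0.6).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(scores, labels)  # 8 of 9 positive/negative pairs are ordered correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation.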

Conclusions

Our method offers a new route for candidate pharmacogene discovery from pharmacogenomics GWAS, and serves as a foundation for future work in methods for predictive pharmacogenomics.

Article
Peer Reviewed

Cross-Modal Data Programming Enables Rapid Medical Machine Learning.

UC Davis Previously Published Works (2020)

A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work, a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models.
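As a rough illustration of the cross-modal setup, the sketch below writes a few labeling functions over report text and combines their votes into one weak label per study, which would then supervise a model over the paired image. Data programming proper learns the accuracies and correlations of the sources; the unweighted majority vote used here is a deliberately simplified stand-in, and the labeling functions, keywords, and report strings are all hypothetical.

```python
ABSTAIN = -1  # a labeling function may decline to vote

# Hypothetical labeling functions over radiology report text (1 = abnormal, 0 = normal).
def lf_mentions_fracture(report):
    return 1 if "fracture" in report.lower() else ABSTAIN

def lf_negated_finding(report):
    return 0 if "no fracture" in report.lower() else ABSTAIN

def lf_normal_study(report):
    return 0 if "normal" in report.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_fracture, lf_negated_finding, lf_normal_study]

def majority_vote(votes):
    """Combine votes for one example; ties and all-abstain cases return ABSTAIN."""
    pos = sum(1 for v in votes if v == 1)
    neg = sum(1 for v in votes if v == 0)
    if pos == neg:
        return ABSTAIN
    return 1 if pos > neg else 0

def weak_label(report):
    return majority_vote([lf(report) for lf in LABELING_FUNCTIONS])

reports = [
    "Displaced fracture of the distal radius.",
    "No fracture. Normal alignment.",
    "Study limited by motion.",
]
weak_labels = [weak_label(r) for r in reports]
```

In the second report the naive keyword function votes "abnormal" but is outvoted by the two negative sources, which shows why combining noisy sources can improve label quality; the third report receives no label, which illustrates the coverage problem the abstract mentions.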

Article
Peer Reviewed

Circulating KRAS G12D but not G12V is associated with survival in metastatic pancreatic ductal adenocarcinoma.

UCLA Previously Published Works (2024)

While high circulating tumor DNA (ctDNA) levels are associated with poor survival for multiple cancers, variant-specific differences in the association of ctDNA levels and survival have not been examined. Here we investigate KRAS ctDNA (ctKRAS) variant-specific associations with overall and progression-free survival (OS/PFS) in first-line metastatic pancreatic ductal adenocarcinoma (mPDAC) for patients receiving chemoimmunotherapy (PRINCE, NCT03214250), and an independent cohort receiving standard of care (SOC) chemotherapy. For PRINCE, higher baseline plasma levels are associated with worse OS for ctKRAS G12D (log-rank p = 0.0010) but not G12V (p = 0.7101), even with adjustment for clinical covariates. Early, on-therapy clearance of G12D (p = 0.0002), but not G12V (p = 0.4058), strongly associates with OS for PRINCE. Similar results are obtained for the SOC cohort, and for PFS in both cohorts. These results suggest ctKRAS G12D but not G12V as a promising prognostic biomarker for mPDAC and that G12D clearance could also serve as an early biomarker of response.
