- Karimzadeh, Mehran
- Momen-Roknabadi, Amir
- Cavazos, Taylor
- Fang, Yuqi
- Chen, Nae-Chyun
- Multhaup, Michael
- Yen, Jennifer
- Ku, Jeremy
- Wang, Jieyang
- Zhao, Xuan
- Murzynowski, Philip
- Wang, Kathleen
- Hanna, Rose
- Huang, Alice
- Corti, Diana
- Nguyen, Dang
- Lam, Ti
- Kilinc, Seda
- Arensdorf, Patrick
- Chau, Kimberly
- Hartwig, Anna
- Fish, Lisa
- Li, Helen
- Behsaz, Babak
- Elemento, Olivier
- Zou, James
- Hormozdiari, Fereydoun
- Alipanahi, Babak
- Goodarzi, Hani

Liquid biopsies have the potential to revolutionize cancer care through non-invasive early detection of tumors. Developing a robust liquid biopsy test requires collecting high-dimensional data from a large number of blood samples across heterogeneous groups of patients. We propose that the generative capability of variational auto-encoders enables learning a robust and generalizable signature of blood-based biomarkers. In this study, we analyze orphan non-coding RNAs (oncRNAs) from serum samples of 1050 individuals diagnosed with non-small cell lung cancer (NSCLC) at various stages, as well as sex-, age-, and BMI-matched controls. We demonstrate that our multi-task generative AI model, Orion, surpasses commonly used methods in both overall performance and generalizability to held-out datasets. Orion achieves an overall sensitivity of 94% (95% CI: 87%-98%) at 87% (95% CI: 81%-93%) specificity for cancer detection across all stages, outperforming the sensitivity of other methods on held-out validation datasets by more than 30%.