Current drug development, regulatory approval, and clinical practice rely heavily on clinical trials. While randomized clinical trials are indispensable for measuring the true effects of new therapeutics, this paradigm is limited in several respects. One key limitation, arguably the most impactful and the most proximal to patients and clinical care, is the questionable external validity of randomized clinical trials. Highly selective inclusion and exclusion criteria, together with the stringent treatment protocols followed by participating providers, result in an “experimental setting” that does not reflect the care patients would receive in real-world settings. In addition, real-world patients are often older, sicker, and more frail than participants selected into clinical trials. Therefore, real-world therapeutic performance, including safety and effectiveness, may not be comparable to that observed in clinical trials and requires further assessment. Moreover, ways to optimize therapeutic outcomes in real-world patients, that is, to maximize benefit while minimizing risk, are often lacking. To study real-world therapeutic performance, the rapidly growing and increasingly ubiquitous body of real-world electronic health record (EHR) data has become essential.
EHR data refers to clinical data, such as diagnoses, medications, procedures, and laboratory tests, captured through the routine clinical care that any real-world patient receives at each visit and hospitalization. Rich in patient-level detail, EHR data is a comprehensive source of real-world clinical information well suited to studying the performance of novel therapeutics. Yet research involving EHR data can be challenging because of its unique characteristics, such as the retrospective nature of the data, which may introduce biases and requires careful consideration. Furthermore, its high dimensionality and longitudinal structure add another layer of complexity and may require more sophisticated methods.
This dissertation focuses on developing novel computational methods to characterize and predict therapeutic outcomes in real-world patients, with the goal of ultimately helping to guide the clinical use of novel therapeutics. A diverse set of methods was explored and developed, ranging from traditional statistical approaches, machine learning, and rule-based natural language processing to advanced transformer-based models coupled with a transfer learning strategy. The utility of each method was demonstrated on a medically oriented problem whose outcomes may help improve clinical care, patients’ health outcomes, and public health as a whole.