Using Machine Learning to Elucidate Disease Heterogeneity and Improve Prognostication in Patients with COPD
- Yuan, Nancy Fang
- Advisor(s): Gaasterland, Theresa
Abstract
Chronic obstructive pulmonary disease (COPD) is a heterogeneous syndrome whose phenotypic manifestations tend to be distributed along a continuum. Unsupervised machine learning applied to a broad selection of imaging and clinical phenotypes may be used to identify the primary variables that define disease axes and to stratify patients with COPD.
We aimed to identify the primary variables driving COPD heterogeneity using principal component analysis (PCA), to define disease axes, and to assess the prognostic value of these axes across three outcomes: progression, exacerbation, and mortality. We included 7331 patients between 39 and 85 years of age from the COPDGene Phase 1 cohort (2008-2011), of whom 40.3% were Black and 45.8% were female, with a mean smoking history of 44.6 pack-years. Out of a total of 916 phenotypes, 147 continuous clinical, spirometric, and CT features were selected. For each principal component (PC), we computed a principal component score (PCS) based on feature weights. We used the PCS distributions to define disease axes, along which we divided the patients into quartiles. To assess the prognostic value of these axes, we applied logistic regression analyses to estimate 5-year (n=4159) and 10-year (n=1487) odds of progression. Cox regression and Kaplan-Meier analyses were performed to estimate 5-year and 10-year risk of exacerbation (n=6532) and all-cause mortality (n=7331). The first PC, accounting for 43.7% of the variance, was defined by CT measures of air trapping and emphysema. The second PC, accounting for 13.7% of the variance, was defined by spirometric and CT measures of vital capacity and lung volume. The third PC, accounting for 7.9% of the variance, was defined by CT measures of lung mass, airway thickening, and body habitus. Stratification of patients across each disease axis revealed up to 3.2-fold [2.4, 4.3] greater odds of 5-year progression, 5.4-fold [4.6, 6.3] greater risk of 5-year exacerbation, and 5.0-fold [4.2, 6.0] greater risk of 10-year mortality between the highest and lowest quartiles. Unsupervised learning analysis of the COPDGene cohort reveals that CT measurements may bolster patient stratification along the continuum of COPD phenotypes. Each of the disease axes also individually demonstrates prognostic potential and is predictive of future FEV1 decline, exacerbation, and mortality.
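The scoring and stratification steps described above can be illustrated with a minimal sketch. It assumes NumPy, uses synthetic data in place of the 147 COPDGene features, and standardizes each feature before the decomposition; the abstract does not state how features were scaled, and the function names `pca_scores` and `quartile_labels` are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Return (scores, explained_variance_ratio) for the leading components.

    Assumption: features are z-scored first; the abstract does not say so.
    """
    # Standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt hold the feature weights (loadings) of each component.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # A principal component score is the weighted sum of a patient's features.
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


def quartile_labels(score):
    """Assign each patient to quartile 1 (lowest) through 4 (highest)."""
    cuts = np.quantile(score, [0.25, 0.5, 0.75])
    return np.searchsorted(cuts, score, side="right") + 1


# Synthetic stand-in data: 200 "patients", 6 "features", two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

scores, explained = pca_scores(X)
quartiles = quartile_labels(scores[:, 0])  # stratify along the first axis
```

Outcomes can then be compared between quartile 4 and quartile 1 on each axis, which is the contrast the reported fold differences refer to.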
To define clinically meaningful stages of disease using CT imaging for patients with COPD, we developed a deep learning-based algorithm to stage the severity of COPD through quantification of emphysema and air trapping on CT, and we assessed the ability of the proposed stages to prognosticate 5-year progression and mortality. In this retrospective study, an algorithm using co-registration and lung segmentation was developed in-house to automate quantification of emphysema and air trapping from inspiratory and expiratory CT images. The algorithm was then tested on a separate group of 8951 patients from the COPDGene study (2007-2017). Using the measurements of emphysema and air trapping, bivariable thresholds were determined to define CT stages of severity (mild, moderate, severe, and very severe), which were evaluated for their ability to prognosticate disease progression and mortality using logistic regression and Cox regression. Based on CT stages, odds of disease progression were greatest among patients with very severe disease (odds ratio [OR], 2.67 [95% CI: 2.02, 3.53]; P<.001) and were elevated in patients with moderate disease (1.50 [1.22, 1.84]; P = .001). The hazard ratio for mortality in patients with very severe disease on CT was 2.23 relative to normal ([1.93, 2.58]; P<.001). When CT staging was combined with GOLD staging, patients with GOLD 2 disease had the greatest odds of disease progression when the CT stage was severe (4.48 [3.18, 6.31]; P<.001) or very severe (4.72 [3.13, 7.13]; P<.001). Automated CT algorithms can facilitate staging of COPD severity; CT staging has diagnostic performance comparable to spirometric GOLD staging and provides further prognostic value when used in conjunction with GOLD stage. CT-based severity stratification of patients with COPD can prognosticate disease progression and mortality and may allow for improved diagnosis and management.
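As a reading aid for the odds ratios and 95% confidence intervals reported above, the following sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table. This is a simplification rather than the study's method: the reported estimates come from logistic regression, and the counts and the function name `odds_ratio_ci` are purely illustrative.

```python
import math


def odds_ratio_ci(stage_events, stage_nonevents, ref_events, ref_nonevents, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four counts must be > 0.
    """
    odds_ratio = (stage_events * ref_nonevents) / (stage_nonevents * ref_events)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(
        1 / stage_events + 1 / stage_nonevents + 1 / ref_events + 1 / ref_nonevents
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only (not COPDGene data): progressors and non-progressors
# in one CT stage versus the reference stage.
odds_ratio, lower, upper = odds_ratio_ci(40, 60, 20, 80)
```

An interval that excludes 1, as in the results above, indicates that the odds differ from the reference group at the chosen confidence level.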
The prevalence of COVID-19 has placed an undue burden on the healthcare system, signaling a critical need to develop innovative machine learning (ML) strategies to improve triage and care for patients who are hospitalized with COVID-19. Recent developments in artificial intelligence have shown that deep learning algorithms such as convolutional neural networks (CNNs) are effective in automating various tasks in medical imaging, such as pneumonia detection [1]. The challenge remains in implementing these algorithms efficiently in the clinic, where there may be insufficient hardware to support CNNs with hundreds of hidden layers and millions of parameters [2]. To address this challenge, we developed a shallow 3-layer CNN classifier, SimplePNUnet, to detect the presence of pneumonia using a subset of 2000 frontal chest X-rays from the publicly accessible dataset released as part of the 2018 RSNA Pneumonia Detection Challenge. We fine-tuned model hyperparameters using Bayesian optimization. Additionally, we assessed the robustness of SimplePNUnet across different downsampling techniques and batch sizes. Performance of the algorithm was evaluated using the area under the receiver operating characteristic curve (AUROC). The performance of the best model across the training (n=1000), validation (n=500), and test (n=500) sets was comparable (AUC, 0.870 vs 0.812 vs 0.834, respectively) and within the range of deeper U-Net based pneumonia detection algorithms (AUC, 0.80-0.90). SimplePNUnet achieved good performance on an external validation set (AUC, 0.712), demonstrating its generalizability to other COVID-19 patient cohorts.
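The AUROC used for evaluation above can be computed directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a plain-Python illustration on toy values, not the evaluation code used in the study; the function name `auroc` is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0 (no pneumonia) or 1 (pneumonia).
    scores: classifier outputs, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise form is quadratic in the number of cases, which is acceptable for sets of the size reported here (500 to 1000 images); a rank-based formulation would scale better for larger cohorts.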