Clinical Features of Emergency Department Patients from Early COVID-19 Pandemic that Predict SARS-CoV-2 Infection: Machine-learning Approach
- Chou, Eric H.;
- Wang, Chih-Hung;
- Hsieh, Yu-Lin;
- Namazi, Babak;
- Wolfshohl, Jon;
- Bhakta, Toral;
- Tsai, Chu-Lin;
- Lien, Wan-Ching;
- Sankaranarayanan, Ganesh;
- Lee, Chien-Chang;
- Lu, Tsung-Chien
- et al.
Published Web Location: https://doi.org/10.5811/westjem.2020.12.49370
Introduction: Within a few months coronavirus disease 2019 (COVID-19) evolved into a pandemic causing millions of cases worldwide, but it remains challenging to diagnose the disease in a timely fashion in the emergency department (ED). In this study we aimed to construct machine-learning (ML) models to predict severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection based on the clinical features of patients visiting an ED during the early COVID-19 pandemic.
Methods: We retrospectively collected the data of all patients who received reverse transcriptase polymerase chain reaction (RT-PCR) testing for SARS-CoV-2 at the ED of Baylor Scott & White All Saints Medical Center, Fort Worth, from February 23 to May 12, 2020. The variables collected included patient demographics, ED triage data, clinical symptoms, and past medical history. The primary outcome was a confirmed diagnosis of COVID-19 (or SARS-CoV-2 infection), defined as a positive RT-PCR test result for SARS-CoV-2; this outcome served as the label for the ML tasks. We used univariate analyses for feature selection, and variables with P<0.1 were selected for model construction. Samples were split chronologically into training and testing cohorts at a 60:40 ratio. We tried various ML algorithms to construct the best predictive model, and we evaluated performance with the area under the receiver operating characteristic curve (AUC) in the testing cohort.
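The two preprocessing steps described in the Methods (chronological 60:40 split and univariate selection at P<0.1) can be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not say which univariate test was used or on which cohort it was run, so the 2x2 chi-square test without continuity correction is an assumption, and the record fields `visit_date` and `pcr_positive` are hypothetical names.

```python
import math


def chronological_split(records, train_fraction=0.6):
    """Sort records by visit date; the earliest 60% form the training cohort."""
    ordered = sorted(records, key=lambda r: r["visit_date"])  # hypothetical field
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def chi_square_p(records, feature, label="pcr_positive"):
    """P value of a 2x2 chi-square test (1 degree of freedom, no correction).

    Assumption: binary feature and binary label. The choice of test is
    illustrative; the abstract only says "univariate analyses".
    """
    a = b = c = d = 0
    for r in records:
        if r[feature] and r[label]:
            a += 1  # feature present, PCR positive
        elif r[feature]:
            b += 1  # feature present, PCR negative
        elif r[label]:
            c += 1  # feature absent, PCR positive
        else:
            d += 1  # feature absent, PCR negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin carries no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2))


def select_features(records, features, threshold=0.1):
    """Keep the features whose univariate P value falls below the threshold."""
    return [f for f in features if chi_square_p(records, f) < threshold]
```

Splitting by date rather than at random means the testing cohort contains only patients seen after every training patient, which is closer to how a screening model would be used prospectively.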
Results: A total of 580 ED patients were tested for SARS-CoV-2 during the study period, and 98 (16.9%) were identified as having SARS-CoV-2 infection based on the RT-PCR results. Univariate analyses selected 21 features for model construction. Of the three ML methods we assessed, random forest achieved the best AUC (0.86), followed by gradient boosting (0.83) and extra trees classifier (0.82).
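For readers unfamiliar with the metric used to rank the three models, the AUC can be computed directly from its probabilistic definition: the chance that a randomly chosen RT-PCR-positive patient receives a higher predicted score than a randomly chosen negative one. The sketch below is illustrative only; the function name `roc_auc` and its inputs are assumptions, not part of the study's code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for RT-PCR positive, 0 for negative.
    scores: model-predicted scores, higher meaning more likely positive.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size, which is acceptable for a testing cohort of a few hundred patients.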
Conclusion: This study shows that it is feasible to use ML models as an initial screening tool for identifying patients with SARS-CoV-2 infection. Further validation will be necessary to determine how effectively this prediction model can be used prospectively in clinical practice.