Development and External Validation of Clinical Features-based Machine Learning Models for Predicting COVID-19 in the Emergency Department

Introduction Timely diagnosis of patients affected by an emerging infectious disease plays a crucial role in treating patients and preventing disease spread. In prior research, we developed an approach using machine learning (ML) algorithms to predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection based on the clinical features of patients visiting an emergency department (ED) during the early coronavirus disease 2019 (COVID-19) pandemic. In this study, we aimed to externally validate this approach in a distinct ED population. Methods To create our training/validation cohort (model development), we retrospectively collected data from suspected COVID-19 patients at a US ED from February 23–May 12, 2020. A second dataset, from an ED in another country, was collected from May 12–June 15, 2021 as the external validation (testing) cohort. Clinical features, including patient demographics and triage information, were used to train and test the models. The primary outcome was a confirmed diagnosis of COVID-19, defined as a positive reverse transcription polymerase chain reaction test result for SARS-CoV-2. We used three ML algorithms (gradient boosting, random forest, and extra trees classifiers) to construct the predictive models. Predictive performance was evaluated in the testing cohort using the area under the receiver operating characteristic curve (AUC). Results In total, 580 and 946 ED patients were included in the training and testing cohorts, respectively. Of these, 98 (16.9%) and 180 (19.0%), respectively, were diagnosed with COVID-19. All the constructed ML models showed acceptable discrimination, as indicated by the AUC. Random forest achieved the highest AUC (0.785, 95% confidence interval [CI] 0.747–0.822), followed by gradient boosting (0.774, 95% CI 0.739–0.811) and extra trees (0.72, 95% CI 0.677–0.762); however, the differences between the models were not significant.
Conclusion Our study validates the use of ML for predicting COVID-19 in the ED and demonstrates its potential for predicting emerging infectious diseases with models built on clinical features, despite temporal and spatial heterogeneity between the development and validation cohorts. This approach holds promise for future scenarios in which effective diagnostic tools for an emerging infectious disease are lacking.


INTRODUCTION
The global impact of the coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been far-reaching. 1,2 Its clinical manifestations range from mild to severe illness and death, and a subset of infected individuals remain asymptomatic. 3 The worldwide crisis has caused significant loss of life and deeply affected global health. Effectively controlling disease transmission requires early recognition and quarantine; however, this was difficult in the early phase of the pandemic, before the causal pathogen was identified and molecular diagnostic tools became available.
Taiwan had succeeded in preventing COVID-19 outbreaks until mid-May 2021, when community transmission emerged and cases surged to over 3,100 in a week. 4 As of September 20, 2022, Taiwan had reported over six million cases and over 5,000 deaths. The sudden surge in cases, coupled with shortages of vaccines and testing, triggered an influx of patients seeking care in the emergency department (ED). 6-8 Tools that reduce workload and streamline processes for healthcare personnel are crucial to ease their mental health burden during a pandemic.
When facing an emerging infectious disease such as COVID-19, it is crucial to identify patients at risk of infection to avoid spreading the disease into the community. 14,15 However, such data may not be readily available during ED triage, hindering early risk stratification. Moreover, any additional diagnostic test poses further risk to healthcare personnel and requires transporting and moving the patient, which should be minimized from an infection prevention and control perspective. 16 Hence, a persistent challenge remained: how to accurately predict SARS-CoV-2 infection in suspected patients with limited data modalities.
Using clinical features ascertained during initial ED triage, we previously constructed ML models as a preliminary screening mechanism to identify individuals with SARS-CoV-2 infection. 17 Building on the framework established in that study, we sought to externally validate our methodology in the ED of a tertiary medical facility in Taiwan. Of note, this ED serves a distinct patient population whose demographic characteristics differ from those of the cohort used for the original model development. Our primary goal was to validate the feasibility of our approach and thereby expedite risk stratification for emerging infectious diseases in the ED.

Population Health Research Capsule
What do we already know about this issue?
Timely diagnosis of an emerging infectious disease like COVID-19 is crucial for treatment and prevention.

Study Design and Setting
We previously conducted a retrospective cohort study using electronic health record (EHR) data of suspected COVID-19 patients from February 23-May 12, 2020 at the ED of Baylor Scott & White All Saints Medical Center (BAS) in Fort Worth, TX, a 574-bed, university-affiliated tertiary teaching hospital with ≈50,000 ED visits annually. In the current study, we retrospectively collected another set of records from suspected adult COVID-19 cases from May 12-June 15, 2021 at the ED of National Taiwan University Hospital (NTUH) in Taipei, Taiwan, a 2,400-bed university-affiliated tertiary teaching hospital with a daily census of ≈8,000 outpatients and 300 emergency visits. This study was approved by the Baylor Scott & White Research Institute Institutional Review Board (No. 344143) and by NTUH (No. 202009106RIPA), which waived the requirement for informed consent.

Study Population
In the retrospective study that served as the model development cohort, we identified all patients who presented to the ED of the study hospital with suspected COVID-19 and underwent testing for SARS-CoV-2 by reverse transcription polymerase chain reaction (RT-PCR). For the external validation cohort in the current study, we retrospectively collected clinical data for all adult (≥18 years) patients who were tested for SARS-CoV-2 by RT-PCR for suspected COVID-19. The decision to perform RT-PCR testing was left to the discretion of the emergency physician or physician assistant treating each patient.

Data Collection and Outcome Measures
Patient demographics, past medical history (PMH), vital signs recorded at ED triage, and presenting symptoms were retrieved from the EHR. The data collection process is described in detail in our previous study. 17 A positive RT-PCR result for SARS-CoV-2 confirmed the diagnosis of COVID-19 (SARS-CoV-2 infection) and was defined as the primary outcome in both cohorts. We used the model development cohort as the training/validation set to construct the ML models and the external validation cohort as the testing set to evaluate the models' performance.
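The external validation design (fit on one cohort, then evaluate on a cohort from a different hospital) can be sketched with scikit-learn's three tree-ensemble classifiers. This is a minimal illustration under stated assumptions, not the authors' code: the `make_cohort` helper, the three synthetic features, the covariate `shift`, and the hyperparameters are all hypothetical; only the cohort sizes and approximate outcome prevalences (580 at 16.9%, 946 at 19.0%) come from this study.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score


def make_cohort(n, prevalence, shift, rng):
    """Hypothetical synthetic cohort: three triage-like features and a binary outcome."""
    y = (rng.random(n) < prevalence).astype(int)
    # Positives are shifted by 0.8 on every feature; `shift` mimics a population difference.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3)) + 0.8 * y[:, None]
    return X, y


rng = np.random.default_rng(0)
X_dev, y_dev = make_cohort(580, 0.169, shift=0.0, rng=rng)  # development cohort (training)
X_ext, y_ext = make_cohort(946, 0.190, shift=0.3, rng=rng)  # external cohort (testing)

models = {
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "ET": ExtraTreesClassifier(n_estimators=200, random_state=0),
}

external_auc = {}
for name, model in models.items():
    model.fit(X_dev, y_dev)                  # fit on the development cohort only
    prob = model.predict_proba(X_ext)[:, 1]  # predicted probability of the positive class
    external_auc[name] = roc_auc_score(y_ext, prob)

for name, auc in external_auc.items():
    print(f"{name}: external AUC = {auc:.3f}")
```

Because the external cohort is never used for fitting, its AUC reflects how well each model transfers to a different population.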
Data were entered, processed, and analyzed with SPSS Statistics for Windows version 27.0 (IBM Corp, Armonk, NY). We assessed the normality of continuous variables using the Shapiro-Wilk test and reported them as mean with standard deviation or median with interquartile range, as appropriate. Categorical variables were reported as proportions or percentages. To identify pertinent features, we used univariate analyses to compare each variable between patients with and without COVID-19, using the Student t-test, chi-squared test, Fisher exact test, or Mann-Whitney U test, depending on the variable type and distribution. We then selected variables with P < 0.1 in the training/validation set as input features for the ML models. We trained the models using k-fold cross-validation with k set from 7 to 10 and selected k based on the best area under the receiver operating characteristic curve (AUC) on the test set.
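The univariate screening step can be sketched as follows. This is a minimal illustration using SciPy's `shapiro`, `ttest_ind`, `mannwhitneyu`, `chi2_contingency`, and `fisher_exact`; the decision rules (Shapiro-Wilk P > 0.05 in both groups before using the t-test, Fisher exact test for 2×2 tables with any expected count below 5), the helper names `univariate_p_value` and `select_features`, and the example data are assumptions rather than details reported in this study. Only the P < 0.1 selection threshold comes from the text.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq


def univariate_p_value(x, y, categorical=False, alpha_normal=0.05):
    """P value comparing one feature between patients with (y == 1) and without (y == 0) the outcome."""
    x, y = np.asarray(x), np.asarray(y)
    if categorical:
        levels = np.unique(x)
        # 2 x k table: rows = outcome (0, 1), columns = feature levels
        table = np.array([[np.sum((y == g) & (x == lv)) for lv in levels] for g in (0, 1)])
        if table.shape == (2, 2) and expected_freq(table).min() < 5:
            return stats.fisher_exact(table)[1]  # small expected counts
        return stats.chi2_contingency(table)[1]
    pos, neg = x[y == 1], x[y == 0]
    # Shapiro-Wilk in each group decides between a parametric and a rank-based test
    normal = all(stats.shapiro(group)[1] > alpha_normal for group in (pos, neg))
    if normal:
        return stats.ttest_ind(pos, neg)[1]
    return stats.mannwhitneyu(pos, neg, alternative="two-sided")[1]


def select_features(features, y, threshold=0.1):
    """Keep features whose univariate P value is below `threshold` (0.1 in this study)."""
    return [name for name, (x, is_categorical) in features.items()
            if univariate_p_value(x, y, categorical=is_categorical) < threshold]


# Hypothetical example data, not study data
rng = np.random.default_rng(1)
y = (rng.random(580) < 0.17).astype(int)
features = {
    "temperature": (36.8 + 0.6 * y + rng.normal(0, 0.4, 580), False),
    "contact_history": ((rng.random(580) < 0.1 + 0.5 * y).astype(int), True),
    "noise": (rng.normal(0, 1, 580), False),
}
selected = select_features(features, y)
print(selected)
```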
In our preceding study, we employed three distinct ML algorithms (gradient boosting, random forest, and extra trees classifiers) to construct prediction models for SARS-CoV-2 infection. 17 In the current study, we validated this approach in another ED population by replicating the predictive modeling methodology with the same ML algorithms used in our prior research. These algorithms are ensemble techniques that combine multiple individual models to improve predictive accuracy and robustness in classification tasks. To address the imbalanced data in our cohorts, we applied the synthetic minority oversampling technique (SMOTE) to oversample the minority class to 0.6 times the size of the majority class, so that the ratio of COVID-19-positive to COVID-19-negative cases was 0.6 to 1.0 during the training phase. We then assessed the performance of the developed ML models on the testing set.
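SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The sketch below is a hypothetical NumPy re-implementation for illustration (the helper names, k = 5 neighbors, and Euclidean distance are assumptions). In practice, the `SMOTE` class in the imbalanced-learn package accepts a float `sampling_strategy`, e.g. `SMOTE(sampling_strategy=0.6)`, interpreted as the desired minority-to-majority ratio after resampling. With the development cohort's 98 positive and 482 negative patients, a ratio of 0.6 targets int(0.6 × 482) = 289 positives, i.e. 191 synthetic samples.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE: interpolate between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per sample
    base = rng.integers(0, len(X_min), size=n_synthetic)
    nn = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))         # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def oversample_to_ratio(X, y, ratio=0.6, k=5, rng=0):
    """Append synthetic positives until n_pos is int(ratio * n_neg)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_pos, n_neg = int(np.sum(y == 1)), int(np.sum(y == 0))
    n_new = max(int(ratio * n_neg) - n_pos, 0)
    X_new = smote(X[y == 1], n_new, k=k, rng=rng)
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


# Development-cohort class sizes from this study; features are random placeholders
rng = np.random.default_rng(0)
X = rng.normal(size=(580, 4))
y = np.array([1] * 98 + [0] * 482)
X_bal, y_bal = oversample_to_ratio(X, y, ratio=0.6)
print(int(y_bal.sum()), int((y_bal == 0).sum()))
```

Oversampling is applied to training data only; the testing cohort keeps its natural prevalence so that the reported metrics reflect the real case mix.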
To evaluate the performance of the models, we used several metrics: AUC, accuracy, F1 score, precision (positive predictive value [PPV]), recall (sensitivity), specificity, negative predictive value (NPV), and area under the precision-recall curve (AUPRC). We used the DeLong test for AUC and the Boyd test for AUPRC for pairwise comparisons of model performance. All ML analyses were performed in Jupyter Notebook 6.0.3 (Project Jupyter) with Python 3.8.3 (Python Software Foundation) and scikit-learn 0.23.1.
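The threshold-dependent metrics listed above all derive from the four cells of a confusion matrix. The sketch below uses hypothetical labels and predictions rather than study data; the classification threshold used to turn predicted probabilities into labels is not reported here, so the example starts from binary predictions. The helper name `threshold_metrics` is illustrative.

```python
def threshold_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = COVID-19 positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else float("nan")  # undefined when the denominator is zero

    precision = ratio(tp, tp + fp)  # PPV
    recall = ratio(tp, tp + fn)     # sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision_ppv": precision,
        "recall_sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 10 positives, 40 negatives
y_true = [1] * 10 + [0] * 40
y_pred = [1] * 6 + [0] * 4 + [1] * 8 + [0] * 32
m = threshold_metrics(y_true, y_pred)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this toy example, specificity (0.80) and NPV (about 0.89) are high while PPV (about 0.43) and F1 (0.50) are low, which is typical when positives are the minority class.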

RESULTS
The model development cohort (training/validation set) consisted of 580 patients who presented to BAS, and the external validation cohort (testing set) comprised 946 patients who presented to NTUH. Of these, 98 (16.9%) and 180 (19.0%), respectively, were diagnosed with COVID-19. The characteristics of the study population are shown in Table 1. The characteristics and univariate analyses of variables (features) between patients with and without COVID-19 in the training/validation and testing sets are summarized in Table 2.
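We selected 26 features using a P-value threshold of less than 0.1 in the model development cohort. In the testing set, random forest achieved the highest AUC (0.785, 95% CI 0.747–0.822), followed by gradient boosting (0.774, 95% CI 0.739–0.811) and extra trees (0.72, 95% CI 0.677–0.762) (Figure 1). … 0.349-0.499). The differences between the ML models in terms of AUC and AUPRC were not significant.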
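Pairwise AUC comparisons on the same testing set must account for the correlation between models scored on identical patients, which is what the DeLong test does. Below is a minimal NumPy/SciPy sketch of the paired DeLong test using the midrank formulation of Sun and Xu; the function names and synthetic scores are hypothetical, and this is not the software used in the study. The Boyd test for AUPRC is not shown.

```python
import numpy as np
from scipy.stats import norm


def midrank(x):
    """1-based ranks of x, with tied values sharing the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j < len(x) and xs[j] == xs[i]:
            j += 1
        ranks[i:j] = 0.5 * (i + j - 1) + 1  # mean of 1-based positions i+1 .. j
        i = j
    out = np.empty(len(x))
    out[order] = ranks
    return out


def delong_paired(y_true, scores_a, scores_b):
    """Two-sided paired DeLong test comparing the AUCs of two models scored on the same patients."""
    y = np.asarray(y_true)
    aucs, comp_pos, comp_neg = [], [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        pos, neg = s[y == 1], s[y == 0]
        m, n = len(pos), len(neg)
        tz = midrank(np.concatenate([pos, neg]))
        tx, ty = midrank(pos), midrank(neg)
        aucs.append((tz[:m].sum() - m * (m + 1) / 2) / (m * n))
        comp_pos.append((tz[:m] - tx) / n)      # placement values of positives
        comp_neg.append(1 - (tz[m:] - ty) / m)  # placement values of negatives
    # 2 x 2 covariance matrix of the two AUC estimates
    cov = np.cov(np.array(comp_pos)) / m + np.cov(np.array(comp_neg)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0], aucs[1], 2 * norm.sf(abs(z))


rng = np.random.default_rng(0)
y = (rng.random(300) < 0.2).astype(int)
good = y + rng.normal(0, 0.5, 300)  # hypothetical model A
weak = y + rng.normal(0, 3.0, 300)  # hypothetical model B
auc_a, auc_b, p = delong_paired(y, good, weak)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, P = {p:.4f}")
```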
In evaluating additional performance metrics, all ML models performed well in terms of accuracy, specificity, and NPV. However, their F1 score, sensitivity, and PPV were suboptimal. Feature importance, presented as a heat map computed and ordered by median normalized importance across all models, is shown in Figure 2.
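The ordering used in the feature-importance heat map can be reproduced in outline by normalizing each model's importance vector and taking the per-feature median across models. The normalization scheme is not specified in this study; the sketch below assumes division by each model's maximum importance, and the feature names and importance values are hypothetical. For fitted scikit-learn ensembles, the raw vectors would come from each model's `feature_importances_` attribute.

```python
import numpy as np


def rank_by_median_importance(importances, feature_names):
    """Order features by the median, across models, of max-normalized importance.

    `importances` maps a model name to a 1-D array of raw importances.
    Assumption: each model's vector is divided by its own maximum.
    """
    model_names = list(importances)
    raw = np.vstack([np.asarray(importances[m], dtype=float) for m in model_names])
    normalized = raw / raw.max(axis=1, keepdims=True)  # top feature of each model = 1.0
    median = np.median(normalized, axis=0)
    order = np.argsort(-median, kind="stable")         # descending median importance
    ranked = [feature_names[i] for i in order]
    heat = normalized[:, order].T                      # rows = features, columns = models
    return ranked, model_names, heat


# Hypothetical importance values, not study results
importances = {
    "GB": [0.40, 0.10, 0.30, 0.20],
    "RF": [0.25, 0.15, 0.35, 0.25],
    "ET": [0.30, 0.20, 0.20, 0.30],
}
feature_names = ["temperature", "age", "SBP", "contact history"]
ranked, model_names, heat = rank_by_median_importance(importances, feature_names)
print(ranked)
```

The `heat` matrix (features by models) is the layout a heat-map plot would display, with rows already in ranked order.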

DISCUSSION
The Main Findings of This Study
In our previous study, we constructed ML models to predict COVID-19 based on clinical features documented during ED triage at a tertiary teaching hospital in the US during the first wave of the COVID-19 pandemic. 17 In the current study, our objective was to validate this approach externally in the ED population of a medical center in another part of the world. Using a cohort of 946 consecutive ED patients visiting NTUH during the second wave of the COVID-19 pandemic in Taiwan, we found that the random forest model performed best, with acceptable discrimination in terms of AUC and AUPRC. However, the other two models achieved similar results without significant differences, and all models performed well in accuracy, specificity, and NPV. Using only demographics, vital signs, clinical symptoms, contact history, and PMH collected at ED triage, this approach demonstrates the feasibility of predicting COVID-19 at triage, even before patients enter the ED. The predictions can help emergency physicians identify patients at risk of the disease so that these patients can undergo further examination, testing, isolation, and appropriate treatment.

Comparison with Previous Studies
Since the inception of the pandemic, ML algorithms have been extensively applied in fighting COVID-19. 18 While certain applications targeted COVID-19 diagnosis as the primary outcome, others focused on morbidity and mortality in patients with confirmed SARS-CoV-2 infection. 10 Some investigations focused on the ED setting, while others focused on the general population. 19,20 Moreover, some reports used chest radiographs or computed tomography of the lung to exploit imaging characteristics that differentiate pneumonia caused by SARS-CoV-2 from pneumonia with other causes, 13,14,15 while others used routine blood test results. 9,11,12 Meanwhile, certain reports used clinical data, including patient demographics, symptoms, vital signs, and PMH, as the input of prediction models, similar to our study design. 21 Other studies combined multiple modalities from those mentioned above. 22 Although the data sources and sample sizes of these studies varied, to our knowledge our current study is the only one that uses only clinical features collected at ED triage and provides promising external validation results.
Compared with this study, our previous study yielded a stronger result, with an AUC of 0.86, whereas the best-performing model in this study achieved an AUC of 0.785. This decline in performance was anticipated: the test dataset in the previous study came from the same population as the training dataset, whereas in this study the two datasets came from different populations with different patient demographics. Additionally, certain features used in our previous study that were specific to the model development cohort were not employed in this validation study because of differences in healthcare systems and ethnicity distribution between the populations. Nonetheless, with the exception of the study by Zoabi et al, the models we built in the current study showed competitive or even better performance compared with other studies that relied on clinical features 19-22 (Supplementary Table 2).

Feasibility for Clinical Application
This study achieved acceptable predictive performance, with AUC, specificity, and NPV exceeding 0.7, making these ML models a suitable screening tool to rule in patients who need further attention. Because the required information is readily accessible from the EHR during ED triage, our model may help emergency clinicians separate patients with a high likelihood of COVID-19 from those at lower risk. By doing so, the risk of cross-infection may be minimized, and high-risk patients may receive appropriate care promptly. If effectively integrated as an automated alert system during the initial ED encounter, the model could have a substantial impact on clinical workflows while reducing disease transmission and cross-infection in the ED. However, care must be taken to ensure that the alerts provided by the predictive model are pertinent and timely and do not disrupt the existing workflow. 23 At present, a confirmed diagnosis of COVID-19 is made by direct detection of SARS-CoV-2 RNA using RT-PCR; however, it may take up to eight hours to obtain the result after the sample is delivered. 24 Although several rapid antigen tests (RAT) have been developed as screening tools, their accuracy is strongly affected by the pretest probability, and they are less effective in the asymptomatic population. 25 Moreover, many regions worldwide still lack access to RAT kits. As the COVID-19 pandemic persists and new variants emerge, a reliable ML prediction model could function as a rapid screening tool to differentiate suspected cases from other patients and facilitate infection control, even before patients enter the ED. This study also provides a proof of concept that ML models built on clinical features can predict an emerging infectious disease caused by an unknown pathogen without the need for pathogen-specific tests. When a novel infectious disease emerges in the future, this approach could be extremely valuable, particularly in situations where a dedicated diagnostic tool is lacking.
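The dependence on pretest probability follows from Bayes' theorem: for a test with fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases. The sketch below uses a hypothetical test with sensitivity 0.70 and specificity 0.80 (not values from this study); the middle prevalence, 0.19, matches the COVID-19 proportion in the testing cohort. The function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from Bayes' theorem for a test applied at a given pretest probability."""
    tp = sensitivity * prevalence              # true positives per patient screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: sensitivity 0.70, specificity 0.80 (not values from this study)
for prev in (0.02, 0.19, 0.50):
    ppv, npv = predictive_values(0.70, 0.80, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At a prevalence of 0.19, this hypothetical test gives a PPV near 0.45 and an NPV near 0.92, illustrating how a screening tool can show strong NPV alongside modest PPV.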

Figure 1. Results of the machine learning models on the test cohort. (A) Receiver operating characteristic (ROC) curves and comparison of the area under the curve (AUC); (B) precision-recall curves and comparison of the area under the precision-recall curve (AUPRC) for three machine learning models. ET, extra trees; RF, random forest; GB, gradient boosting.

Figure 2. Heat map of feature importance, with features ordered by median normalized importance across all models. The 9 most important features were temperature, systolic blood pressure, weight, body mass index, any co-morbidities, age, oxygen saturation, respiratory rate, and contact history. SBP, systolic blood pressure; BMI, body mass index; SpO2, oxygen saturation; Hx, history; COPD, chronic obstructive pulmonary disease; CVA, cerebrovascular accident; AMS, altered mental state; ET, extra trees; RF, random forest; GB, gradient boosting.

Table 1. Characteristics of the study population.
Development of ML Models for Predicting COVID-19 in the ED Tay et al.
(Continued on next page) Western Journal of Emergency Medicine Volume 25, No. 1: January 2024 70


Table 2. Characteristics and univariate analyses of variables (features) between patients with and without COVID-19 in the training and testing cohorts.

Table 3. Performance metrics of 7-fold cross-validation for different machine learning algorithms on the testing set. AUC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; PPV, positive predictive value; NPV, negative predictive value.