Western Journal of Emergency Medicine: Integrating Emergency Care with Population Health
Diagnostic and Prognostic Value of Chest Radiographs for COVID-19 at Presentation

Methods: We retrospectively identified consecutive reverse transcription polymerase chain reaction-confirmed COVID-19 patients (n = 104, 75% men) and patients (n = 75, 51% men) with repeated negative severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) tests. Two radiologists blindly and independently reviewed the chest radiographs (CXR), documented findings, assigned radiographic assessment of lung edema (RALE) scores, and predicted the patients’ COVID-19 status. We calculated interobserver reliability. The use of the score for diagnosis and prognosis of COVID-19 was evaluated with the area under the receiver operating characteristic curve.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is spreading globally. 1 The World Health Organization (WHO) declared COVID-19 a pandemic on March 11, 2020. 2 The most common presenting clinical symptoms are fever, cough, dyspnea, myalgia, and fatigue. 3-5 Older age and medical comorbidities are linked to more severe disease. 4,6-8 Men are over-represented among COVID-19 patients. 3,4,6,7 Although the radiological literature mainly focuses on computed tomography (CT) findings, 9,10 many patients are imaged solely with chest radiography 10,11 primarily as an adjunct to reverse transcription polymerase chain reaction (RT-PCR) but in some scenarios as a triage tool, 12,13 especially in resource-constrained environments where the supply of laboratory PCR kits cannot meet the demand. Although there are nonspecific respiratory symptoms commonly observed in COVID-19 patients at presentation, some patients with COVID-19 do not present with these classic clinical manifestations, which further complicates triage and diagnosis. 4 The chest radiograph (CXR) was reported as having a sensitivity of 69% for COVID-19 in one study of 64 patients. 9 In that study, the common findings were bilateral peripheral opacities with a predilection to the lower lung zones. Opacities increased throughout the illness, with a peak in severity at 10-12 days after symptom onset; this was shown by documenting lung opacities using a simplified radiographic assessment of lung edema (RALE) score. 9,14 When the Fleischner Society consensus statement was created, which specified that chest radiography has little value early in the course of the disease, there were limited data available on the accuracy of chest radiography for the diagnosis of COVID-19. 13 Data on the strengths and weaknesses of chest radiography for the diagnosis of COVID-19 are important, as CXRs are the most commonly used triage imaging tool in any patient presenting with respiratory symptoms. 12
This is especially important because experts suggest that the second wave of coronavirus is likely to be even more devastating. 15 Our aim was to assess the diagnostic accuracy and reliability of CXRs in patients suspected of having COVID-19 at presentation to the emergency department (ED) and to assess the prognostic value of the RALE score in patients with COVID-19.

MATERIALS AND METHODS
Patients and Data Source
This retrospective study was approved by our institutional review board, and informed consent was waived. We identified our study population by extracting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RT-PCR test results (positive or negative) of nasopharyngeal swabs from all consecutive patients older than 18 years analyzed at our hospital's laboratory from the ED from March 6-31, 2020, who had a CXR at presentation (within 24 hours of the first RT-PCR). We extracted data by a database search (query) using the MDClone platform (MDClone Ltd, Be'er Sheva, Israel), a big data system for healthcare. We were granted access to the raw data in order to validate the quality and reliability of the information in the database source underlying the study.
The patients were then divided into two groups: those who had COVID-19 and those who did not. The former group comprised patients who had a positive RT-PCR test. The latter, control group comprised patients who had a negative RT-PCR result on at least two separate occasions, more than 24 hours apart (without a previous positive test result). This methodology is similar to that of previously published studies, 16 as we tried to avoid the imperfect gold standard bias. We excluded patients who underwent SARS-CoV-2 testing due to an abnormal CXR and not due to clinical suspicion (n = 1 positive, n = 3 negative) based on the patients' electronic health records (EHR) (Figure 1) to avoid partial verification bias (referral bias). 17 The patients' EHRs were reviewed to obtain demographics and clinical data. The primary outcomes were intensive care unit (ICU) hospitalization, intubation, and mortality. COVID-19 severity was classified as severe or non-severe based on respiratory distress (≥30 breaths per minute) or oxygen saturation ≤93% on room air. 18 Although lung opacities are included in some published severity criteria, we did not use CXR findings to define severity to avoid incorporation bias. 17 The data cutoff date was April 21, 2020. We extracted the overall number of ED visits at our hospital during the study period using the MDClone platform database search. Overall COVID-19 new cases in Israel for the study period (26 days), and for an equal time span before and after the study period, were extracted from Israel's Ministry of Health website. 19
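As a minimal sketch of the severity rule just described, the function below labels a presentation as severe when either criterion is met (respiratory rate of at least 30 breaths per minute, or oxygen saturation of 93% or less on room air). The function and parameter names are illustrative assumptions and are not taken from the study's data pipeline.

```python
def classify_severity(resp_rate_per_min: float, spo2_room_air_pct: float) -> str:
    """Label a presentation as 'severe' or 'non-severe'.

    Hypothetical helper: severe if the respiratory rate is >= 30 breaths/min
    OR the room-air oxygen saturation is <= 93%. Chest radiograph findings
    are deliberately not an input, mirroring the study's choice to avoid
    incorporation bias.
    """
    if resp_rate_per_min < 0 or not (0 <= spo2_room_air_pct <= 100):
        raise ValueError("implausible vital signs")
    if resp_rate_per_min >= 30 or spo2_room_air_pct <= 93:
        return "severe"
    return "non-severe"


if __name__ == "__main__":
    print(classify_severity(18, 97))  # neither criterion met
    print(classify_severity(32, 97))  # respiratory distress
    print(classify_severity(18, 93))  # low saturation
```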

Image Analysis
Two radiologists (EMM, a thoracic radiologist with 28 years of experience, and SA, an oncology imaging radiologist with 40 years of experience) independently reviewed all CXRs using a picture archiving and communication system (PACS), Carestream, PACS Vue v12.1.5 (Carestream Health, Inc, Rochester, NY), while blinded to the RT-PCR results and clinical data. The CXRs of COVID-19 patients and the control patients were presented in random order. Both readers recorded pulmonary opacity characteristics, including their distribution (peripheral, perihilar, or diffuse), zonal predominance (upper, lower, or equal), and laterality (bilateral or unilateral). The presence of pleural effusion was recorded. Disagreements between reader 1 (R1) and reader 2 (R2) regarding the categorization of a pleural effusion as definite or questionable were resolved by an independent and blinded third reader (EK, a cardiothoracic radiologist with 21 years of experience). R1 and R2 calculated the RALE scores 14 (Figure 2). The RALE score, which is used to quantitate lung opacities, 14 is calculated by dividing each radiograph into quadrants and multiplying the extent (0 = no involvement, 1 = <25%, 2 = 25-50%, 3 = 50-75%, 4 = >75%) by the density (1 = hazy, 2 = moderate, 3 = dense) for each quadrant and then summing them (maximum score = 48). 14 For the purposes of our study, the following density definitions were used: hazy, ranging from barely noticeable opacities to mild or veiling opacities, through which the lung vessels can be clearly seen; moderate, in which opacities are identified, but the blood vessels are still visible; and dense, in which consolidation is apparent, and the blood vessels are not visible. For RALE scoring, we excluded CXRs with one of the following overshadowing radiopaque abnormalities: pleural effusion; pleural plaques; and pulmonary nodules or masses, whether due to lung cancer or metastatic disease.
Finally, the readers gave their expert opinion regarding patient COVID-19 status based on imaging alone. All previous imaging tests were available to the readers for comparison, and any changes were recorded.
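The RALE scoring rule described above can be sketched as follows. This is an illustrative implementation under stated assumptions: each of the four quadrants is supplied as an already graded (extent, density) pair, and the function name and input layout are hypothetical rather than part of the study's workflow.

```python
from typing import Iterable, Tuple

# Grades as defined in the text:
#   extent  0 = none, 1 = <25%, 2 = 25-50%, 3 = 50-75%, 4 = >75%
#   density 1 = hazy, 2 = moderate, 3 = dense
VALID_EXTENT = range(0, 5)
VALID_DENSITY = range(1, 4)


def rale_score(quadrants: Iterable[Tuple[int, int]]) -> int:
    """Sum extent x density over the four quadrants (range 0-48)."""
    graded = list(quadrants)
    if len(graded) != 4:
        raise ValueError("exactly four quadrants are required")
    total = 0
    for extent, density in graded:
        if extent not in VALID_EXTENT:
            raise ValueError(f"extent must be 0-4, got {extent}")
        if extent == 0:
            continue  # no involvement: density does not contribute
        if density not in VALID_DENSITY:
            raise ValueError(f"density must be 1-3, got {density}")
        total += extent * density
    return total


if __name__ == "__main__":
    # Hypothetical reading: one quadrant is clear, three are involved.
    example = [(2, 1), (1, 1), (3, 2), (0, 0)]
    print(rale_score(example))  # 2*1 + 1*1 + 3*2 + 0 = 9
```

Taking graded integers as input, rather than raw percentages, sidesteps the question of how boundary values such as exactly 25% or 50% are assigned, which the text does not specify.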

Statistical Analysis
To evaluate the sensitivity and specificity of categorical variables to discriminate between patients with and without COVID-19, assuming sensitivity and specificity of 80% and a 95% confidence interval (CI) width of 0.2, 140 patients were needed. To evaluate the use of RALE for determining COVID-19 diagnosis using the area under the receiver operating characteristic curve (AUC), assuming an area of 0.8 with a 95% CI width of 0.2 and an equal number of participants with and without COVID-19, 78 participants were needed. For the prognostic analysis, we assumed that the mean RALE score was 2 for patients without poor outcomes and 10 for patients with poor outcomes. We assumed that the standard deviation of the RALE score was 8 (range 0-48, divided by six). Using a significance level of 5% and power of 80%, and assuming a proportion of patients having poor outcomes to be 20%, a total of 53 patients were needed.
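The prognostic sample-size calculation can be approximated with the standard normal-approximation formula for comparing two means with unequal group sizes. The sketch below is an assumption about the method, since the text does not state which formula or software was used; the function name is hypothetical. With the stated inputs (difference of 8, SD of 8, 20% with poor outcomes, two-sided alpha of 5%, power of 80%) it gives a total of about 50, slightly below the 53 reported, and the gap may reflect a t-distribution correction or per-group rounding.

```python
from math import ceil
from statistics import NormalDist


def total_n_two_means(mean_diff: float, sd: float, frac_poor: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total sample size to detect a difference in means (normal approximation).

    N = (z_{1-alpha/2} + z_{power})^2 * sd^2 / (mean_diff^2 * q * (1 - q)),
    where q is the fraction of patients in the smaller (poor-outcome) group.
    """
    if not 0 < frac_poor < 1:
        raise ValueError("frac_poor must be strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = (z_alpha + z_power) ** 2 * sd ** 2 / (
        mean_diff ** 2 * frac_poor * (1 - frac_poor)
    )
    return ceil(n)


if __name__ == "__main__":
    # Means of 10 vs 2, SD 8, 20% of patients with poor outcomes.
    print(total_n_two_means(mean_diff=10 - 2, sd=8, frac_poor=0.20))
```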
We evaluated continuous variables for normal distributions using histograms. Variables that were close to being normally distributed are reported as the means and standard deviations (SD), while skewed variables are reported as the medians and interquartile ranges. Categorical variables are reported as frequencies and percentages. We used independent samples t-tests and Mann-Whitney tests to compare normally distributed variables and skewed variables between groups, respectively. Chi-square tests and Fisher's exact tests were applied to compare categorical variables between patients with positive and negative tests. The kappa statistic was used to evaluate the agreement between readers 20 and was interpreted according to Landis and Koch. 21 Accuracy was evaluated only for findings that reached a kappa of at least 0.4. Diagnostic accuracy parameters were calculated by crosstabulation and included the following: sensitivity, specificity, and positive (LR+) and negative (LR-) likelihood ratios. We used the intraclass correlation coefficient (ICC) to evaluate the agreement of the two readers with regard to the RALE score. 22 The AUC 23 was used to evaluate the ability of the RALE score to discriminate between COVID-19 and control patients and between poor and favorable outcomes in COVID-19 patients. The discriminatory ability was also evaluated in patients who presented at early (0-2 days), intermediate (3-5 days), and late (≥6 days) time points from symptom onset. For prognostic ability, we used a RALE score cutoff threshold of 5. All statistical tests were two-sided, and p<0.05 was considered statistically significant. For statistical analyses, we used SPSS software (IBM SPSS Statistics for Windows, IBM Corp., Armonk, NY).
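To make the agreement analysis concrete, the sketch below computes Cohen's kappa for two readers rating a binary finding and maps the value to the Landis and Koch categories. The study itself used SPSS; the function names and the 2x2 counts in the usage example are hypothetical and serve only as an illustration.

```python
def cohen_kappa(both_pos: int, r1_only: int, r2_only: int, both_neg: int) -> float:
    """Cohen's kappa from the four cells of a two-reader agreement table."""
    n = both_pos + r1_only + r2_only + both_neg
    if n == 0:
        raise ValueError("empty table")
    observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_only) / n  # reader 1 positive rate
    r2_pos = (both_pos + r2_only) / n  # reader 2 positive rate
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal label for kappa following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical counts, not study data.
    k = cohen_kappa(both_pos=40, r1_only=10, r2_only=10, both_neg=40)
    print(round(k, 3), landis_koch(0.408), landis_koch(0.833))
```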

RESULTS
Patient Characteristics
During the study period, 105 patients had positive RT-PCR results and had a CXR, and 78 patients had repeated negative results and had a CXR. After excluding patients who had the RT-PCR ordered due to an abnormal CXR (n = 1 COVID-19 patient, n = 3 control patients), our study group included 104 COVID-19 patients (men 78/104, 75%; mean age 57.0, SD 15.7 years) and 75 control patients (men 38/75, 51%; mean age 65.6, SD 21.4 years) (Figure 1). Table 1 shows patient characteristics and outcomes with a comparison of COVID-19 to control and non-severe to severe COVID-19 patients.
The overall number of ED visits at our hospital during the study period was 8025 (all causes). The number of new cases of COVID-19 in Israel during the study period (26 days) was 5699. The number for the period immediately preceding was 17. The number for the period immediately ensuing was 9723. These numbers show that our study took place at the beginning of the first wave of COVID-19 in Israel.

Radiographic Findings
The identification of any opacity on CXRs had moderate interobserver agreement (kappa = 0.408). When assuming that any parenchymal lung opacity could represent COVID-19 pneumonia, the diagnostic accuracy for COVID-19 was as follows: sensitivity (R1: 87%; R2: 69%) and specificity (R1: 25%; R2: 27%). Both LR+ and LR- showed the poor diagnostic performance of CXRs for COVID-19, as most crossed or included 1. Pleural effusion had almost perfect interobserver agreement (kappa = 0.833). The accuracy parameters of the presence of a pleural effusion for the diagnosis of COVID-19 were as follows: sensitivity (R1 and R2: 1%), specificity (R1: 81%; R2: 77%), and very low positive likelihood ratio (LR+) (R1: 0.05; R2: 0.04); thus, the presence of definite pleural effusion at presentation makes the diagnosis of COVID-19 very unlikely (see Table 2).
With regard to RALE scoring, 103 CXRs were available in the COVID-19 group after excluding one CXR due to pleural effusion, and 55 CXRs were available in the control group after excluding CXRs with the following overshadowing radiopaque abnormalities: pleural effusion (n = 17); lung cancer (n = 1); multiple metastases (n = 1); and calcified pleural plaques (n = 1) (Figure 1). The RALE score interobserver reliability was moderate to good, with an ICC of 0.745 (0.665-0.806, p<0.001). See Table 3 for the AUC assessment summary. The AUC for all patients (overall) showed no significant difference from sheer chance (R1: p = 0.010; R2: p = 0.865). The evaluation of the discriminatory ability of the RALE score in patients who presented early (0-2 days) showed an inverse correlation with COVID-19 diagnosis. Simply put, in patients presenting within 0-2 days of symptom onset who were clinically suspected of having COVID-19, pulmonary opacities were more likely to be due to a diagnosis other than COVID-19. For patients presenting within three to five days from symptom onset, only R1 achieved statistical significance, while for patients presenting six or more days from symptom onset, both readers reached significant discrimination ability. Thus, for patients presenting later after symptom onset, especially from day six, the higher the RALE score, the more likely a diagnosis of COVID-19. An example is seen in Figure 3, showing the sensitivity of the RALE score with a threshold of 5 for the diagnosis of COVID-19 increasing as the patients arrive later in the disease course. See Figure 4 for CXR examples of patients presenting at different timeframes from symptom onset.
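The AUC used for these comparisons can be read as the probability that a randomly chosen patient from one group has a higher RALE score than a randomly chosen patient from the other, with ties counted as one half (the Mann-Whitney formulation). The sketch below illustrates that definition; the scores are hypothetical, and the function name is an assumption rather than the study's SPSS procedure. A value below 0.5 corresponds to the inverse relationship described for early presenters, in which higher scores favor the comparison group.

```python
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float],
                    control_scores: Sequence[float]) -> float:
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical RALE scores, not study data.
    late_cases = [8, 12, 5, 3]
    late_controls = [2, 4, 0, 6]
    print(auc_from_scores(late_cases, late_controls))   # above 0.5
    print(auc_from_scores(late_controls, late_cases))   # mirrored, below 0.5
```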
When the RALE score was evaluated as a prognostic indicator within the COVID-19 patient group, both readers had statistically significant discriminatory accuracy for severe disease and poor outcomes (Table 3).
When a RALE score of 5 was used as a threshold for severe disease and for poor outcome, sensitivity was moderate to good, and specificity was moderate. However, LRs were encouraging, as LR+ ranged from 2.21 to 2.59 and LR- ranged from 0.10 to 0.45 (supplemental table). Hence, a RALE score <5 in COVID-19 patients at presentation substantially reduces the odds of having severe COVID-19 or poor outcome (intensive care unit hospitalization, intubation, or death), whereas a RALE score ≥5 substantially increases those odds.
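To illustrate how likelihood ratios shift the odds, the sketch below derives LR+ and LR- from sensitivity and specificity and converts a pretest probability into a posttest probability (posttest odds = pretest odds x LR). The 20% pretest probability in the usage example is a hypothetical value chosen for illustration; only the LR values of 2.59 and 0.10 come from the range reported above. Function names are assumptions.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0 <= sensitivity <= 1 and 0 < specificity < 1):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    pretest = 0.20  # hypothetical pretest probability of a poor outcome
    print(round(posttest_probability(pretest, 2.59), 3))  # RALE >= 5
    print(round(posttest_probability(pretest, 0.10), 3))  # RALE < 5
```

Under this assumed pretest probability, a score of 5 or more raises the estimate to roughly 39%, while a score below 5 with an LR- of 0.10 lowers it to roughly 2%.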

DISCUSSION
In this study we assessed the diagnostic value of the initial CXR for diagnosing COVID-19 in patients clinically suspected of having COVID-19, as well as the prognostic value of this CXR in COVID-19 patients. The study took place in a single hospital in Israel at the beginning of the COVID-19 pandemic first wave. Our study showed that the reliability of radiographs is only moderate for any opacity and moderate to good for the RALE score. Overall, chest radiography was found not to be a valid diagnostic tool for COVID-19. However, the diagnosis of COVID-19 pneumonia by CXRs reached significant diagnostic accuracy when performed at least six days after symptom onset. For patients presenting early (0-2 days from symptom onset), a normal or near-normal CXR is more likely to be seen in a patient with COVID-19, although opacities early in the disease course do not completely rule out this condition. The presence of a definite pleural effusion indicates that the diagnosis is unlikely to be COVID-19. More extensive lung opacities are associated with poor outcome in COVID-19 patients.
Previous COVID-19 studies mainly concentrated on computed tomography (CT) findings and indicated that opacities are usually bilateral, with a peripheral distribution and lower zones predominance. 24 We found only fair agreement with regard to the opacity predominance, distribution, and laterality, which probably relates to the lower sensitivity of CXRs compared with CT for pulmonary opacities. A previously published study reported 69% sensitivity for diagnosis on the baseline CXR, 9 similar to our findings. On the other hand, we found that this high sensitivity had a trade-off with low specificity, which represents the reader's avoidance of false-negative results, offsetting with more false-positive results. This observation can only be made with a control group. This is in contrast to previous studies that did not have a control group 9,11 and were only able to assess sensitivity. Moreover, LRs showed the CXR is ineffective in the ED setting as it failed to meaningfully change the estimation of disease probability from pretest to posttest. This, at the very least, raises doubts about the utility of the CXR as a triage tool. It is perhaps not surprising that the quantification of pulmonary opacities, as performed in our study with the RALE score, was not useful for assessing the entire cohort when trying to distinguish between patients with and without COVID-19, but when interpreted in the context of time from symptom onset, the accuracy improved.
Volume 21, no. 5: September 2020. Kerpel et al.
Highly experienced radiologists' expert opinions for guessing COVID-19 status were not reliable and did not reach a high enough interobserver agreement to discuss the accuracy parameters. However, poor interobserver agreement regarding specific disease status on CXRs was documented in previous studies. 25,26 Despite the limited role of imaging in the diagnosis of COVID-19 as expressed by leading societies worldwide, 12,13,27 the CXR is still the recommended imaging tool for any patient presenting at the ED with an acute respiratory illness. 28 Future COVID-19 patients will continue to have CXRs at presentation before their disease status is known to the referring clinicians.
To complicate matters, even in the ideal setting, when RT-PCR is available and results are delivered within minutes to hours, the sensitivity of the RT-PCR for SARS-CoV-2 is poor, 29 leaving emergency clinicians with a dilemma as to how to manage patients with non-specific presenting symptoms suggestive of COVID-19 with a negative initial RT-PCR test. This dilemma emphasizes the need to maximize available knowledge in the ED setting. Time from symptom onset is readily available in this setting, and applying it to CXR interpretation may improve diagnostic accuracy.
Despite not being recommended for diagnosis of COVID-19, the CXR is a tool used for the risk stratification of patients with COVID-19 and is often used as an aid to decision-making with regard to discharge vs hospitalization and the amount of close monitoring needed for specific patients. 13,18,20 Our study validates this approach and shows that the amount of pulmonary opacities, as quantified by the RALE score, correlates with poor outcome. The knowledge gained from this study allows for a better understanding of the diagnostic and prognostic value of CXRs in COVID-19 patients and can aid emergency physicians in clinical decision-making. The added information can also serve educators and future researchers in understanding the strengths and weaknesses of CXRs, as this "classic" imaging modality is also the most frequently performed.

LIMITATIONS
This study has several sources of bias. Differential verification bias (double gold standard bias) 17 was present in our study, as we required only one positive RT-PCR test for the COVID-19 group but at least two negative RT-PCR tests for the control group. Lack of clinical follow-up to confirm the absence of COVID-19 precluded incorporation of patients with only one negative test into our study. In our opinion, this bias reduced specificity, as the patients in the control group were sicker, with almost four times the mortality rate and a higher prevalence of heart disease and active cancer. Thus, the patients in the control group probably had more lung opacities than would be expected in the general population.
Similarly, spectrum bias potentially influenced our results because the control group was enriched with many "sickest of the sick," whose clinical condition influenced the decision to repeat the test and, hence, could lead to underestimation of the specificity. 17 Even though this methodology is well accepted, 16 and the motivation was to ensure having only truly non-COVID-19 patients in the control group, the trade-off eliminated many non-COVID-19 patients who might have had less remarkable radiographs. None of these biases affects the results regarding prognosis, as those analyses did not involve the control group.
The study's results can be generalized to the ED setting. In a community setting, in which fewer non-COVID-19 patients have competing conditions, LRs will move further away from 1, and the test will appear more useful. 31

CONCLUSION
Chest radiography was found not to be a valid diagnostic tool for COVID-19. However, sensitivity increased in patients presenting later in the disease course. When presenting early, a normal or near-normal CXR is more likely in COVID-19. When a pleural effusion is present, the diagnosis is unlikely to be COVID-19. Furthermore, more extensive lung opacities at presentation are associated with poor outcome in COVID-19 patients. Thus, patients with more than minimal opacities should be monitored closely for clinical deterioration. This clinical application of chest radiography is its greatest strength in COVID-19 as it impacts patient care.