Accuracy of Conventional Transthoracic Echocardiography for the Diagnosis of Intracardiac Right‐to‐Left Shunt: A Meta‐Analysis of Prospective Studies

Paradoxical embolization through a right‐to‐left shunt (RLS), often from a patent foramen ovale (PFO), has been associated with cryptogenic stroke. While transesophageal echo (TEE) bubble study is the current standard reference for diagnosing PFO, transthoracic echo (TTE) remains the most commonly used screening test for RLS due to its noninvasiveness and easy availability. The aim of this meta‐analysis was to determine the accuracy of TTE compared to TEE as the reference.

Paradoxical embolization through a right-to-left shunt (RLS), often from a patent foramen ovale (PFO), is a recognized mechanism for ischemic stroke, 1,2 especially in patients without any identifiable cause of stroke. 3 Although the CLOSURE 1, RESPECT, and PC trials failed to meet their primary endpoints by intention-to-treat analysis, recent meta-analyses of these trials and observational studies are encouraging. 4,5 The totality of evidence suggests that PFO occluding devices may reduce the recurrence of stroke and transient ischemic attack compared to medical treatment in patients with cryptogenic stroke, particularly in those with greater shunting. These data, along with the evaluation of patients with severe migraines or other conditions associated with PFO, 6-9 make it essential to accurately diagnose RLS in patients being considered for transcatheter PFO closure.
While contrast-enhanced transesophageal echocardiogram (TEE) is considered the gold standard for diagnosing PFO, 10,11 conventional transthoracic echocardiogram (TTE) with bubble study is the most commonly used initial screening test for the detection of RLS due to its noninvasiveness and low cost. 12-24 The aims of this study were to: (a) expand on prior reviews of TTE and provide the first meta-analysis on this topic that methodically assesses the diagnostic accuracy of conventional TTE in the evaluation of patients for an intracardiac RLS; (b) determine the utility of TTE for the diagnosis of RLS in the general population and in a population of patients with cryptogenic stroke; and (c) perform a sensitivity analysis of different TTE protocols to determine the best methodology for diagnosing intracardiac RLS.
Methods: Literature Review: Relevant citations were searched for on Medline, Cochrane, and Embase. The search was completed in August 2013, yielding published literature since 1913. The terms that were used in the search were "PFO" OR "patent foramen ovale" OR "right to left shunt" OR "atrial septal defect" AND "TTE" OR "transthoracic echocardiography" OR "transthoracic echo" OR "echo" OR "echocardiography." The references of all of the primary studies as well as those of other known prior reviews were analyzed to find cited articles that were not found by the initial searches. No restrictions were used regarding publication language. Abstracts lacking peer-reviewed manuscripts were omitted since they would not have enough data required for the meta-analysis.

Selection of Studies:
Articles that were identified were analyzed by three independent reviewers (M.K.M., J.S.W., and S.C.R.). Each article was screened for preset inclusion criteria: (1) Original prospective studies (reviews, abstracts, isolated cases, commentaries, editorials, and letters were excluded) and any other comments that should be considered in the analysis such as disagreements between the reviewers, any difficulties with interpreting the data and reasons for exclusion.

Quality Assessment:
The assessment of quality of each study was done by evaluating 14 items considered relevant to the review topic, based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) instrument. 25 Three reviewers (S.C.R., J.S.W., and M.K.M.) independently assessed the quality items, and discrepancies were resolved by consensus.

Statistical Analysis:
Inter-rater agreement was assessed by calculating Fleiss' kappa using R. 26 Diagnostic odds ratio (DOR) was used as a measure of effect size to assess for publication bias. Publication bias analysis was then performed using Comprehensive Meta-analysis version 2 27 by visual inspection of the funnel plot and statistically using fail-safe N and the trim-and-fill method. The Meta-DiSc software (Version 1.4; Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) was used for all other diagnostic accuracy analyses. 28 Potential variations due to threshold effect were assessed statistically by computing the Spearman correlation coefficient between the logit of sensitivity and the logit of 1 − specificity, as well as graphically by visually inspecting the paired accuracy estimates in forest plots and summary receiver operating characteristic (sROC) curves. 28,29 Between-study heterogeneity (other than threshold effect) and between-study inconsistency were assessed by the Cochran Q statistic and the inconsistency index (I²), respectively, and the level of significance for the corresponding P-value was set at P = 0.10. Due to anticipated interstudy heterogeneity, a random effects analysis model (DerSimonian-Laird) 30 was used for all meta-analyses because it provides more conservative estimates of the pooled data. The stability of the diagnostic accuracy results was assessed by one-way sensitivity analysis, which was performed by omitting each study (one at a time) from the meta-analysis. To investigate the effects of potential sources of heterogeneity in the pooled calculations, subgroup analyses were performed considering more homogeneous sets of studies that adopted similar design variables. sROC curves were constructed using the DerSimonian-Laird random effects model. The area under the curve (AUC) and index Q* were used to assess and summarize the discriminating ability of the sROC curve. 31 Posttest probabilities were calculated and illustrated using Fagan nomograms. Subgroups were constructed only when ≥3 studies could be included. Differences between subgroups were assessed by tests of interaction. 32 To correct for multiplicity of comparisons in subgroup analyses, P-values of paired comparisons between subgroups were adjusted with the Bonferroni-Holm procedure. 33 Ninety-five percent confidence intervals (CI) were reported for all pooled data, all P-values were two-tailed, and an adjusted P-value of <0.05 was considered statistically significant unless otherwise specified.
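As an illustration of the pooling step described above, the following is a minimal sketch of a DerSimonian-Laird random-effects estimate together with the Cochran Q statistic and the inconsistency index. It is not the Meta-DiSc implementation; the function name `dersimonian_laird` and the per-study log DOR values and variances in the usage lines are hypothetical and serve only to show the arithmetic.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects (e.g. log diagnostic odds ratios) with a
    DerSimonian-Laird random-effects model.

    Returns (pooled_effect, standard_error, Q, I2_percent, tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Inconsistency index I^2 (percent), truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Method-of-moments estimate of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, i2, tau2


# Hypothetical log DORs and variances for four studies (illustration only)
log_dor = [2.1, 3.4, 2.8, 4.0]
var_log_dor = [0.30, 0.55, 0.42, 0.80]

pooled, se, q, i2, tau2 = dersimonian_laird(log_dor, var_log_dor)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled DOR = {math.exp(pooled):.1f} "
      f"(95% CI: {math.exp(low):.1f}-{math.exp(high):.1f}), "
      f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

Because the random-effects weights include the between-study variance, the pooled confidence interval is never narrower than the fixed-effect interval, which is the sense in which the model gives more conservative estimates.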

Results: Characteristics of the Included Studies:
Of the 52 potential studies identified, 13 prospective studies comprising 1436 patients met inclusion criteria and formed the dataset. 12-24 Overall reviewer agreement with regard to study inclusion or exclusion was substantial, with a kappa of 0.83. Thirty-nine studies did not meet the inclusion criteria and were excluded; the excluded studies along with the reasons for exclusion are provided in the supplementary results. Figure 1 describes the study selection method used for this analysis.

Quality Assessment:
Using the recommended 14-item checklist for evaluating imaging studies using QUADAS, items 4, 10, 11, 13, and 14 either scored poorly or were considered unclear: item 4 ("time between tests acceptable?"), item 10 ("index test results blinded?"), item 11 ("reference standard results blinded?"), item 13 ("uninterpretable data reported?"), and item 14 ("withdrawals reported?"). Item 4 refers to the time interval between the index and reference tests and may lead to disease progression bias. Items 10 and 11 refer to blinding and may affect diagnostic accuracy, otherwise known as review bias. Item 13 refers to any segments that may have been uninterpretable and can, therefore, result in false elevations of test accuracy, and item 14 may lead to test performance bias, as unfit patients are removed to improve accuracy. Otherwise, all studies demonstrated high-quality scoring on the remaining 9 items (Figs. 2 and 3).

Transthoracic Echo Diagnostic Value:
A total of 13 studies met all inclusion criteria and were used for further meta-analytic calculations. Figure 4 shows that the actual combined effect size does not significantly differ from the theoretical combined effect size (one-sided P-value = 0.20). The fail-safe N was 415. Table I describes the characteristics of the included studies and Table II describes the diagnostic accuracies of the studies. The major clinical indication for performing a TTE in most of the studies was stroke. Of the 13 studies that performed TTE and TEE with contrast, 6 (46%) used agitated saline as the contrast agent, 2 (15%) used agitated saline with blood, 4 (31%) used a gelatin-based solution, and 1 (8%) used more than one contrast agent. The most common embolic cutoff for a positive TTE and TEE was detection of ≥1 microbubble in the left atrium (46%; 6/13). Forty-six percent (6/13) of the studies performed the contrast injection during the Valsalva maneuver. The most commonly used cutoff for a positive intracardiac RLS was visualization of bubbles in the left atrium within 3 cardiac cycles (38%; 5/13).
When all eligible studies were pooled into the diagnostic accuracy meta-analysis, the overall sensitivity of TTE for the diagnosis of intracardiac RLS was 46.4% (95% CI: 41. […] Figure 5. The pooled AUC and index Q* were 0.94 (95% CI: 0.92-0.97) and 0.88 (95% CI: 0.85-0.92), respectively (Fig. 6). Figure 7 demonstrates the pre- and posttest probabilities of detecting an intracardiac RLS with TTE in the general population, and in our study cohort consisting mainly of patients with cryptogenic stroke. Since RLS through a PFO is present in 20% of the adult population 34 and in approximately 50% of patients with cryptogenic stroke, 35,36 these respective prevalences were assumed to demonstrate the likelihood of detecting an intracardiac RLS by TTE in the 2 populations. As Figure 7A shows, with an RLS prevalence of 20%, a positive TTE result will have an 84% probability (95% CI: 76-89%) of being a true positive. Furthermore, a negative TTE will have a 12% probability (95% CI: 11-14%) of being a false negative. If RLS prevalence is 50% (Fig. 7B), then a positive TTE result will have a 95% probability (95% CI: 92-97%) of being a true positive. In addition, a negative TTE result will have a 36% probability (95% CI: 35-38%) of being a false negative. Table III summarizes the results of the subgroup analyses. There were no statistically significant differences in sensitivity, specificity, LR+ or LR− between the studies that utilized different contrast agents, different microbubble cutoffs for

Discussion:
Our study demonstrates that conventional TTE with contrast detects intracardiac RLS with a sensitivity of 46% and specificity of 99% when contrast TEE is used as the reference technique. In addition, TTE has a LR+ of 20.85 and LR− of 0.57, which make it an excellent rule-in test but a poor rule-out test for intracardiac RLS in a population of patients with cryptogenic stroke (Fig. 7). A subanalysis of different study protocols revealed that the accuracy of TTE was not significantly affected by the use of different contrast agents, different microembolic cutoffs for a positive TTE/TEE, or different cutoffs for visualization of bubbles in the left atrium within a certain number of cardiac cycles after contrast injection. This study is the first meta-analysis to determine the accuracy of conventional TTE in the detection of intracardiac RLS while also comparing different protocols to demonstrate the optimal methodological technique. This is also the first meta-analysis to demonstrate the posttest probabilities of TTE in the general population and in a population of patients with cryptogenic stroke.
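The posttest probabilities summarized by the Fagan nomograms follow from the likelihood ratios by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. The sketch below is illustrative only: the function name `posttest_probability` is an assumption, and the inputs are the pooled LR+ and LR− and the two assumed prevalences (20% and 50%) reported in the text.

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


LR_POS, LR_NEG = 20.85, 0.57  # pooled likelihood ratios reported in the text

for prevalence in (0.20, 0.50):  # general population, cryptogenic stroke
    p_pos = posttest_probability(prevalence, LR_POS)  # RLS present given positive TTE
    p_neg = posttest_probability(prevalence, LR_NEG)  # RLS present given negative TTE
    print(f"prevalence {prevalence:.0%}: "
          f"positive TTE -> {p_pos:.0%}, negative TTE -> {p_neg:.0%}")
# prevalence 20%: positive TTE -> 84%, negative TTE -> 12%
# prevalence 50%: positive TTE -> 95%, negative TTE -> 36%
```

These point estimates agree with the values shown in Figure 7; the confidence intervals reported there are not reproduced by this sketch.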
While contrast TEE currently remains the gold standard for the diagnosis of RLS through a PFO, 10,11 TEE may be an uncomfortable and time-consuming procedure. Although extremely uncommon, life-threatening complications such as esophageal bleeding or perforation may occur. Contraindications to TEE, such as esophageal or gastric varices, Barrett's esophagus, Zenker's diverticulum, esophageal or pharyngeal carcinoma, strictures, Mallory-Weiss tears, or a serious risk of bleeding, make a reliable alternative imaging modality a growing need in contemporary clinical practice. 37 Transthoracic echocardiogram with bubble study remains the most commonly used initial screening test for RLS due to its easy availability, low cost, and noninvasiveness. 12-24 However, our meta-analysis confirms that the low sensitivity and high LR− of TTE make it a poor rule-out test for intracardiac RLS. As an initial screening test, conventional TTE may need to be replaced by more sensitive alternatives such as TTE with second harmonic imaging or transcranial Doppler, which have reported sensitivities that are comparable to TEE. 18,20 Two recent meta-analyses of prospective studies comparing TTE harmonic imaging and transcranial Doppler to TEE for the diagnosis of intracardiac RLS demonstrated a sensitivity of 91% and specificity of 93% with TTE harmonic imaging (unpublished data), and a sensitivity of 97% and specificity of 93% with transcranial Doppler. 38 However, the excellent specificity and LR+ of conventional TTE make a positive result highly useful, especially in a population of patients with cryptogenic stroke (Fig. 7). There is currently no standardized protocol for performing a TTE bubble study and the methodology often varies depending on the institution. Thus, there was considerable variability in the diagnostic protocols of the included studies.
Considering the heterogeneity of the studies, we sought to examine the effect of different protocols on the accuracy and utility of TTE. According to our secondary analysis, using different contrast agents, different microembolic cutoffs for a positive TTE, and different cutoffs for the number of cardiac cycles within which bubbles appear in the left atrium did not affect the accuracy of TTE. Thus, while the exact protocol of TTE bubble study varies among institutions, our study indicates that changing these parameters may neither enhance nor reduce the accuracy of TTE.

Limitations:
In performing our analysis, we recognize the presence of several limitations. Differences in study design and diagnostic threshold for detecting a positive test are sources of heterogeneity that limit our calculation of diagnostic accuracy. We attempted to perform a subanalysis on different protocols, where possible, to assess the effect of changing the TTE protocol on accuracy of the test. However, we were unable to perform a subanalysis on some parameters, such as the utility of agitated saline with blood as the contrast agent or the timing of contrast injection relative to the Valsalva maneuver, due to the lack of the minimum number of studies required to make these comparisons.
The methodological quality of the included studies was assessed using quality assessment of diagnostic accuracy studies (QUADAS) to determine the effect of methodology, and inherent shortcomings, on diagnostic accuracy. Most of the studies in this meta-analysis had high methodological quality with very minimal concerns regarding study biases (Figs. 2 and 3). Of the 13 included studies, 62% (8/13) did not mention the time interval between the index and reference tests. Although this could potentially lead to disease progression bias, this limitation may not apply in our study as PFO is a congenital condition and the results of the tests are unlikely to be affected by a delay between the 2 tests. Uninterpretable results and withdrawals, which represented the "flow and timing" section, were unclear in 54% (7/13) and 92% (12/13) of the studies, respectively. These 2 items are often not reported in diagnostic accuracy studies, with the uninterpretable results and withdrawals simply removed from the analysis. This may lead to a biased assessment of the test characteristics. Whether or not bias will arise depends on the possible correlation between uninterpretable test results and the true disease status. Uninterpretable results frequently occur randomly and are not related to the true disease status of the individual. Therefore, in theory, these should not have any effect on test performance. Likewise, 6 of 13 studies (46%) did not clearly specify whether or not blinding occurred in the index or reference tests. This may potentially lead to review bias resulting in inflated measures of diagnostic accuracy. Finally, TEE itself may have limitations in the detection of RLS in some patients, due to sedation and the difficulty of performing an adequate Valsalva maneuver with a probe in the esophagus. Studies that have compared TEE to autopsy or intraoperative detection of PFO have demonstrated that the diagnosis can sometimes be missed by TEE. 39,40 Other studies that determined accuracy of TEE in the detection of PFO using catheterization and/or surgery as the reference demonstrated a sensitivity of 91-100%. 13,41 Thus, our results may have overestimated the sensitivity of TTE. In addition, some false-positive TTE or TEE studies may occur due to a pulmonary arteriovenous fistula that can be mistaken for an intracardiac shunt. Since our study cohort mainly consisted of patients with cryptogenic stroke, the number of false positives due to pulmonary shunting may be higher, as pulmonary shunting occurs more commonly in patients with cryptogenic stroke compared to age-matched controls. 42

Conclusion:
Although TTE bubble study remains the most commonly used initial screening test for RLS, it has a poor sensitivity, which makes it an unreliable rule-out test. With a specificity of 99% and LR+ of 20.85, a positive TTE result yields a 95% posttest probability of RLS in patients with cryptogenic stroke. The accuracy of TTE is not affected by the choice of contrast agent, the microbubble cutoff used to define a positive study, or the number of cardiac cycles after contrast injection within which bubbles must appear in the left atrium.