- Lee, Christoph I
- Abraham, Linn
- Miglioretti, Diana L
- Onega, Tracy
- Kerlikowske, Karla
- Lee, Janie M
- Sprague, Brian L
- Tosteson, Anna NA
- Rauscher, Garth H
- Bowles, Erin JA
- diFlorio-Alexander, Roberta M
- Henderson, Louise M
Background: It is important to establish screening mammography performance benchmarks for quality improvement efforts.

Purpose: To establish performance benchmarks for digital breast tomosynthesis (DBT) screening and evaluate performance trends over time in U.S. community practice.

Materials and Methods: In this retrospective study, DBT screening examinations were collected from five Breast Cancer Surveillance Consortium (BCSC) registries between 2011 and 2018. Performance measures included abnormal interpretation rate (AIR), cancer detection rate (CDR), sensitivity, specificity, and false-negative rate (FNR). They were calculated according to the American College of Radiology Breast Imaging Reporting and Data System, fifth edition, and compared with concurrent BCSC digital mammography (DM) screening examinations, previously published BCSC and National Mammography Database benchmarks, and acceptable performance ranges based on expert opinion. Benchmarks were derived from the distribution of performance measures across radiologists (n = 84 or n = 73, depending on the metric) and were presented as percentiles.

Results: A total of 896 101 women undergoing 2 301 766 screening examinations (458 175 DBT examinations [median age, 58 years; age range, 18-111 years] and 1 843 591 DM examinations [median age, 58 years; age range, 18-109 years]) were included in this study. DBT screening performance measures were as follows: AIR, 8.3% (95% CI: 7.5, 9.3); CDR per 1000 screens, 5.8 (95% CI: 5.4, 6.1); sensitivity, 87.4% (95% CI: 85.2, 89.4); specificity, 92.2% (95% CI: 91.3, 93.0); and FNR per 1000 screens, 0.8 (95% CI: 0.7, 1.0). When compared with BCSC DM screening examinations from the same time period and with previously published BCSC and National Mammography Database performance benchmarks, all performance measures were higher for DBT except sensitivity and FNR, which were similar to concurrent and prior DM performance measures. The following proportions of radiologists achieved acceptable performance ranges with DBT: 97.6% for CDR, 91.8% for sensitivity, 75.0% for AIR, and 74.0% for specificity.

Conclusion: In U.S. community practice, large proportions of radiologists met acceptable performance ranges for screening performance metrics with DBT.

© RSNA, 2023

Supplemental material is available for this article. See also the editorial by Lee and Moy in this issue.
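
The abstract names the five performance measures but does not spell out their formulas. The sketch below shows how they are conventionally computed in a screening audit, and how percentile benchmarks could be taken across radiologists. It is an illustration only: the formulas are the usual audit definitions and are assumed here rather than taken from the article, the names `ScreeningCounts`, `performance_measures`, and `percentile_benchmarks` are hypothetical, and every count in the example is invented, not study data.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ScreeningCounts:
    """Outcome counts for one radiologist or one pooled set of screens."""
    true_pos: int    # positive interpretation, cancer found during follow-up
    false_pos: int   # positive interpretation, no cancer found
    true_neg: int    # negative interpretation, no cancer found
    false_neg: int   # negative interpretation, cancer found during follow-up

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.true_neg + self.false_neg


def performance_measures(c: ScreeningCounts) -> dict[str, float]:
    """Compute the five measures (assumed conventional definitions)."""
    return {
        # Share of all screens interpreted as abnormal.
        "air_pct": 100 * (c.true_pos + c.false_pos) / c.total,
        # Screen-detected cancers per 1000 screens.
        "cdr_per_1000": 1000 * c.true_pos / c.total,
        # Share of cancers that had a positive interpretation.
        "sensitivity_pct": 100 * c.true_pos / (c.true_pos + c.false_neg),
        # Share of non-cancers that had a negative interpretation.
        "specificity_pct": 100 * c.true_neg / (c.true_neg + c.false_pos),
        # Missed cancers per 1000 screens.
        "fnr_per_1000": 1000 * c.false_neg / c.total,
    }


def percentile_benchmarks(values: list[float]) -> dict[int, float]:
    """25th, 50th, and 75th percentiles of a per-radiologist measure."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {25: q1, 50: q2, 75: q3}


if __name__ == "__main__":
    # Invented counts for 10 000 screens; not taken from the study.
    pooled = ScreeningCounts(true_pos=50, false_pos=700, true_neg=9242, false_neg=8)
    for name, value in performance_measures(pooled).items():
        print(f"{name}: {value:.1f}")

    # Invented per-radiologist abnormal interpretation rates (percent).
    print(percentile_benchmarks([6.0, 7.0, 8.0, 9.0, 10.0]))
```

Note that CDR and FNR are both expressed per 1000 screens, so under these definitions they sum to the total cancer rate in the screened population, while sensitivity is the ratio of CDR to that sum.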