Performance of Deep Learning and Genitourinary Radiologists in Detection of Prostate Cancer Using 3-T Multiparametric Magnetic Resonance Imaging
- Author(s): Cao, Ruiming
- Zhong, Xinran
- Afshari, Sohrab
- Felker, Ely
- Suvannarerg, Voraparee
- Tubtawee, Teeravut
- Vangala, Sitaram
- Scalzo, Fabien
- Raman, Steven
- Sung, Kyunghyun
- et al.
Published Web Location: https://doi.org/10.1002/jmri.27595
Background: Several deep learning-based techniques have been developed for prostate cancer (PCa) detection using multi-parametric MRI (mpMRI), but few of them have been rigorously evaluated relative to radiologists’ performance or whole-mount histopathology (WMHP).
Purpose: To compare the performance of a previously proposed deep learning algorithm, FocalNet, and expert radiologists in the detection of PCa on mpMRI with WMHP as the reference.
Study type: Retrospective, single-center study.
Subjects: 553 patients (development cohort: 427 patients; evaluation cohort: 126 patients) who underwent 3 T mpMRI prior to radical prostatectomy from October 2010 to February 2018.
Field Strength/Sequence: 3 T, T2-weighted imaging and diffusion-weighted imaging.
Assessment: FocalNet was trained on the development cohort and then applied to the evaluation cohort, where it predicted PCa locations as detection points, each with a confidence value. Four fellowship-trained genitourinary (GU) radiologists independently read the evaluation cohort to detect suspicious PCa foci, annotate detection point locations, and assign a five-point suspicion score (1: least suspicious, 5: most suspicious) to each annotated detection point. The PCa detection performance of FocalNet and the radiologists was evaluated by lesion detection sensitivity versus the number of false-positive detections at different thresholds on the suspicion scores. Clinically significant lesions were defined as Gleason Group ≥ 2 or pathological size ≥ 10 mm. Index lesions were defined as the lesion with the highest Gleason Group, with the largest pathological size as the secondary criterion.
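The sensitivity-versus-false-positive evaluation described above can be illustrated with a minimal Python sketch. It is not the study's code: the `Detection` record, the `lesion_id` matching field, and the normalization of false positives per patient are assumptions made for illustration, and the abstract does not state how detection points were matched to WMHP lesions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    patient_id: str
    score: float              # suspicion score (1-5) or model confidence
    lesion_id: Optional[str]  # matched reference lesion, or None if false positive


def sensitivity_and_fp(
    detections: List[Detection],
    n_lesions: int,
    n_patients: int,
    threshold: float,
) -> Tuple[float, float]:
    """Return (lesion sensitivity, false positives per patient) at one threshold.

    Assumption: each detection has already been matched to at most one
    reference lesion; the matching rule itself is outside this sketch.
    """
    kept = [d for d in detections if d.score >= threshold]
    # A lesion counts once, even if several detection points hit it.
    hit_lesions = {(d.patient_id, d.lesion_id) for d in kept if d.lesion_id is not None}
    false_positives = sum(1 for d in kept if d.lesion_id is None)
    return len(hit_lesions) / n_lesions, false_positives / n_patients


# Toy data: 2 patients, 3 reference lesions (one is never detected).
toy = [
    Detection("p1", 5, "A"),
    Detection("p1", 3, None),
    Detection("p2", 2, "B"),
    Detection("p2", 4, None),
    Detection("p2", 1, None),
]
curve = [sensitivity_and_fp(toy, n_lesions=3, n_patients=2, threshold=t) for t in (1, 2, 3, 4, 5)]
```

Sweeping the threshold from 1 to 5 traces the operating points from the most sensitive setting to the most specific one, which is the range over which the reader and model curves are compared.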
Statistical tests: Bootstrap hypothesis test for the detection sensitivity between radiologists and FocalNet.
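As a rough illustration of a bootstrap hypothesis test on a sensitivity difference, the sketch below resamples patients with replacement and compares the recentered bootstrap differences with the observed one. The abstract does not give the exact procedure, so the patient-level resampling, the recentering at the null, and all names (`sensitivity_diff`, `bootstrap_p_value`, the per-patient tuple layout) are assumptions, not the authors' method.

```python
import random
from typing import List, Tuple

# One tuple per patient: (number of reference lesions,
#                         lesions detected by reader A,
#                         lesions detected by reader B)
Case = Tuple[int, int, int]


def sensitivity_diff(cases: List[Case]) -> float:
    """Sensitivity of A minus sensitivity of B, pooled over lesions."""
    total = sum(c[0] for c in cases)
    if total == 0:
        return 0.0
    return (sum(c[1] for c in cases) - sum(c[2] for c in cases)) / total


def bootstrap_p_value(cases: List[Case], n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided bootstrap p-value for the null of zero sensitivity difference.

    Patients are resampled with replacement; the bootstrap distribution is
    shifted by the observed difference so that it is centered at the null.
    """
    rng = random.Random(seed)
    observed = sensitivity_diff(cases)
    extreme = 0
    for _ in range(n_boot):
        sample = [rng.choice(cases) for _ in cases]
        if abs(sensitivity_diff(sample) - observed) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_boot + 1)


toy_cases = [(2, 2, 1), (1, 1, 1), (3, 2, 2), (1, 1, 0)]
obs, p = bootstrap_p_value(toy_cases)
```

Resampling whole patients rather than individual lesions keeps lesions from the same prostate together, which is one common way to respect within-patient correlation.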
Results: For the overall differential detection sensitivity, FocalNet was 5.1% and 4.7% below the radiologists for clinically significant and index lesions, respectively; however, the differences were not statistically significant (P=0.413 and P=0.282, respectively).
Data Conclusion: FocalNet achieved slightly lower PCa detection performance than GU radiologists, but the difference was not statistically significant. Compared with radiologists, FocalNet demonstrated similar detection performance in a highly sensitive setting (suspicion score≥1) and in a highly specific setting (suspicion score=5), but lower performance in between.