INTRODUCTION: We previously developed a convolutional neural network (CNN)-based algorithm to distinguish atypical ductal hyperplasia (ADH) from ductal carcinoma in situ (DCIS) using a mammographic dataset. The purpose of this study is to further validate that algorithm by evaluating its diagnostic performance prospectively on a new, previously unseen dataset.
MATERIALS AND METHODS: In this institutional review board-approved study, a new dataset of 280 unique mammographic images from 140 patients was used to test our CNN algorithm. All patients underwent stereotactic-guided biopsy of calcifications followed by surgical excision, with final pathology available. The ADH group consisted of 122 images from 61 patients whose highest pathology diagnosis was ADH. The DCIS group consisted of 158 images from 79 patients whose highest pathology diagnosis was DCIS. Two standard mammographic magnification views (craniocaudal and mediolateral/lateromedial) of the calcifications were used for analysis. Calcifications were segmented using the open-source software platform 3D Slicer and resized to fit a 128 × 128 pixel bounding box. Our previously developed CNN algorithm was used. Briefly, a topology with 15 hidden layers was used. The network architecture contained 5 residual layers, with dropout of 0.25 applied after each convolution. Diagnostic performance metrics included sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve. The positive class was defined as the pure ADH group in this study; thus, high specificity corresponds to few DCIS cases being falsely labeled as pure ADH.
RESULTS: Area under the receiver operating characteristic curve was 0.90 (95% confidence interval, ± 0.04). Diagnostic accuracy, sensitivity, and specificity were 80.7%, 63.9%, and 93.7%, respectively.
CONCLUSION: Prospectively tested on new unseen data, our CNN algorithm distinguished pure ADH from DCIS using mammographic images with high specificity.
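The metric definitions used above, with pure ADH as the positive class, can be illustrated with a minimal sketch. The function name `diagnostic_metrics` and the confusion-matrix counts are assumptions introduced here for illustration: the abstract reports only percentages, and the counts below were back-calculated from them on a per-image basis (122 ADH images, 158 DCIS images). The per-patient totals (61 and 79) are equally consistent with the reported figures, so this should not be read as the study's actual tabulation.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    Positive class = pure ADH, negative class = DCIS.
    """
    sensitivity = tp / (tp + fn)  # share of ADH cases correctly labeled ADH
    specificity = tn / (tn + fp)  # share of DCIS cases correctly labeled DCIS
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical per-image counts, inferred from the reported percentages
# (not stated in the abstract).
tp, fn = 78, 44    # 122 ADH images
tn, fp = 148, 10   # 158 DCIS images

sens, spec, acc = diagnostic_metrics(tp, fn, tn, fp)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
# prints: sensitivity=63.9% specificity=93.7% accuracy=80.7%
```

Under these assumed counts, the 10 false positives are DCIS images labeled as pure ADH, which is the error that high specificity keeps small.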