- Li, Yuan;
- Zhang, Yang;
- Zhang, Enlong;
- Chen, Yongye;
- Wang, Qizheng;
- Liu, Ke;
- Yu, Hon J;
- Yuan, Huishu;
- Lang, Ning;
- Su, Min-Ying
Objectives
To evaluate the performance of deep learning using ResNet50 in differentiating benign from malignant vertebral fractures on CT.
Methods
A dataset of 433 patients with confirmed vertebral fractures (296 malignant and 137 benign) was retrospectively selected from our spinal CT image database. A senior radiologist performed visual reading to evaluate six imaging features, and three junior radiologists gave diagnostic predictions. An ROI was placed on the most abnormal vertebra, and the smallest square bounding box enclosing the ROI was generated. The input to the ResNet50 network had three channels, consisting of each slice and its two neighboring slices. Diagnostic performance was evaluated using 10-fold cross-validation. After the malignancy probability was obtained for every slice of a patient, the highest probability was assigned to that patient, and the final diagnosis was made using a threshold of 0.5.
Results
Visual features such as soft tissue mass and bone destruction were highly suggestive of malignancy, whereas the presence of a transverse fracture line was highly suggestive of a benign fracture. The three radiologists, with 5, 3, and 1 year of experience, achieved accuracies of 99%, 95.2%, and 92.8%, respectively. In the ResNet50 analysis, the per-slice diagnostic sensitivity, specificity, and accuracy were 0.90, 0.79, and 85%, respectively. When the slices were combined to give a per-patient diagnosis, the sensitivity, specificity, and accuracy were 0.95, 0.80, and 88%.
Conclusion
Deep learning has become an important tool for the detection of fractures on CT. In this study, ResNet50 achieved good accuracy, which can be further improved with more cases and optimized methods toward future clinical implementation.
Key points
• Deep learning using ResNet50 can yield high accuracy for the differential diagnosis of benign and malignant vertebral fractures on CT.
• In the ResNet50 analysis, the per-slice diagnostic sensitivity, specificity, and accuracy were 0.90, 0.79, and 85%.
• When the slices were combined for a per-patient diagnosis, the diagnostic sensitivity, specificity, and accuracy were 0.95, 0.80, and 88%.
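The preprocessing steps and the per-patient decision rule described in Methods, together with the sensitivity, specificity, and accuracy reported for them, can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the ResNet50 model itself is omitted: slice-level malignancy probabilities are taken as given. Several details are assumptions not stated in the study: centring the square box on the ROI, repeating the edge slice when a neighbor is missing, treating a maximum probability at or above 0.5 as malignant, and the hypothetical helper names `square_bbox`, `neighbor_triplet`, `patient_predictions`, and `diagnostic_metrics`. The toy probabilities at the end are invented for illustration only.

```python
from typing import Dict, Sequence, Tuple


def square_bbox(x0: int, y0: int, x1: int, y1: int) -> Tuple[int, int, int, int]:
    """Smallest square box containing the ROI (x0, y0)-(x1, y1).

    Assumes half-open pixel coordinates; centring the square on the ROI
    is a hypothetical choice, not taken from the study.
    """
    w, h = x1 - x0, y1 - y0
    side = max(w, h)
    nx0 = x0 - (side - w) // 2
    ny0 = y0 - (side - h) // 2
    return nx0, ny0, nx0 + side, ny0 + side


def neighbor_triplet(i: int, n_slices: int) -> Tuple[int, int, int]:
    """Slice indices for the three input channels: previous, current, next.

    At the first/last slice the missing neighbor is replaced by the slice
    itself (clamping), which is an assumption.
    """
    prev_i = max(i - 1, 0)
    next_i = min(i + 1, n_slices - 1)
    return prev_i, i, next_i


def patient_predictions(slice_probs: Dict[str, Sequence[float]],
                        threshold: float = 0.5) -> Dict[str, int]:
    """Label each patient 1 (malignant) if the highest slice probability
    reaches the threshold, else 0 (benign)."""
    return {pid: int(max(probs) >= threshold) for pid, probs in slice_probs.items()}


def diagnostic_metrics(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy with malignant (1) as positive.

    Assumes both classes are present, otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Toy example with invented slice probabilities (not study data).
probs = {"p1": [0.2, 0.7, 0.4], "p2": [0.1, 0.3], "p3": [0.45, 0.49], "p4": [0.6, 0.1]}
truth = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}
preds = patient_predictions(probs)  # {"p1": 1, "p2": 0, "p3": 0, "p4": 1}
ids = sorted(truth)
print(diagnostic_metrics([truth[k] for k in ids], [preds[k] for k in ids]))
# -> (0.5, 0.5, 0.5)
```

Taking the maximum over slices makes the per-patient call sensitive to any single highly suspicious slice. This is consistent with the rise in sensitivity from the per-slice to the per-patient results, although the sketch does not reproduce the study's numbers.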