- Gay, Skylar;
- Kisling, Kelly;
- Anderson, Brian;
- Zhang, Lifei;
- Rhee, Dong;
- Nguyen, Callistus;
- Netherton, Tucker;
- Yang, Jinzhong;
- Brock, Kristy;
- Jhingran, Anuja;
- Simonds, Hannah;
- Klopp, Ann;
- Beadle, Beth;
- Court, Laurence;
- Cardenas, Carlos
PURPOSE: Two-dimensional radiotherapy is often used to treat cervical cancer in low- and middle-income countries, but treatment planning can be challenging and time-consuming. Neural networks offer the potential to greatly decrease planning time through automation, but the impact on model accuracy of the many hyperparameters that must be set during training has not been exhaustively investigated. In this study, we evaluated the effect of several convolutional neural network architectures and hyperparameters on 2D radiotherapy treatment field delineation.

METHODS: Six commonly used deep learning architectures were trained to delineate four-field box apertures on digitally reconstructed radiographs for cervical cancer radiotherapy. A comprehensive hyperparameter search was conducted for all models by varying the initial learning rate, the image normalization method, and, where applicable, the convolutional kernel size, the number of learnable parameters (via network depth and the number of feature maps per convolution), and the nonlinear activation function. This yielded over 1700 unique models, all of which were trained until performance converged and then tested on a separate dataset.

RESULTS: Of all hyperparameters, the choice of initial learning rate was most consistently significant for improved performance on the test set, with all top-performing models using a learning rate of 0.0001. The optimal image normalization method was not consistent across architectures. High overlap (mean Dice similarity coefficient [DSC] = 0.98) and surface distance agreement (mean surface distance < 2 mm) were achieved between the treatment field apertures for all architectures using the identified best hyperparameters. Overlap (DSC) and distance metrics (mean surface distance and Hausdorff distance) indicated that the DeepLabv3+ and D-LinkNet architectures were least sensitive to initial hyperparameter selection.

CONCLUSION: DeepLabv3+ and D-LinkNet are the most robust to initial hyperparameter selection. Learning rate, nonlinear activation function, and kernel size are also important hyperparameters for improving performance.
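The abstract does not list the candidate values explored in the hyperparameter search (apart from the 0.0001 learning rate reported as best), so the sketch below is only a hypothetical illustration of how such a grid of training configurations might be enumerated; all specific values and parameter names are assumptions, not the study's actual settings.

```python
from itertools import product

# Hypothetical hyperparameter grid illustrating the kind of search described in
# METHODS. Only the 1e-4 learning rate is taken from the abstract; every other
# candidate value here is an assumption for illustration.
search_space = {
    "initial_learning_rate": [1e-3, 1e-4, 1e-5],           # 1e-4 was best in RESULTS
    "image_normalization":   ["min_max", "z_score", "none"],
    "kernel_size":           [3, 5, 7],                     # only for architectures that expose it
    "network_depth":         [3, 4, 5],                     # controls number of learnable parameters
    "feature_maps":          [16, 32, 64],                  # feature maps in the first convolution
    "activation":            ["relu", "leaky_relu", "elu"],
}

def iter_configs(space):
    """Yield one dict per unique hyperparameter combination."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

if __name__ == "__main__":
    configs = list(iter_configs(search_space))
    # The study reports >1700 unique models across the six architectures;
    # this toy grid will not reproduce that count.
    print(f"{len(configs)} configurations per architecture in this sketch")
```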
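The evaluation metrics named in RESULTS (DSC and mean surface distance) are standard, but their implementation is not described in the abstract. The following is a minimal sketch of how they are commonly computed for 2D binary aperture masks, assuming NumPy and SciPy; it is not the authors' evaluation code.

```python
import numpy as np
from scipy import ndimage

def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary aperture masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, ref).sum() / denom

def mean_surface_distance(pred, ref, spacing=(1.0, 1.0)):
    """Symmetric mean surface distance (in mm, given pixel spacing) between mask boundaries."""
    def surface(mask):
        # Boundary pixels: in the mask but removed by a one-pixel erosion.
        return mask & ~ndimage.binary_erosion(mask)

    pred, ref = pred.astype(bool), ref.astype(bool)
    pred_surf, ref_surf = surface(pred), surface(ref)
    # Distance from every pixel to the nearest boundary pixel of the other mask.
    dist_to_ref = ndimage.distance_transform_edt(~ref_surf, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    d_pred = dist_to_ref[pred_surf]
    d_ref = dist_to_pred[ref_surf]
    return (d_pred.sum() + d_ref.sum()) / (len(d_pred) + len(d_ref))
```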