PURPOSE: To study the variability in volume change estimates of pulmonary nodules arising from the segmentation approaches used by several algorithms, and to evaluate the effect of this variability on the ability to predict nodule malignancy. METHODS: We obtained 100 patient image datasets from the National Lung Screening Trial (NLST), each with a nodule detected on two consecutive low-dose computed tomography (LDCT) scans, with equal numbers of malignant and benign cases (50 malignant, 50 benign). The nodule location for each case was provided as a screen capture with a bounding box, together with the nodule's axial location. Five participating Quantitative Imaging Network (QIN) institutions segmented the nodules using their preferred semi-automated algorithms with no manual correction; teams were allowed to provide additional manually corrected segmentations, which were analyzed separately. The teams were asked to provide segmentation masks for each nodule at both time points. From these masks, the nodule volume was estimated at each time point, along with the change in volume (absolute and percent change) between time points. We used the concordance correlation coefficient (CCC) to assess the agreement of computed nodule volumes (absolute and percent change) across algorithms. We fitted a logistic regression model to the change in nodule volume (absolute and percent change) to predict malignancy status, and report the area under the receiver operating characteristic curve (AUROC) with confidence intervals. Because nodule size was expected to have a substantial effect on segmentation variability, the analysis of volume change was stratified by lesion size, with lesions grouped into those with a longest diameter of <8 mm and those with a longest diameter of ≥8 mm.
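As an illustration of the measurement steps described in the methods, the following is a minimal sketch, not the code used by any participating team. It assumes a binary mask stored as nested lists (slices, rows, columns), a known voxel spacing in millimetres, percent change taken relative to the first scan, and Lin's form of the CCC with population (1/n) variances. The function names `mask_volume_mm3`, `volume_change`, and `concordance_cc` are hypothetical.

```python
from statistics import fmean


def mask_volume_mm3(mask, spacing_mm):
    """Volume of a binary mask: number of nonzero voxels times the voxel volume.

    mask is a nested list indexed as [slice][row][column];
    spacing_mm is (dz, dy, dx) in millimetres (an assumed layout).
    """
    dz, dy, dx = spacing_mm
    voxels = sum(1 for plane in mask for row in plane for v in row if v)
    return voxels * dz * dy * dx


def volume_change(v_first, v_second):
    """Return (absolute change, percent change relative to the first scan)."""
    absolute = v_second - v_first
    percent = 100.0 * absolute / v_first  # assumes v_first > 0
    return absolute, percent


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two measurement lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    sxx = fmean([(a - mx) ** 2 for a in x])  # population variance of x
    syy = fmean([(b - my) ** 2 for b in y])  # population variance of y
    sxy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])  # covariance
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)
```

Unlike the Pearson correlation, the CCC penalizes a systematic offset between two algorithms: in this sketch, two lists that differ by a constant shift are perfectly correlated but have a CCC below 1.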
RESULTS: Nodule segmentation showed substantial variability across algorithms, with the CCC ranging from 0.56 to 0.95 for change in volume (0.15 to 0.86 for percent change in volume) across all nodules. When nodules were stratified by longest diameter, the CCC was higher for large nodules, ranging from 0.54 to 0.93 among the algorithms (0.3 to 0.95 for percent change in volume), than for smaller nodules, for which it ranged from -0.0038 to 0.69 (-0.039 to 0.92 for percent change in volume). Malignancy prediction was fairly consistent across institutions: the AUROC using change in volume ranged from 0.65 to 0.89 (0.64 to 0.86 for percent change in volume) over the entire nodule size range. Prediction improved for large nodules (≥8 mm), with AUROC ranging from 0.75 to 0.90 (0.74 to 0.92 for percent change in volume), compared with smaller nodules (<8 mm), for which AUROC ranged from 0.57 to 0.78 (0.59 to 0.77 for percent change in volume). CONCLUSIONS: We find fairly high concordance in size measurements across algorithms for larger nodules (≥8 mm), higher than for smaller nodules (<8 mm). Changes in nodule volume (absolute and percent change) were consistent predictors of malignancy across institutions, despite the use of different segmentation algorithms. Volume change estimates without corrections showed slightly lower predictive performance for two teams.
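The size-stratified discrimination analysis summarized above can be sketched as follows. This is an illustrative simplification, not the authors' analysis: it omits the model fit and the confidence intervals. It relies on the fact that a logistic regression on a single predictor produces fitted probabilities that are a monotone function of that predictor, so when the fitted coefficient is positive the model's AUROC equals the AUROC of the raw predictor. The AUROC is computed as the fraction of malignant/benign pairs in which the malignant case has the larger value, with ties counted as one half. The function names `auroc` and `stratify_by_diameter` and the label coding (1 = malignant, 0 = benign) are assumptions.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: predictor values, e.g. change in nodule volume.
    labels: 1 for malignant, 0 for benign (assumed coding).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def stratify_by_diameter(diameters_mm, threshold_mm=8.0):
    """Split case indices into (<threshold, >=threshold) by longest diameter."""
    small = [i for i, d in enumerate(diameters_mm) if d < threshold_mm]
    large = [i for i, d in enumerate(diameters_mm) if d >= threshold_mm]
    return small, large
```

To reproduce a stratified analysis, one would call `stratify_by_diameter` on the longest diameters and then evaluate `auroc` separately on the scores and labels of each index group.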