Background
Improved mortality prediction for patients in intensive care units is a big challenge. Many severity scores have been proposed, but findings of validation studies have shown that they are not adequately calibrated. The Super ICU Learner Algorithm (SICULA), an ensemble machine learning technique that uses multiple learning algorithms to obtain better prediction performance, does at least as well as the best member of its library. We aimed to assess whether the Super Learner could provide a new mortality prediction algorithm for patients in intensive care units, and to assess its performance compared with other scoring systems.
Methods
We used the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database (version 26), which includes all patients admitted to an intensive care unit at the Beth Israel Deaconess Medical Centre, Boston, MA, USA, from January, 2001, to December, 2008. We assessed the calibration, discrimination, and risk classification of predicted hospital mortality based on Super Learner compared with SAPS-II, APACHE-II, and SOFA. We calculated performance measures with cross-validation to avoid making biased assessments. Our proposed score was then externally validated on a dataset of 200 randomly selected patients admitted to the intensive care unit of Hôpital Européen Georges-Pompidou, Paris, France, between Sept 1, 2013, and June 30, 2014. The primary outcome was hospital mortality. The explanatory variables were the same as those included in the SAPS-II score.
Findings
24,508 patients were included, with median SAPS-II of 38 (IQR 27-51) and median SOFA of 5 (IQR 2-8). 3002 of 24,508 (12%) patients died in the Beth Israel Deaconess Medical Centre. We produced two sets of predictions based on the Super Learner: the first based on the 17 variables as they appear in the SAPS-II score (SL1), and the second based on the original, untransformed variables (SL2). The two versions yielded average predicted probabilities of death of 0·12 (IQR 0·02-0·16) and 0·13 (0·01-0·19), whereas the corresponding value for SOFA was 0·12 (0·05-0·15) and for SAPS-II 0·30 (0·08-0·48). The cross-validated area under the receiver operating characteristic curve (AUROC) was 0·78 (95% CI 0·77-0·78) for SAPS-II and 0·71 (0·70-0·72) for SOFA. Super Learner had an AUROC of 0·85 (0·84-0·85) when the explanatory variables were categorised as in SAPS-II, and of 0·88 (0·87-0·89) when the same explanatory variables were included without any transformation. Additionally, Super Learner showed better calibration properties than previous scoring systems. On the external validation dataset, the AUROC was 0·94 (0·90-0·98) and calibration properties were good.
Interpretation
Compared with conventional severity scores, Super Learner offers improved performance for predicting hospital mortality in patients in intensive care units. A user-friendly implementation is available online and should be useful for clinicians seeking to validate our score.
Funding
Fulbright Foundation, Assistance Publique-Hôpitaux de Paris, Doris Duke Clinical Scientist Development Award, and the NIH.