- Christie, S Ariane;
- Hubbard, Alan E;
- Callcut, Rachael A;
- Hameed, Morad;
- Dissak-Delon, Fanny Nadia;
- Mekolo, David;
- Saidou, Arabo;
- Mefire, Alain Chichom;
- Nsongoo, Pierre;
- Dicker, Rochelle A;
- Cohen, Mitchell Jay;
- Juillard, Catherine
Background
Mortality prediction aids clinical decision making and is necessary for quality improvement initiatives. Validated metrics rely on prespecified variables and often require advanced diagnostics, which are unfeasible in resource-constrained contexts. We hypothesize that machine learning will generate superior mortality prediction in both high-income and low- and middle-income country cohorts.
Methods
SuperLearner, an ensemble machine-learning algorithm, was applied to data from three prospective trauma cohorts: a highest-activation cohort in the United States, a high-volume center cohort in South Africa (SA), and a multicenter registry in Cameroon. Cross-validation was used to assess model discrimination of discharge mortality by site using receiver operating characteristic curves. SuperLearner discrimination was compared with standard scoring methods. Clinical variables driving SuperLearner prediction at each site were evaluated.
Results
Data from 28,212 injured patients were used to generate predictions. Discharge mortality was 17%, 1.3%, and 1.7% in the US, SA, and Cameroonian cohorts, respectively. SuperLearner delivered superior prediction of discharge mortality in the United States (area under the curve [AUC], 94-97%) and vastly superior prediction in Cameroon (AUC, 90-94%) compared with conventional scoring algorithms. It provided similar prediction to standard scores in the SA cohort (AUC, 90-95%). Context-specific variables (partial thromboplastin time in the United States and hospital distance in Cameroon) were prime drivers of predicted mortality in their respective cohorts, whereas severe brain injury predicted mortality across sites.
Conclusions
Machine learning provides excellent discrimination of injury mortality in diverse settings. Unlike traditional scores, data-adaptive methods are well suited to optimizing precise site-specific prediction regardless of diagnostic capabilities or data set inclusion, allowing for individualized decision making and expanded access to quality improvement programming.
Level of evidence
Prognostic and therapeutic, level II and III.
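
The Methods compare discrimination between predictors using the area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that metric measures, the Python below computes AUC as the probability that a randomly chosen patient who died received a higher risk score than a randomly chosen survivor. It is not the authors' SuperLearner pipeline and does not perform cross-validation. The function name `roc_auc` and the toy outcome and score values are hypothetical, invented for illustration only.

```python
from typing import Sequence


def roc_auc(died: Sequence[int], risk: Sequence[float]) -> float:
    """AUC as the fraction of (death, survivor) pairs ranked correctly.

    Ties in risk score count as half a correct ranking.
    """
    deaths = [r for d, r in zip(died, risk) if d == 1]
    survivors = [r for d, r in zip(died, risk) if d == 0]
    if not deaths or not survivors:
        raise ValueError("AUC needs at least one death and one survivor")

    correct = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                correct += 1.0
            elif risk_death == risk_survivor:
                correct += 0.5
    return correct / (len(deaths) * len(survivors))


# Hypothetical toy data: 1 = died before discharge, 0 = survived.
died = [0, 0, 1, 0, 1, 0, 1, 0]
# Hypothetical risk scores from two competing predictors.
score_a = [0.1, 0.2, 0.9, 0.3, 0.8, 0.4, 0.7, 0.2]
score_b = [0.5, 0.1, 0.6, 0.7, 0.4, 0.2, 0.9, 0.3]

auc_a = roc_auc(died, score_a)
auc_b = roc_auc(died, score_b)
print(f"AUC predictor A: {auc_a:.2f}")
print(f"AUC predictor B: {auc_b:.2f}")
```

In this toy example, predictor A ranks every death above every survivor, while predictor B misorders some pairs and so has a lower AUC. The pairwise loop is quadratic in cohort size, which is fine for illustration; a rank-based implementation would be preferable for cohorts of tens of thousands of patients.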