Skip to main content
Open Access Publications from the University of California


UC San Francisco Previously Published Works bannerUCSF

Postoperative delirium prediction using machine learning models and preoperative electronic health record data.



Accurate, pragmatic risk stratification for postoperative delirium (POD) is necessary to target preventative resources toward high-risk patients. Machine learning (ML) offers a novel approach to leveraging electronic health record (EHR) data for POD prediction. We sought to develop and internally validate a ML-derived POD risk prediction model using preoperative risk features, and to compare its performance to models developed with traditional logistic regression.


This was a retrospective analysis of preoperative EHR data from 24,885 adults undergoing a procedure requiring anesthesia care, recovering in the main post-anesthesia care unit, and staying in the hospital at least overnight between December 2016 and December 2019 at either of two hospitals in a tertiary care health system. One hundred fifteen preoperative risk features including demographics, comorbidities, nursing assessments, surgery type, and other preoperative EHR data were used to predict postoperative delirium (POD), defined as any instance of Nursing Delirium Screening Scale ≥2 or positive Confusion Assessment Method for the Intensive Care Unit within the first 7 postoperative days. Two ML models (Neural Network and XGBoost), two traditional logistic regression models ("clinician-guided" and "ML hybrid"), and a previously described delirium risk stratification tool (AWOL-S) were evaluated using the area under the receiver operating characteristic curve (AUC-ROC), sensitivity, specificity, positive likelihood ratio, and positive predictive value. Model calibration was assessed with a calibration curve. Patients with no POD assessments charted or at least 20% of input variables missing were excluded.


POD incidence was 5.3%. The AUC-ROC for Neural Net was 0.841 [95% CI 0. 816-0.863] and for XGBoost was 0.851 [95% CI 0.827-0.874], which was significantly better than the clinician-guided (AUC-ROC 0.763 [0.734-0.793], p < 0.001) and ML hybrid (AUC-ROC 0.824 [0.800-0.849], p < 0.001) regression models and AWOL-S (AUC-ROC 0.762 [95% CI 0.713-0.812], p < 0.001). Neural Net, XGBoost, and ML hybrid models demonstrated excellent calibration, while calibration of the clinician-guided and AWOL-S models was moderate; they tended to overestimate delirium risk in those already at highest risk.


Using pragmatically collected EHR data, two ML models predicted POD in a broad perioperative population with high discrimination. Optimal application of the models would provide automated, real-time delirium risk stratification to improve perioperative management of surgical patients at risk for POD.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View