- Dhaubhadel, Sayera;
- Ganguly, Kumkum;
- Ribeiro, Ruy M;
- Cohn, Judith D;
- Hyman, James M;
- Hengartner, Nicolas W;
- Kolade, Beauty;
- Singley, Anna;
- Bhattacharya, Tanmoy;
- Finley, Patrick;
- Levin, Drew;
- Thelen, Haedi;
- Cho, Kelly;
- Costa, Lauren;
- Ho, Yuk-Lam;
- Justice, Amy C;
- Pestian, John;
- Santel, Daniel;
- Zamora-Resendiz, Rafael;
- Crivelli, Silvia;
- Tamang, Suzanne;
- Martins, Susana;
- Trafton, Jodie;
- Oslin, David W;
- Beckham, Jean C;
- Kimbrel, Nathan A;
- McMahon, Benjamin H
We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses, with varying choices of study design and prediction methodology. Each model used twenty cross-sectional and 190 longitudinal variables observed in eight time intervals covering the 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition, in order to predict suicide and the combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of approximately 4.2 M veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.
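The ensemble step and the c-statistic described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the seven base-model outputs and the ten fine-tuning variables are combined, so the logistic-regression stacker here is an assumption, the data are synthetic, and every name (`c_statistic`, `fit_stacker`, `predict`, `demo`) is hypothetical. The c-statistic is computed as the fraction of case/non-case pairs in which the case receives the higher score, with ties counted as one half.

```python
import math
import random


def c_statistic(scores, labels):
    """Concordance: P(score of a random case > score of a random non-case)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Ensemble risk score for one feature row."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_stacker(rows, labels, epochs=200, lr=0.1):
    """ASSUMED combiner: logistic regression fitted by batch gradient descent.

    Each row holds the base-model scores followed by the fine-tuning variables.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def demo(seed=0, n=400, n_base=7, n_tune=10):
    """Synthetic stand-in data: 7 noisy base-model scores plus 10 extra variables."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)  # unobserved "true" risk (synthetic)
        base_scores = [latent + rng.gauss(0.0, 1.0) for _ in range(n_base)]
        tuning_vars = [rng.gauss(0.0, 1.0) for _ in range(n_tune)]
        rows.append(base_scores + tuning_vars)
        labels.append(1 if rng.random() < sigmoid(2.0 * latent - 1.0) else 0)
    split = (3 * n) // 4  # fit on the first 75%, evaluate on the remainder
    weights, bias = fit_stacker(rows[:split], labels[:split])
    held_out = [predict(weights, bias, x) for x in rows[split:]]
    return c_statistic(held_out, labels[split:]), weights


if __name__ == "__main__":
    c_stat, _ = demo()
    print(f"held-out c-statistic on synthetic data: {c_stat:.2f}")
```

The pairwise loop in `c_statistic` is quadratic in cohort size, which is fine for a toy example; for a cohort of millions, a rank-based formulation would be the practical choice.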