Ports, Kayleen

Dementia Prediction Model for American Indian and Alaska Native Individuals Based on Electronic Health Record Data Using Machine Learning Algorithms

2022

Advisor(s): Jiang, Luohua

Abstract

Background: The worldwide growth in the elderly population is anticipated to be accompanied by significant growth in the prevalence of dementia. Identifying individuals at high risk of dementia is critical for early diagnosis and intervention, and numerous dementia prediction models have therefore been developed. However, few models have been developed using electronic health record (EHR) data, and none have been developed or validated for use among American Indian and Alaska Native (AI/AN) individuals. Focusing on a national sample of Indian Health Service (IHS) users, this investigation aimed to develop and internally validate a two-year, all-cause incident dementia risk prediction model using EHR data and machine learning (ML) algorithms.

Methods: Seven years of data from the IHS National Data Warehouse and related EHR databases were extracted for this investigation. Five years of baseline health and service use data from fiscal years (FY) 2007-2011 were used to predict the risk of dementia diagnosed during FY 2012-2013. The study cohort included 17,451 IHS users aged 65 years and older without a recorded diagnosis of dementia by the end of FY 2011. Three algorithms were compared: Logistic Regression (LR) with Backwards Stepwise Selection, Least Absolute Shrinkage and Selection Operator (LASSO) Regression, and eXtreme Gradient Boosting (XGBoost). For each algorithm, a literature-based model and an extended model were developed. Model performance was compared using the area under the receiver operating characteristic curve (AUC), and model-specific feature importance was examined.

Results: During the outcome assessment period, 631 (3.6%) individuals were diagnosed with dementia. The extended XGBoost and LASSO models exhibited the best discriminatory performance (AUC of 0.807 and 0.803, respectively). Compared to LR and LASSO, XGBoost required significantly less investigator-driven data preprocessing. The significant predictors identified by the different algorithms overlapped substantially: the three algorithms shared 10 of their top 15 predictors. Novel predictors identified included the frequency of inpatient hospitalizations, emergency room visits, and hospital observations occurring during the 5-year period preceding diagnosis.

Conclusions: Routinely collected EHR data can be used to predict the two-year risk of incident, all-cause dementia among IHS users. Following external validation, the developed risk prediction model(s) could serve as a clinical tool to aid IHS clinicians in identifying AI/AN individuals at high risk for dementia.


UC Irvine
