A Comparative Study of Machine Learning Models and Feature Selection Techniques for Predicting Fragile X Tremor Ataxia Risk
eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations


Abstract

Fragile X–associated tremor/ataxia syndrome (FXTAS) primarily affects older adults who carry the FMR1 gene premutation. Its symptoms can be severe and include cognitive decline, intention tremor, neuropathy, and progressive ataxia. Despite its significant impact on individuals and their families, there is currently no dependable method for predicting the onset or progression of FXTAS. Our research aims to fill this critical gap with a predictive method based on a thorough analysis of clinical, genetic, and behavioral factors. We used a dataset of longitudinal records from 103 patients, collected over three to five visits per patient. Employing advanced feature selection techniques and Random Forest probabilistic models, we developed a highly accurate risk prediction model for FXTAS. Our study has three primary objectives: first, to find the combination of Machine Learning (ML) model and feature selection technique that performs best across accuracy, precision, recall (sensitivity), and specificity; second, to determine whether undersampling or oversampling gives better results across these metrics; and third, to quantify risk by computing precise risk scores. Our analysis covers four feature selection methods, namely Random Forest, Lasso, Recursive Feature Elimination (RFE), and Statistical Feature Selection (SFS), and four classification algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting (XGBoost), and Random Forest (RF). The XGBoost + RFE and Random Forest + RFE combinations performed best, achieving accuracies of 86.67% and 90%, respectively, higher than all other models compared.
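
The abstract names its evaluation metrics and compares undersampling with oversampling without spelling out either. The minimal sketch below, in plain Python so it runs without dependencies, shows how the four metrics follow from confusion-matrix counts (recall and sensitivity are the same quantity) and how simple random undersampling and oversampling rebalance a labeled dataset. The function names and the choice of plain *random* resampling are assumptions for illustration, not the thesis's documented procedure; in practice the same steps are available as scikit-learn's `sklearn.metrics.confusion_matrix` and imbalanced-learn's `RandomUnderSampler` and `RandomOverSampler`.

```python
import random
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (= sensitivity) and specificity for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # identical to sensitivity
        "specificity": ratio(tn, tn + fp),
    }


def _group_by_class(X, y):
    # Collect the rows of X under their class label, preserving first-seen order.
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """Assumption: plain random undersampling down to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):  # drop rows without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Assumption: plain random oversampling up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Duplicate randomly chosen existing rows until this class reaches the target.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy labels, not study data.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
    X = [[i] for i in range(8)]
    print(Counter(random_undersample(X, y_true)[1]))
    print(Counter(random_oversample(X, y_true)[1]))
```

In a cross-validated comparison like the one described, resampling would normally be applied to the training folds only, so that duplicated or discarded rows do not distort the held-out evaluation.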
Several features were selected consistently across the feature selection methods. Two assess cognitive function: the Stop Signal Task (SST) median score and the Full Scale Intelligence Quotient (IQ) score. Others measure different aspects of motor skill: the five-choice movement reaction time and the Purdue Pegboard scores (right hand and left hand). These findings represent significant advances for clinical decision-making and personalized treatment strategies in the diagnosis of FXTAS.
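
The abstract reports features selected "consistently" across the four methods but does not say how consistency was judged. One simple way, sketched here as an assumption rather than the authors' procedure, is to count how many methods kept each feature and retain those at or above a vote threshold. The function name `consensus_features`, the `min_votes` parameter, and the feature and method names in the demo are hypothetical placeholders, not results from the study.

```python
from collections import Counter


def consensus_features(selections, min_votes=2):
    """Return features chosen by at least `min_votes` selection methods.

    `selections` maps a method name to the collection of feature names it kept.
    Hypothetical helper: the voting rule is an assumption, not the thesis's method.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # count each feature at most once per method
    kept = [f for f, n in votes.items() if n >= min_votes]
    # Sort by vote count (descending), then alphabetically for stable output.
    return sorted(kept, key=lambda f: (-votes[f], f))


if __name__ == "__main__":
    # Hypothetical selections for illustration only; not results from the study.
    selections = {
        "random_forest": {"feature_a", "feature_b", "feature_c"},
        "lasso": {"feature_a", "feature_b", "feature_d"},
        "rfe": {"feature_a", "feature_c", "feature_d"},
        "sfs": {"feature_a", "feature_b"},
    }
    print(consensus_features(selections, min_votes=3))
```

Setting `min_votes` to the number of methods reduces this to the intersection of all selected sets; setting it to 1 gives their union.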
