Large language models enjoy widespread applications in both general and more personalized use cases. These models can be dynamically trained on well-defined clinical data. However,
several pre-existing models have not been trained to provide diagnostic information
for disorders with clinical heterogeneity. Specifically, our preliminary analysis showed that
existing models such as BioMistral are pretrained on publicly available PubMed data but are
unable to use clinical symptoms to accurately characterize fetal alcohol
spectrum disorder (FASD). To overcome this challenge, we propose to retrain the pre-existing
BioMistral model on a synthetic FASD-specific training set to correctly categorize symptoms
into diagnostic codes. By varying the learning rate and the number of epochs, we evaluate
the performance of both overfitted or poorly trained models and a well-trained model on
a test set containing synthetic clinical notes. We report evaluation performance using
confusion matrices and the Kullback-Leibler divergence (on the log-odds probabilities) and
show that a retrained BioMistral model is better able to correctly diagnose individuals
with fetal alcohol spectrum disorder than a poorly trained model.
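
As a minimal sketch of the two evaluation measures named above, the following Python code builds a confusion matrix over diagnostic codes and computes a Kullback-Leibler divergence between a reference distribution and a model's predicted distribution. The abstract does not specify how the divergence is computed, so several things here are assumptions: that the log-odds are per-class logits converted to probabilities with a softmax, that the reference is a one-hot distribution over the true code, and that the label names and logit values are hypothetical placeholders rather than real diagnostic codes or results.

```python
import math


def softmax(logits):
    """Convert per-class logits (assumed log-odds scores) to probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Hypothetical diagnostic-code labels and predictions, for illustration only.
labels = ["code_A", "code_B", "code_C"]
y_true = ["code_A", "code_A", "code_B", "code_C", "code_C"]
y_pred = ["code_A", "code_B", "code_B", "code_C", "code_A"]
matrix = confusion_matrix(y_true, y_pred, labels)

# Hypothetical logits for one clinical note whose true label is code_A.
predicted = softmax([2.0, 0.5, -1.0])
reference = [1.0, 0.0, 0.0]  # assumed one-hot reference distribution
divergence = kl_divergence(reference, predicted)
```

Under these assumptions, a lower divergence means the predicted distribution places more mass on the true code, while the off-diagonal entries of the matrix show which codes are confused with one another.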