While awareness and detection of neuropsychiatric disorders have been on the rise since the mid-twentieth century, current diagnostic approaches lack laboratory-based support from genetic, neuroscientific, and behavioral research. The emerging field of phenomics aims to accelerate discovery in neuropsychiatry by examining the physical and biochemical traits (i.e., phenotypes) of mental health disorders. The phenotypes measured in psychiatry often span multiple data types, such as clinical symptoms, behaviors, neurocognitive performance, and imaging data. However, the resulting test batteries are frequently high-dimensional, which hinders straightforward interpretation of the patterns that characterize phenotypic data. In this dissertation, I describe, develop, and implement a collection of statistical methods to evaluate relational structures in multi-dimensional phenotypic data for neuropsychiatric disorders.
The complexity of high-dimensional data calls for sophisticated statistical tools that can disentangle and extract the true underlying structures. Adaptive shrinkage techniques and sparse covariance estimation reduce the effective dimensionality of the data, producing more stable and parsimonious final models. Graphical models provide visual representations of the sparse structures in the data by encoding conditional relationships among the variables. Joint graphical models have the additional advantage of revealing common as well as distinct patterns within and across multiple groups (e.g., diagnoses, treatments, symptom severity, or time points).
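As a minimal illustration of how a graphical model encodes conditional relationships, the sketch below uses a hypothetical three-variable chain (it is not data or code from this dissertation). In a Gaussian graphical model, a zero entry in the precision (inverse covariance) matrix means the two variables are conditionally independent given the rest, even when their marginal covariance is nonzero. The matrix values and the helper names invert, partial_correlations, and edges are assumptions made for illustration only.

```python
# Hypothetical precision (inverse covariance) matrix for a chain X1 - X2 - X3.
# The zero at position (0, 2) encodes that X1 and X3 are conditionally
# independent given X2. These values are illustrative, not estimated from data.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(theta):
    """Partial correlation of i and j given all others: -theta_ij / sqrt(theta_ii * theta_jj)."""
    n = len(theta)
    return [[1.0 if i == j
             else -theta[i][j] / (theta[i][i] * theta[j][j]) ** 0.5
             for j in range(n)]
            for i in range(n)]


def edges(theta, tol=1e-12):
    """Edges of the graph: pairs (i, j), i < j, with a nonzero precision entry."""
    n = len(theta)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(theta[i][j]) > tol]


covariance = invert(precision)
partial = partial_correlations(precision)

print("marginal covariance of X1 and X3:", covariance[0][2])
print("partial correlation of X1 and X3 given X2:", partial[0][2])
print("graph edges:", edges(precision))
```

In this toy case X1 and X3 covary marginally, yet the graph has no edge between them because their association is fully carried through X2. Sparse estimation methods seek this kind of structure when the precision matrix must be estimated from data rather than specified.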
The analytical methods in this dissertation were motivated by the "large p, small n" problem often encountered in psychiatric data, in which the number of measured variables (p) is large relative to the number of participants (n). The goal of phenomics is to establish sets of characterizing features that may be shared among multiple neuropsychiatric disorders. This is particularly important because comorbidity is now recognized as the norm rather than the exception in the mental health field. However, the large number of interrelated measurements makes it difficult to identify which characteristics are shared and which are distinct across disorders.
This dissertation evaluates high-dimensional relational structures of phenotypic data through two projects conducted at the UCLA Semel Institute for Neuroscience and Human Behavior. The Time Reproduction in Neurodevelopmental Disorders (TRIND) project concentrates on a single construct, temporal processing. I applied a range of statistical methods to these data, including repeated measures mixed effects models, joint graphical models, cluster analysis, and functional data analysis. Collectively, these approaches suggest that temporal processing is multi-dimensional rather than a single latent construct. The Centers for Intervention Development and Applied Research: Translational Research to Enhance Cognitive Control (CIDAR-TRECC) project is a much broader study assessing a multitude of neurocognitive constructs. Adaptive shrinkage techniques and joint graphical models help extract similarities and differences in the relational structures of neurocognition for children with neurodevelopmental disorders. I further extended the conventional joint graphical model algorithm to handle multiple groups over time.
In many ways, the taxonomic versus dimensional controversy in psychiatry is analogous to the choice between categorical and continuous variables in statistics. There is no universal solution. In some situations, a continuous variable may describe a relationship more accurately, whereas other situations require a categorical viewpoint. Similarly, factor analysis may be preferred over Gaussian graphical models in one setting (e.g., to create a summary score) but fail in another (e.g., as a discriminatory tool). Simulations at the end of this dissertation show that there is no single answer: the optimal method depends not only on the pattern of the data, but also on the objective of the study. The future of characterizing and treating neuropsychiatric disorders ought to support a comprehensive, synergistic viewpoint across both taxonomic and dimensional analyses, using appropriate data- and objective-driven statistical methods that progressively disentangle the multi-dimensional constructs in phenotypic data.