eScholarship
Open Access Publications from the University of California


UC Irvine Electronic Theses and Dissertations

Functional Analysis of Generalized Linear Models Under Nonlinear Constraints With Artificial Intelligence and Machine Learning Applications to the Sciences

Creative Commons BY-NC-ND 4.0 license
Abstract

This thesis presents several fundamental mathematical contributions to Generalized Linear Models (GLMs), which are ubiquitous in the sciences. The methodologies considered are shown to overcome biased estimates of parameters of scientific interest through new mathematical results and their applications in both nonparametric and parametric settings. The results are shown to be uniformly better than existing, widely used methods in the sciences. In extensive simulation studies, the methodologies outperform existing Artificial Intelligence (AI) and Machine Learning (ML) methods used in the sciences, delivering better Model fit, Inference, and Prediction (MIP) results without losing interpretability of the parameter estimates. This is because the mathematical construction and its accompanying foundations ensure that the estimation procedure converges strongly to the parameters of interest. In the first application, I present a parametric version of the methodology (© Elsevier and Journal of Informetrics) titled “Functional analysis of generalized linear models under non-linear constraints with applications to identifying highly-cited papers.” In the second application, I extend this methodology to an entirely nonparametric setting, which gives results equivalent to the parametric formulation in various circumstances but may outperform it in others, especially when the underlying Data Generating Process (DGP) is asymmetric. Furthermore, I show that the categorical data models to which the methodologies are applied can be extended to any GLM, continuous or otherwise, while maintaining model interpretability and convergence results. In addition, I present a new prediction performance diagnostic, the Adjusted ROC Statistic (ARS), which allows us to test whether the prediction performance of various fitted models is statistically different. The nonparametric methodology is then further extended to give a new formulation of the binary regression framework widely used in the sciences; extensive simulation studies show that this version of the methodology is more robust than the versions discussed earlier. This general framework is then extended to various AI and ML applications widely used in the sciences. The work as a whole also has important consequences for the continuing discussion of “statistical significance” versus “scientific significance,” including the need to consider the strength of convergence of a methodology as well as the subtle connections between topological spaces and measure spaces, each of which is crucial to ensuring almost sure convergence of the parameter estimates under the estimation algorithm presented, termed the Latent Adaptive Hierarchical EM-Like (LAHEML) algorithm. As such, the results provide a significantly expanded and more accurate toolset for mathematicians, statisticians, scientists, and decision makers at all levels, supporting better model fit, inference, and prediction outcomes.
