A Bayesian Framework for Fully Nonparametric Ordinal Regression
- Author(s): DeYoreo, Maria
- Advisor(s): Kottas, Athanasios, et al.
Traditional approaches to ordinal regression rely on strong parametric assumptions for the regression function and/or the underlying response distribution. Although such restrictions simplify inference, assumptions such as normality and linearity are inappropriate in most settings, and the need for flexible, nonlinear models that relax common distributional assumptions is clear. Bayesian nonparametric modeling techniques allow nonstandard features of regression relationships to be captured when the data suggest they are present. We introduce a general framework for multivariate ordinal regression that is not restricted by linearity or additivity assumptions in the covariate effects. Specifically, we assume that the ordinal responses arise from latent continuous random variables through discretization, and we model the joint distribution of the latent responses and covariates with a Dirichlet process mixture of multivariate normals. We begin with the binary regression setting, both because of its prominent role in the literature and because it requires more specialized model development under our framework. In particular, we use a square-root-free Cholesky decomposition of the normal kernel covariance matrix, which facilitates model identifiability while allowing for appropriate dependence structure. This parameterization also has the computational advantage of simplifying the implementation of Markov chain Monte Carlo posterior simulation. Next, we develop modeling and inference methods for ordinal regression, including the underdeveloped setting of multivariate ordinal responses. Standard parametric ordinal regression models face computational challenges arising from identifiability constraints and parameter estimation, whereas the flexibility of the nonparametric model allows us to overcome these difficulties.
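The latent-variable construction above can be sketched for a single covariate. The code below is a minimal illustration under assumptions not stated in the abstract: the Dirichlet process is truncated to finitely many components (weights from stick-breaking), each kernel is a bivariate normal for (X, Z) ordered with the covariate X first, each kernel covariance is written as Σ = β^{-1} Δ (β^{-1})^T with unit lower-triangular β = [[1, 0], [b, 1]] and diagonal Δ = diag(δ_x, δ_z), and the ordinal cut-points are held fixed. Under this ordering δ_z is the conditional variance of Z given X within a component, so fixing δ_z = 1 plays the role of the usual probit scale restriction; which entry the dissertation actually constrains is not specified here. All function and key names are hypothetical.

```python
import math


def normal_cdf(t):
    """Standard normal CDF via math.erf; also handles t = +/-inf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def stick_breaking(vs):
    """Truncated stick-breaking weights p_l = v_l * prod_{k<l} (1 - v_k).
    Setting the last v to 1 makes the weights sum to 1 (assumed truncation)."""
    weights, remaining = [], 1.0
    for v in vs:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def component_cov(b, delta_x, delta_z):
    """Covariance of (X, Z) from Sigma = beta^{-1} Delta beta^{-T}, with
    beta = [[1, 0], [b, 1]] (unit lower triangular), Delta = diag(delta_x, delta_z)."""
    return [[delta_x, -b * delta_x],
            [-b * delta_x, b * b * delta_x + delta_z]]


def ordinal_probs(x, weights, comps, cutpoints):
    """Pr(Y = j | x), j = 1..K, implied by a finite mixture of bivariate normals for (X, Z).

    comps: dicts with hypothetical keys mu_x, mu_z, b, delta_x, delta_z.
    cutpoints: fixed gamma_1 < ... < gamma_{K-1}; Y = j iff gamma_{j-1} < Z <= gamma_j.
    """
    # Covariate-dependent weights w_l(x) proportional to p_l * N(x; mu_x, delta_x),
    # because Var(X) = delta_x when X is ordered first.
    raw = [p * normal_pdf(x, c["mu_x"], c["delta_x"]) for p, c in zip(weights, comps)]
    total = sum(raw)
    edges = [-math.inf] + list(cutpoints) + [math.inf]
    probs = [0.0] * (len(edges) - 1)
    for r, c in zip(raw, comps):
        # Within a component, Z | X = x ~ N(mu_z - b * (x - mu_x), delta_z):
        # the conditional is read directly off beta and Delta.
        mean = c["mu_z"] - c["b"] * (x - c["mu_x"])
        sd = math.sqrt(c["delta_z"])
        for j in range(len(probs)):
            upper = normal_cdf((edges[j + 1] - mean) / sd)
            lower = normal_cdf((edges[j] - mean) / sd)
            probs[j] += (r / total) * (upper - lower)
    return probs


# Usage: binary case (one cut-point at 0, so Y = 2 iff Z > 0) with two components.
p = stick_breaking([0.6, 1.0])
comps = [
    dict(mu_x=-1.0, mu_z=-0.5, b=-1.5, delta_x=1.0, delta_z=1.0),
    dict(mu_x=2.0, mu_z=1.0, b=0.8, delta_x=0.5, delta_z=1.0),
]
for x in (-2.0, 0.0, 2.0):
    print(x, ordinal_probs(x, p, comps, [0.0]))
```

With a single component the latent mean is linear in x and Pr(Y = 2 | x) reduces to a probit-type curve; with several components whose slopes b differ, the weights w_l(x) change with x, so the implied regression can be nonlinear and even non-monotone.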
The modeling approach is further extended to ordinal regressions indexed in discrete time, through a dependent Dirichlet process prior that allows the regression relationship at each time point to be estimated flexibly while incorporating dependence across time. We consider several examples involving synthetic data to study the scope of the proposed methodology with respect to inference and prediction, under both standard and more complex scenarios for the underlying data-generating mechanism. A variety of real data examples are also used to illustrate our methods. Because this methodology is especially well suited to problems in ecology and population dynamics, we target applications in these areas. In particular, our methods are used to provide a detailed analysis of a data set on rockfish maturity and body characteristics collected across different years.
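The dependence across time can be illustrated with one simple dependent Dirichlet process construction. This is an assumption, since the abstract does not describe the specific prior used: atoms θ_l are drawn once from a base distribution G0 and shared across time, while the stick-breaking proportions v_{l,t} evolve over t. Each v_{l,t} is obtained by passing a stationary Gaussian AR(1) process through the standard normal CDF and then the Beta(1, α) quantile function, so every G_t = Σ_l p_{l,t} δ_{θ_l} is marginally a truncated DP(α, G0), and the correlation ρ controls how strongly neighboring years share mixture weights. The function names, the truncation level L, and this Gaussian-copula link are hypothetical choices for illustration.

```python
import math
import random


def normal_cdf(t):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def beta1_quantile(u, alpha):
    """Inverse CDF of Beta(1, alpha), whose CDF is F(v) = 1 - (1 - v) ** alpha."""
    return 1.0 - (1.0 - u) ** (1.0 / alpha)


def dependent_weights(T, L, alpha, rho, seed=0):
    """Time-indexed truncated stick-breaking weights p[t][l], t = 0..T-1, l = 0..L-1.

    For each l, eta[l] follows a stationary Gaussian AR(1) in t with correlation
    rho (requires -1 <= rho <= 1), so v[l][t] = F^{-1}(Phi(eta[l][t])) is
    Beta(1, alpha) at every t. The last stick is set to 1 so each p[t] sums to 1.
    Atoms are assumed shared across time and are not represented here.
    """
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, 1.0) for _ in range(L)]  # start from the stationary N(0, 1)
    innovation_sd = math.sqrt(1.0 - rho * rho)     # keeps Var(eta) = 1 at every t
    weights = []
    for t in range(T):
        if t > 0:
            eta = [rho * e + innovation_sd * rng.gauss(0.0, 1.0) for e in eta]
        remaining, p_t = 1.0, []
        for l in range(L):
            v = 1.0 if l == L - 1 else beta1_quantile(normal_cdf(eta[l]), alpha)
            p_t.append(remaining * v)
            remaining *= 1.0 - v
        weights.append(p_t)
    return weights


# Usage: weights for 4 time points with strong year-to-year dependence.
for t, p_t in enumerate(dependent_weights(T=4, L=5, alpha=1.0, rho=0.9, seed=42)):
    print(t, [round(w, 3) for w in p_t])
```

Setting rho = 1 makes the weights identical at every time point (a single shared regression), rho = 0 gives independent weights at each time point, and intermediate values borrow strength across years.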