## Scholarly Works (15 results)

The focus of this work is to develop a Bayesian framework to combine information from multiple parts of the response distribution characterized with different quantiles. The goal is to obtain a synthesized estimate of the covariate effects on the response variable as well as to identify the more influential predictors. This framework naturally relates to the traditional quantile regression, which studies the relationship between the covariates and the conditional quantile of the response variable and serves as an attractive alternative to the more widely used mean regression methods. We achieve the objectives through constructing a Bayesian mixture model using quantile regressions as the mixture components.

The first stage of the research involves the development of a parametric family of distributions to provide the mixture kernel for the Bayesian quantile mixture regression. We derive a new family of error distributions for model-based quantile regression called generalized asymmetric Laplace distribution, which is constructed through a structured mixture of normal distributions. The construction enables fixing specific percentiles of the distribution while, at the same time, allowing for varying mode, skewness and tail behavior. This family provides a practically important extension of the asymmetric Laplace distribution, which is the standard error distribution for parametric quantile regression. We develop a Bayesian formulation for the proposed quantile regression model, including conditional lasso regularized quantile regression based on a hierarchical Laplace prior for the regression coefficients, and a Tobit quantile regression model.
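As a point of reference for the construction described above, the following minimal sketch shows the standard asymmetric Laplace distribution (not the generalized family proposed in this work) and the property that makes it useful for quantile regression: its location parameter is always the quantile with probability `p`. The function names and parameter values are illustrative assumptions, not taken from the thesis.

```python
import math

def check_loss(u, p):
    """Quantile-regression check function rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))

def al_pdf(y, mu, sigma, p):
    """Asymmetric Laplace density with location mu, scale sigma, skewness p."""
    return p * (1.0 - p) / sigma * math.exp(-check_loss((y - mu) / sigma, p))

def al_cdf(y, mu, sigma, p):
    """Closed-form CDF of the same distribution."""
    u = (y - mu) / sigma
    if u <= 0:
        return p * math.exp((1.0 - p) * u)
    return 1.0 - (1.0 - p) * math.exp(-p * u)

# Illustrative values: whatever p is chosen, the CDF at mu equals p,
# so mu is the p-th quantile of the error distribution.
mu, sigma = 1.5, 2.0
for p in (0.1, 0.5, 0.9):
    print(p, al_cdf(mu, mu, sigma, p))
```

In the standard distribution the single parameter `p` fixes both the quantile and the skewness, which is the restriction the generalized family is described as relaxing.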

Next, we develop the main framework to model the conditional distribution of the response with a weighted mixture of quantile regression components. We specify a common regression coefficient vector for all components to synthesize information from multiple parts of the response distribution, each modeled with one quantile regression component. The goal is to obtain a combined estimate of the predictive effect of each covariate. We consider two choices of kernel density for the mixture model. When the probability of the quantile in each regression component is known, we model the components with the generalized asymmetric Laplace distribution, as its shape parameter introduces flexibility in shape and skewness to the kernel. When the quantile probabilities are unknown, we use the asymmetric Laplace distribution as the kernel density and treat its skewness parameter, which is also the quantile probability of the component, as a random quantity to be estimated from the data. Under each kernel density, we formulate the hierarchical structure of the mixture weights and develop an approach to posterior inference. We consider both parametric and nonparametric priors for the framework, and explore inference for the number of components to be included. We demonstrate the performance of the method in identifying influential variables with simulation examples and illustrate posterior predictive inference with realty price data from the Boston metropolitan area.

Finally, we extend the framework to specific problems in survival analysis and epidemiology. Both applications involve analyses of two cohorts, which often exhibit differing responses given the same predictor input. We adapt the proposed framework to model survival data with right-censoring. For applications in epidemiology, we study the ordering properties of the mixture kernels and incorporate stochastic ordering in the two-cohort mixture framework through structured priors, which conforms with an assumption made in certain settings of receiver operating characteristic curve estimation. With the adapted models, we carry out cohort-specific identification of influential variables and gain insight into the contributions to estimation and prediction from different parts of the response distribution, which are represented by the corresponding quantile regression components. We illustrate the applications with a time-to-event data set on length of stay at a nursing home and two disease diagnosis data sets, one on adolescent depression and the other on cattle epidemics.

Traditional approaches to ordinal regression rely on strong parametric assumptions for the regression function and/or the underlying response distribution. While they simplify inference, restrictions such as normality and linearity are inappropriate for most settings, and the need for flexible, nonlinear models which relax common distributional assumptions is clear. Through the use of Bayesian nonparametric modeling techniques, nonstandard features of regression relationships may be obtained if the data suggest them to be present. We introduce a general framework for multivariate ordinal regression, which is not restricted by linearity or additivity assumptions in the covariate effects. In particular, we assume the ordinal responses arise from latent continuous random variables through discretization, and model the latent response-covariate distribution using a Dirichlet process mixture of multivariate normals. We begin with the binary regression setting, both due to its prominent role in the literature and because it requires more specialized model development under our framework. In particular, we use a square-root-free Cholesky decomposition of the normal kernel covariance matrix, which facilitates model identifiability while allowing for appropriate dependence structure. Moreover, this model structure has the computational advantage of simplifying the implementation of Markov Chain Monte Carlo posterior simulation. Next, we develop modeling and inference methods for ordinal regression, including the underdeveloped setting that involves multivariate ordinal responses. Standard parametric models for ordinal regression suffer from computational challenges arising from identifiability constraints and parameter estimation, whereas due to the flexible nature of the nonparametric model, we overcome these difficulties. 
The modeling approach is further developed to handle ordinal regressions that are indexed in discrete time, through the use of a dependent Dirichlet process prior, which estimates the unique regression relationship at each time point in a flexible way while incorporating dependence across time. We consider several examples involving synthetic data to study the scope of the proposed methodology with respect to inference and prediction under both standard and more complex scenarios for the underlying data generating mechanism. Moreover, a variety of real data examples are used to illustrate our methods. As this methodology is especially well suited to problems in ecology and population dynamics, we target applications in these areas. In particular, our methods are used to provide a detailed analysis of a data set on rockfish maturity and body characteristics collected across different years.
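The latent-variable idea in this abstract (ordinal responses arising by discretizing a continuous latent variable) can be sketched as follows. This is a deliberately simplified illustration: it uses a fixed finite mixture of univariate normals as a stand-in for the Dirichlet process mixture of multivariate normals, ignores covariates, and the cutoffs, weights, means, and standard deviations are made-up values.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ordinal_probs(cutoffs, weights, means, sds):
    """Category probabilities when the latent Z follows a finite normal mixture.

    Category j is observed when Z falls between consecutive cutoffs, with
    -infinity and +infinity as the outermost boundaries.
    """
    def mix_cdf(x):
        return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))

    cum = [0.0] + [mix_cdf(c) for c in cutoffs] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

# Hypothetical two-component latent mixture and three cutoffs (four categories).
probs = ordinal_probs(cutoffs=[-1.0, 0.0, 1.0],
                      weights=[0.6, 0.4],
                      means=[-0.5, 1.5],
                      sds=[0.7, 0.5])
print([round(q, 3) for q in probs])
```

Because the flexibility comes from the latent mixture rather than from the cutoffs, the cutoffs can be held fixed in a sketch like this, which is one way such models can sidestep the usual cutoff identifiability constraints.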

In survival analysis, interest lies in modeling data that describe the time to a particular event. Informative functions, namely the hazard function and mean residual life function, can be obtained from the model's distribution function. We focus on the mean residual life function, which provides the expected remaining life given that the subject has survived (i.e., is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the mean residual life function characterizes the survival distribution. In terms of mean residual life function inference, there are two shortcomings in the current literature. First, the shape of the functional is often restricted, which forces the researcher to make an assumption that may not be appropriate. Second, in cases where the shape of the functional is not parametrically specified, full inference is not obtained. The aim of our research is to provide a modeling approach that yields full inference for the mean residual life function and is not restrictive on the shape of the functional. In particular, we develop general Bayesian nonparametric modeling methods for inference for mean residual life functions built from a mixture model for the associated survival distribution. Although the prior model is not placed on the mean residual life function directly, our methods offer rich inference for the desired functional. We place a Dirichlet process mixture model on the survival function, and discuss the importance of careful kernel selection to ensure desirable properties for the mean residual life function. We advocate for a mixture model with a gamma kernel and dependent baseline distribution for the Dirichlet process prior. We extend our model to the regression setting by modeling the joint distribution for the survival response and random covariates. 
This approach provides a flexible method for obtaining inference for the regression functionals when the number of random covariates is small to moderate. We further extend our methods to the scenario where interest lies in comparing survival between two experimental groups. Typically, we expect the range of survival in the two groups to be the same, but to exhibit different characteristics over that range. Here, we develop a dependent Dirichlet process prior for the mixing distributions, with shared locations across the two groups and varying weights, to incorporate dependency between populations and achieve richer inferential results. The final scenario we consider is the case in which the researcher believes two populations have ordered mean residual life functions. For such applications, a prior model that incorporates an ordering constraint on the mean residual life functions is attractive. We introduce a mixture of Erlang distributions with weights constructed using Dirichlet process priors that provides the mean residual life ordering result. We demonstrate the utility of our modeling methods through simulation and real data examples. In addition, we draw comparisons with both parametric and semiparametric models.
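For readers unfamiliar with the functional, the mean residual life at time t is the integral of the survival function from t onward, divided by the survival function at t. The sketch below computes it numerically for two simple survival functions. It is only an illustration of the definition, not the mixture model of the thesis; the integration horizon, step count, and parameter values are assumptions chosen for the example.

```python
import math

def mean_residual_life(survival, t, horizon=200.0, steps=20000):
    """m(t) = (integral of S(u) for u >= t) / S(t), by the trapezoid rule.

    The infinite upper limit is truncated at t + horizon, which is adequate
    only when S is negligible beyond that point.
    """
    h = horizon / steps
    total = 0.5 * (survival(t) + survival(t + horizon))
    for i in range(1, steps):
        total += survival(t + i * h)
    return h * total / survival(t)

def exponential_survival(rate):
    """Survival function of an exponential distribution."""
    return lambda u: math.exp(-rate * u)

def erlang_survival(shape, scale):
    """Survival function of a gamma distribution with integer shape."""
    def S(u):
        x = u / scale
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(shape))
    return S

# The exponential has constant mean residual life (memorylessness),
# whereas an Erlang with shape 2 has a decreasing one.
for t in (0.0, 1.0, 5.0):
    print(t,
          mean_residual_life(exponential_survival(0.5), t),
          mean_residual_life(erlang_survival(2, 1.0), t))
```

The contrast between the two columns shows why kernel choice matters: the kernel's own mean residual life shape constrains what a mixture built from it can express.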

Model-based inferential methods for point processes have received less attention than the corresponding theory of point processes and are less developed than other areas of statistical inference.

Classical inferential methods for point processes include likelihood-based and nonparametric methods. Bayesian analysis provides simulation-based estimation of several statistics of interest for point processes. However, a challenge of Bayesian modeling, specifically for point processes, is selecting an appropriate parametric form for the intensity function. Bayesian nonparametric methods aim to avoid the narrow focus of parametric assumptions by imposing priors that can support the entire space of distributions and functions. They are therefore naturally more flexible and adaptable than approaches based on parametric models.

In this dissertation, we focus on developing methodology for modeling and inference for some classes of temporal point processes in the context of Bayesian nonparametric methods, mainly with applications in environmental science.

First, motivated by an application to hurricane occurrences, we study seasonal marked point processes. We develop nonparametric Bayesian methodology to study the dynamic evolution of a seasonal marked point process intensity under the assumption that the point process is a non-homogeneous Poisson process. The dynamic model for time-varying intensities captures both the intra-seasonal and inter-seasonal variability of event occurrences. Considering marks, we provide a full probabilistic model for the point process over the joint marks-points space, which allows for different types of inference, including full inference for dynamically evolving conditional mark densities given a time point, a particular time period, and even a subset of marks.
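To make the non-homogeneous Poisson process assumption concrete, the sketch below evaluates the exact log-likelihood of event times observed on [0, T]: the sum of log-intensities at the events minus the integrated intensity. The intensity used here is a hypothetical parametric seasonal form chosen only because its integral is available in closed form; it is not the nonparametric model developed in the dissertation, and the event times are made up.

```python
import math

def seasonal_intensity(t, base, amplitude):
    """Hypothetical intensity base * (1 + amplitude * cos(2*pi*t)), t in years.

    Requires |amplitude| < 1 so that the intensity stays positive.
    """
    return base * (1.0 + amplitude * math.cos(2.0 * math.pi * t))

def integrated_intensity(T, base, amplitude):
    """Closed-form integral of the intensity over [0, T]."""
    return base * (T + amplitude * math.sin(2.0 * math.pi * T) / (2.0 * math.pi))

def nhpp_log_likelihood(event_times, T, base, amplitude):
    """Exact NHPP log-likelihood: sum of log intensities minus the integral."""
    log_terms = sum(math.log(seasonal_intensity(t, base, amplitude))
                    for t in event_times)
    return log_terms - integrated_intensity(T, base, amplitude)

# Made-up event times clustered near the start of each year.
events = [0.05, 0.10, 0.92, 1.03, 1.98, 2.07, 2.95]
print(nhpp_log_likelihood(events, T=3.0, base=2.0, amplitude=0.8))  # seasonal
print(nhpp_log_likelihood(events, T=3.0, base=2.0, amplitude=0.0))  # constant rate
```

For these clustered events the seasonal intensity attains a higher log-likelihood than the constant one, which is the kind of intra-seasonal structure a time-varying intensity is meant to capture.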

We apply this method to study the evolution of the intensity of the process of hurricane landfall occurrences, and the respective maximum wind speed and associated damages. We show several novel inferences which are explored for the first time in the analysis of hurricane occurrences.

Then we look beyond Poisson processes and propose a flexible approach to modeling and inference for homogeneous renewal processes.

This modeling method is based on a structured mixture of Erlang densities with common scale parameter for the renewal process inter-arrival density. The mixture weights are defined through an underlying distribution function modeled nonparametrically with a Dirichlet process prior. This model specification enables flexible shapes for the inter-arrival time density, including heavy tailed and multimodal densities. Moreover, the choice of the Dirichlet process centering distribution controls clustering or declustering patterns for the point process.
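A minimal sketch of this kind of structured Erlang mixture is given below. It assumes, as in common Erlang mixture constructions, that the weight of the j-th component is the increment of a distribution function G over the interval ((j-1)θ, jθ], with the last component absorbing the remaining tail mass; the abstract does not spell out this detail, so treat it as an assumption. G is taken to be a fixed exponential distribution function for illustration, whereas in the thesis it is random with a Dirichlet process prior.

```python
import math

def erlang_pdf(t, shape, scale):
    """Erlang density with integer shape and the given scale."""
    if t < 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (scale ** shape * math.factorial(shape - 1)))

def increment_weights(cdf, scale, n_components):
    """Weights w_j = G(j*scale) - G((j-1)*scale); the last takes the tail mass."""
    cuts = [cdf(j * scale) for j in range(n_components)]  # G(0), ..., G((J-1)*scale)
    weights = [cuts[j] - cuts[j - 1] for j in range(1, n_components)]
    weights.append(1.0 - cuts[-1])
    return weights

def mixture_pdf(t, weights, scale):
    """Inter-arrival density: component j is Erlang with shape j, common scale."""
    return sum(w * erlang_pdf(t, j, scale) for j, w in enumerate(weights, start=1))

def G(x):
    """Illustrative stand-in for the random distribution function (mean 2)."""
    return 1.0 - math.exp(-x / 2.0)

scale = 0.5
weights = increment_weights(G, scale, n_components=30)
print(sum(weights))                       # the weights form a probability vector
print(mixture_pdf(1.0, weights, scale))   # density evaluated at t = 1
```

Because every component shares the scale and only the weights vary, changing G reshapes the whole inter-arrival density without touching the kernels.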

Finally we extend our model to accommodate point processes with time-varying inter-arrivals, which are referred to as modulated renewal processes in the literature. We introduce time dependence in the scale parameter of the Erlang mixture by replacing it with a latent stochastic process. A number of synthetic data sets and real data sets are used to illustrate the modeling approaches.

The main contribution of this thesis is to provide Bayesian nonparametric modeling and inference methods for some classes of point processes, which are more flexible than existing methods.

Moreover, the key complication for Bayesian inference is that the likelihood of a generic point process involves a normalizing constant which is, most of the time, analytically intractable. Discretization is very often used in existing methods to obtain likelihood approximations that facilitate computation, especially for models based on Gaussian process priors. In contrast to these methods, our work uses the exact likelihood, without approximation, in all of the models we develop.