Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

An Approach to Multi-Population Mortality Modeling with Multi-Output Gaussian Process Regression


This dissertation applies a machine learning method within a spatial statistical framework to model multiple longevity surfaces simultaneously. Our work is motivated by the detailed, consistent, and high-quality raw mortality datasets of the Human Mortality Database (HMD) and the Human Cause-of-Death Database (HCD). The HMD provides overall mortality rates for more than 40 developed countries, while the HCD offers mortality rates for multiple lists of causes of death for up to a dozen countries. Yet only a few stochastic models exist for handling more than two populations at a time. To bridge this gap, we propose applying Multi-output Gaussian Process (MOGP) models within a joint spatial covariance framework that treats population as a factor covariate, explicitly capturing cross-population dependence.
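To make the idea of treating population as a factor covariate concrete, the following is a minimal sketch of one common MOGP construction: a separable covariance in which a kernel over (age, year) inputs is multiplied by a cross-population covariance matrix. The abstract does not specify the kernel actually used in the dissertation, so the squared-exponential kernel, the low-rank-plus-diagonal form of `B`, and all names and numbers below are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x1, x2, lengthscales):
    """Squared-exponential kernel on (age, year) inputs (an assumed choice)."""
    d = (x1[:, None, :] - x2[None, :, :]) / lengthscales
    return np.exp(-0.5 * np.sum(d**2, axis=-1))

def joint_kernel(x1, p1, x2, p2, lengthscales, B):
    """Separable joint covariance: k((x, p), (x', p')) = k_x(x, x') * B[p, p'].

    p1 and p2 hold integer population labels, i.e. the factor covariate.
    """
    return sq_exp_kernel(x1, x2, lengthscales) * B[p1[:, None], p2[None, :]]

# Hypothetical toy setup: 3 populations observed on the same age-year grid.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = A @ A.T + 0.1 * np.eye(3)   # cross-population covariance, positive definite

ages = np.array([60.0, 65.0, 70.0])
years = np.array([2000.0, 2005.0])
grid = np.array([(a, y) for a in ages for y in years])   # 6 grid points

x = np.tile(grid, (3, 1))                  # inputs stacked population by population
p = np.repeat(np.arange(3), len(grid))     # population label for each row
lengthscales = np.array([10.0, 10.0])

K = joint_kernel(x, p, x, p, lengthscales, B)   # 18 x 18 joint covariance
K_x = sq_exp_kernel(grid, grid, lengthscales)   # 6 x 6 single-population covariance
```

With this stacking, `K` equals the Kronecker product of `B` and `K_x`, so the off-diagonal entries of `B` are what carry the cross-population dependence.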

We first investigate approaches within the MOGP framework to jointly model the aggregate mortality rates of multiple populations. The proposed models assume commonality in the mortality experience of multiple populations, enabling data fusion across the populations. Through numerous illustrations, we demonstrate the features of these joint models and how they satisfy important criteria of multi-population mortality modeling. We showcase how the MOGP framework offers a tractable and efficient way to jointly analyze 8 to 12 populations at a time.

The remainder of this dissertation is devoted to developing MOGPs for jointly modeling the longevity surfaces of multiple causes of death in a multi-population context. Several extensions are borrowed to enhance model flexibility and scalability. One extension relaxes the commonality assumption between cause-specific mortality trends, allowing us to capture the spatial heterogeneity between causes. The second extension leverages the gridded structure of the mortality table to speed up computation, gracefully handling more than 15 populations in mortality datasets with multidimensional factor inputs.
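One standard way a gridded age-year table can speed up Gaussian process computations is through Kronecker algebra: if the covariance factorizes over the age and year axes, the full matrix never has to be formed or inverted directly. The abstract does not say which technique the dissertation uses, so the sketch below is only an illustration under that assumption, with hypothetical kernels, lengthscales, and data.

```python
import numpy as np

def se_1d(x, lengthscale):
    """One-dimensional squared-exponential kernel (an assumed choice)."""
    d = (x[:, None] - x[None, :]) / lengthscale
    return np.exp(-0.5 * d**2)

def kron_solve(K_age, K_year, Y, noise_var):
    """Solve (kron(K_age, K_year) + noise_var * I) vec(alpha) = vec(Y).

    Y has shape (n_age, n_year) and vec() is row-major flattening.
    Only the two small factor matrices are eigendecomposed.
    """
    lam_a, Q_a = np.linalg.eigh(K_age)
    lam_y, Q_y = np.linalg.eigh(K_year)
    Y_rot = Q_a.T @ Y @ Q_y                          # rotate into the joint eigenbasis
    Y_rot = Y_rot / (np.outer(lam_a, lam_y) + noise_var)  # divide by eigenvalues
    return Q_a @ Y_rot @ Q_y.T                       # rotate back

# Hypothetical toy grid: 10 ages by 8 years of synthetic observations.
ages = np.arange(50.0, 60.0)
years = np.arange(2000.0, 2008.0)
K_age = se_1d(ages, 5.0)
K_year = se_1d(years, 4.0)
noise_var = 0.1

rng = np.random.default_rng(1)
Y = rng.normal(size=(len(ages), len(years)))

alpha = kron_solve(K_age, K_year, Y, noise_var)
```

For a table with `n_age` ages and `n_year` years, this replaces one factorization of an `(n_age * n_year)`-sized matrix with two much smaller eigendecompositions; a population factor could enter as a third Kronecker term in the same way.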
