Regression Methods for Replicated Cox Point Process Data and Fréchet Manifold Learning.
- Gajardo Cataldo, Álvaro Eduardo
- Advisor(s): Müller, Hans-Georg
Abstract
Data in the form of temporal point processes have received increasing attention in the literature, due to advances in technology that make it possible to record and store large amounts of information, such as the individual random times at which an event of interest occurs. These processes describe random phenomena that arise as event arrivals over a time window of interest, and provide a natural modeling framework for complex data such as bike pickups at a bike station, COVID-19 cases in a country, earthquake aftershocks, and phone call arrivals at a call center, among many others, where both the number of events and the event times themselves are random. Such data can be viewed as samples of temporal point processes, also termed replicated point process data, since the random phenomenon is generated repeatedly, for example on each day for the bike pickup process or in each country for COVID-19 cases. When further information in the form of Euclidean vector covariates accompanies the temporal point process, statistical methodology such as regression becomes essential for analyzing and exploring point process data, as it is of great interest to see how such covariates are associated with these complex random objects.
The first chapter is devoted to the development of a fully non-parametric regression method for temporal Cox point processes as responses with Euclidean vectors as predictors (Gajardo and Müller, 2022), which is based on the recently proposed Fréchet regression framework. Cox point processes are of central importance in point process theory, as they generalize the well-known non-homogeneous Poisson process by allowing the intensity function itself to be random. This property makes it possible to model replicated point process data, since each replication is then generated by its own realization of the underlying random intensity function, with these realizations independent and identically distributed, which gives more flexibility for modelling repeated random phenomena. We derive theoretical convergence rates for the proposed regression function in an increasing in-fill intensity asymptotic framework, where individual intensities are allowed to suitably diverge with sample size. This framework makes it possible to consistently recover the density function from which the event arrivals are generated, as well as the conditional intensity factor up to a common constant.
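To make the data structure concrete, the following is a minimal simulation sketch of replicated Cox point process data on the time window [0, 1], with one scalar covariate per replication. It is not the estimator developed in the chapter. The function name `simulate_cox_replicates`, the Gamma-distributed intensity factor, and the Beta-shaped density whose first parameter depends on the covariate are all illustrative assumptions. The sketch relies on the standard fact that, conditional on a random intensity of the form (factor × density), the number of events is Poisson with mean equal to the factor and the event times are independent draws from the density.

```python
import numpy as np


def simulate_cox_replicates(x, rng, base_rate=50.0):
    """Simulate one Cox process on [0, 1] per covariate value in x.

    Hypothetical model (an assumption for illustration only):
    the random intensity is tau * f(t), where tau is a random
    intensity factor and f is a random Beta density whose shape
    depends on the covariate.
    """
    x = np.asarray(x, dtype=float)
    samples = []
    for xi in x:
        # Random intensity factor: expected number of events given the intensity.
        tau = base_rate * rng.gamma(shape=5.0, scale=0.2)
        # Random density shape: first Beta parameter depends on the covariate,
        # plus replication-level noise so the intensity is random given xi.
        a = 1.0 + 3.0 * xi + rng.uniform(0.0, 0.5)
        b = 2.0
        # Conditional on the intensity, the process is Poisson:
        # the count is Poisson(tau) and the times are i.i.d. from f.
        n_events = rng.poisson(tau)
        times = np.sort(rng.beta(a, b, size=n_events))
        samples.append(times)
    return samples


rng = np.random.default_rng(0)
covariates = rng.uniform(0.0, 1.0, size=20)
processes = simulate_cox_replicates(covariates, rng)
```

Each entry of `processes` is a sorted array of event times whose length is itself random, which is the form of response that a regression on the covariates would have to handle.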
The statistical analysis of complex objects such as point processes can be viewed within a more general framework of random objects taking values in a metric space. Often such objects are infinite-dimensional in nature and the underlying metric space may not be a linear vector space, which leads to challenges for statistical analysis. A useful modeling assumption is that such random objects lie on a low-dimensional manifold, so that low-dimensional representations are sufficient to describe and analyze them; this framework is often referred to as manifold learning in the literature. In the second chapter, we develop an inverse map from the low-dimensional Euclidean representation of the random objects, obtained by applying the well-known ISOMAP procedure together with Multidimensional Scaling (MDS), back to object space, where we assume that the manifold is isometric to a subset of a low-dimensional Euclidean space. For this, we employ the recently proposed framework of Fréchet regression and derive rates of convergence for the proposed inverse map. For the case of probability distributions in 2-Wasserstein space as random objects, we show that the proposed inverse map allows interpretation of the effect that each MDS component has on the distributional data, and we moreover derive convergence results for the case where only a sample of observations of increasing size from each probability distribution is available.
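The pipeline described for distributional data can be illustrated with a small sketch for one-dimensional distributions, where the 2-Wasserstein distance equals the L2 distance between quantile functions. The sketch computes pairwise distances, applies classical MDS, and maps a low-dimensional coordinate back to a quantile function. Several simplifications are assumptions of this example and not the method of the chapter: the distributions are uniform location shifts, so the ISOMAP neighborhood-graph step is skipped because pairwise distances already coincide with geodesic distances, and the inverse map is a kernel-weighted average of quantile functions (a local-constant stand-in for Fréchet regression). The function names and the bandwidth value are hypothetical.

```python
import numpy as np


def wasserstein2_matrix(quantiles):
    """Pairwise 2-Wasserstein distances between 1D distributions.

    quantiles[i, k] is the quantile function of distribution i evaluated
    on a uniform midpoint grid of (0, 1), so the integral over (0, 1)
    is approximated by a mean over the grid.
    """
    diff = quantiles[:, None, :] - quantiles[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=-1))


def classical_mds(dist, dim):
    """Classical MDS: top eigenvectors of the double-centered squared distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:dim]
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))


def inverse_map(z_new, coords, quantiles, bandwidth):
    """Map an MDS coordinate back to a quantile function.

    Gaussian-kernel weighted average of quantile functions, which is the
    weighted Frechet mean in 2-Wasserstein space for 1D distributions.
    This is an illustrative stand-in, not the estimator of the thesis.
    """
    d2 = np.sum((coords - np.atleast_1d(z_new)) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / bandwidth ** 2)
    weights /= weights.sum()
    return weights @ quantiles


# Uniform distributions on [m - 0.5, m + 0.5] for a range of locations m.
grid = (np.arange(200) + 0.5) / 200
locations = np.linspace(0.0, 2.0, 15)
quantiles = locations[:, None] + (grid[None, :] - 0.5)

dist = wasserstein2_matrix(quantiles)
coords = classical_mds(dist, dim=1)
recovered = inverse_map(coords[7], coords, quantiles, bandwidth=0.05)
```

In this toy family the single MDS component tracks the location parameter up to sign and centering, so moving along that component and applying `inverse_map` shows how the component shifts the distribution, which is the kind of interpretation the inverse map is meant to support.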