eScholarship
Open Access Publications from the University of California

Statistical Modeling of Marked Point Processes and (Ultra-)High Frequency Data

  • Author(s): Wen, Musen
  • Advisor(s): Lii, Keh-Shin
  • et al.
Abstract

The study of stock transaction data, both regularly spaced high frequency data and irregularly spaced ultra-high frequency data, has been at the frontier of modern financial data analysis. One such data set is the Trade and Quote (TAQ) data from the New York Stock Exchange (NYSE), a collection of all stock transaction information (e.g., transaction date, time, price, and volume) for every trading day. The analysis of intraday transaction data remains highly challenging, especially with respect to statistical modeling.

In this research, two new statistical modeling frameworks are proposed: the Multi-Logit Mixture Autoregressive (MLMAR) models for regularly spaced high frequency data, and the multivariate Mixture Transition Distribution (MMTD) models for irregularly spaced ultra-high frequency data. These models are, respectively, univariate and multivariate generalizations of Mixture Transition Distribution (MTD)-type time series models.

The MLMAR model is a univariate time series model for regularly spaced intraday stock prices that incorporates exogenous information, such as transaction volumes, trading frequencies, or other market information, into the modeling framework. The MMTD model is a modeling framework for marked point processes in general, and for ultra-high frequency transaction data in particular.

For both modeling frameworks, we address model specification, parameter estimation, prediction methodology, and applications to stock transaction data. We also compare the new models with existing benchmark models and show their advantages in describing the underlying data generating process or in prediction performance. For each class of time series models, potential extensions and related modeling issues are also discussed.
