In this dissertation, I look at four distinct systems that all embody a similar challenge to modeling complex scenarios from noisy multidimensional historical data. In many scenarios, it is important to form an opinion, make a prediction, implement a business decision, or make an investment based upon expected future system behavior. All systems embody an amount of uncertainty, and quantify- ing that uncertainty using statistical methods, allows for better decision making. Three distinct scenarios are discussed with novel application of statistical methods to best quantify the endogenous uncertainty and aid in optimal decision making. Two chapters focus on predicting the winners of a horse race, one on predicting movement of a stock index, and the fourth on molding response from an online advertising effort.
The first horse racing chapter uses a hierarchical Bayesian approach to model- ing running speed, using a novel grouping of races into "profiles" and then pooling information across those profiles to improve predictive accuracy. The second horse racing chapter implements a novel conditional logistic regression that is modified by frailty parameter derived from winning payoff, and then regularized with a LASSO. High speed parallel algorithms, running on a GPU, were hand coded to optimize tuning LASSO parameters in rapid time.
The chapter on stock index prediction explores the application of ensemble filters. I show how an ensemble of filters on individual member stocks is a better predictor of index direction than a filter directly on the index.
The chapter on advertising explores how the clicks and sales from an Ad- Words campaign may be modeled with a re-paramaterized Beta distribution to better capture variance. Empirical data from a live campaign is studied, with a hierarchical Bayesian framework for brand features solved using a Metropolis within Gibbs algorithm.