When faced with difficult problems, quantitative scientists commonly turn to randomized approximations, which offer alternative algorithms and analyses. In this dissertation, we consider a range of problems at the intersection of randomized methods and statistics. Randomized methods are the unifying theme throughout: in the first part they are used to construct tractable approximations, and later they serve as tools for automating computational Bayesian statistics.
In the first three chapters, we develop novel analyses of recent problems in experimental design, double descent, and random projections using determinantal point processes (DPPs). We first consider approximately optimal Bayesian experimental design using an adaptive row sampling algorithm based on DPPs. In the subsequent chapters, we generalize the proof techniques to study double descent in over-parameterized least squares linear regression and to establish expectation formulae for sub-Gaussian matrix sketching.
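To make the determinantal weighting concrete, the following is a minimal sketch of DPP-style row sampling (volume sampling), where a size-k subset S of rows is drawn with probability proportional to det(X_S X_S^T). This is an illustrative exact-enumeration toy, not the adaptive algorithm developed in the chapter, and all names here are hypothetical:

```python
import itertools
import numpy as np

def volume_sample_rows(X, k, rng):
    """Sample a size-k row subset S with P(S) proportional to det(X_S X_S^T).

    Exact enumeration over all subsets: only feasible for tiny matrices,
    shown purely to illustrate the determinantal weighting that favors
    diverse (well-spread) rows.
    """
    n = X.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array(
        [np.linalg.det(X[list(S)] @ X[list(S)].T) for S in subsets]
    )
    probs = weights / weights.sum()
    idx = rng.choice(len(subsets), p=probs)
    return list(subsets[idx])

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
S = volume_sample_rows(X, 3, rng)
```

Because the weight is a squared volume, nearly collinear rows are jointly downweighted, which is what makes determinantal sampling attractive for selecting informative design points.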
In the later chapters, we focus on probabilistic programming and on developing theory and tools to automate statistical inference using randomized algorithms based on Markov chain Monte Carlo (MCMC) and variational inference (VI). We first consider lightweight inference compilation (LIC), which combines deep learning with MCMC by parameterizing proposal distributions q(x) with graph neural networks that condition each node on its Markov blanket. The next chapter considers tail anisotropy in multivariate heavy-tailed target densities p(x) and proposes fat-tailed variational inference (FTVI) to approximate them. Our work concludes with the generalized Gamma algebra (GGA), which analyzes fat tails during static analysis of a probabilistic program’s source code. This enables a priori computation of tail parameters, which we show improves the stability and convergence of a number of inference tasks.
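The motivation for fat-tailed variational families can be sketched numerically: a Gaussian approximating family has exponentially decaying tails, so its density error against a power-law target blows up far from the mode, while a Student-t family can match the tail index. This toy comparison (a standard Cauchy target, hypothetical and not the FTVI method itself) illustrates the gap:

```python
import numpy as np
from scipy import stats

# Heavy-tailed target: standard Cauchy, whose density decays like 1/x^2.
x_tail = 50.0
log_p = stats.cauchy.logpdf(x_tail)

# Gaussian approximating family: tails decay like exp(-x^2 / 2).
log_q_gauss = stats.norm.logpdf(x_tail)

# Student-t family with df=1 matches the target's power-law tail exactly.
log_q_t = stats.t.logpdf(x_tail, df=1.0)

# Absolute log-density errors in the tail.
err_gauss = abs(log_p - log_q_gauss)
err_t = abs(log_p - log_q_t)
```

Here `err_gauss` is on the order of a thousand nats while `err_t` vanishes, which is the sort of tail mismatch that motivates choosing a variational family with an explicitly modeled tail parameter.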