Structure Learning of Linear Bayesian Networks in High-Dimensions
Research into graphical models is a rapidly developing enterprise, garnering significant interest from both the statistics and machine learning communities. A parallel thread in both communities has been the study of low-dimensional structures in high-dimensional models where $p\gg n$. Recently, there has been a surge of interest in connecting these threads in order to understand the behaviour of graphical models in high-dimensions. Due to their relative simplicity, undirected models such as the Gaussian graphical model and Ising models have received most of the attention, whereas directed graphical models have received comparatively little attention. An important yet largely unresolved class of directed graphical models are Bayesian networks, or directed acyclic graphs (DAGs). These models have a wide variety of applications in aritificial intelligence, machine learning, genetics, and computer vision, but estimation of Bayesian networks in high-dimensions is not well-understood. The main focus of this dissertation is to address some fundamental questions about these models in high-dimensions.
The primary goal is to develop both algorithms and theory for estimating continuous, linear Bayesian networks, capable of handling modern high-dimensional problems. Motivated by problems from the regression literature, we show how to adapt recent work in sparse learning and nonconvex optimization to the structure learning problem for Bayesian networks in order to estimate DAGs with several thousand nodes. We draw an explicit connection between linear Bayesian networks and so-called neighbourhood regression problems and show how this can be exploited in order to derive nonasymptotic performance bounds for penalized least squares estimators of directed graphical models.
On the algorithmic side, we develop a method for estimating Gaussian Bayesian networks based on convex reparametrization and cyclic coordinate descent. In contrast to recent methods which accelerate the learning problem by restricting the search space, we propose a method for score-based structure learning which does not restrict the search space. We do not require the existence of a so-called faithful DAG representation, and as a result, our methodology must handle the inherent nonidentifiability of the estimation problem in a novel way. On the theoretical side, we provide (a) Finite-dimensional performance guarantees for local minima of the resulting nonconvex program, and (b) A general high-dimensional framework for global minima of the nonconvex program. Both the algorithms and theory apply to a general class of regularizers, including the MCP, SCAD, $\ell_1$ and $\ell_0$ penalties. Finally, as a matter of independent interest, we provide a comprehensive comparison of our approach to several standard structure learning methods using open-source packages developed for the R language.