Bayesian networks are a popular class of graphical models that encode conditional independence and causal relations among variables via directed acyclic graphs (DAGs). In this thesis, we develop algorithms to estimate Bayesian network structures. We propose two structure learning methods, both of which minimize regularized negative log-likelihood functions over the space of orderings.
First, we propose the annealing on regularized Cholesky score (ARCS) algorithm to learn Gaussian Bayesian networks. The ARCS scoring function is derived by regularizing the Gaussian DAG likelihood, and its optimization amounts to a sparse Cholesky decomposition that depends on the choice of a permutation (matrix) $P$. For this reason, we call our objective function the regularized Cholesky (RC) score of a permutation. Minimizing the RC score is thus a joint optimization over a permutation $P$ and a lower triangular matrix $L$: given a permutation, the acyclicity constraint of DAGs translates into strict lower triangularity of $L$. ARCS uses simulated annealing to search over the permutation space and an effective first-order method, the proximal gradient algorithm, to compute the optimal DAG compatible with $P$. Combined, the two components allow us to search the space of DAGs quickly and effectively, without verifying the acyclicity constraint or enumerating possible parent sets for a candidate topological sort. The annealing component consistently improves the accuracy of DAGs learned by greedy and deterministic search algorithms. In extensive numerical tests, ARCS achieved high structure learning accuracy and outperformed existing methods by a substantial margin when learning Gaussian DAGs from observational and experimental data. As a byproduct, ARCS accurately estimates the Gaussian covariance matrix, achieving higher test data likelihood than other covariance estimation methods. On the theoretical side, we establish the consistency of the RC score for estimating topological sorts and DAG structures in the large-sample limit.
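The inner step of this scheme, for a fixed permutation, can be sketched as a proximal gradient iteration on a Cholesky-parametrized Gaussian loss. The following is a minimal illustration, not the thesis's exact RC score: it uses a standard log-det loss $\mathrm{tr}(S L L^\top) - 2\sum_i \log L_{ii}$ with an $\ell_1$ penalty on off-diagonal entries, and the penalty weight, step size, and positivity safeguard are all illustrative assumptions.

```python
import numpy as np

def rc_score(L, S, lam):
    """Illustrative stand-in for the RC score at a fixed permutation:
    Gaussian log-det loss with precision Omega = L L^T (L lower triangular),
    plus an l1 penalty on the strictly lower-triangular entries."""
    smooth = np.trace(S @ L @ L.T) - 2.0 * np.sum(np.log(np.diag(L)))
    return smooth + lam * np.sum(np.abs(np.tril(L, -1)))

def prox_grad_step(L, S, lam, eta):
    """One proximal gradient step: a gradient step on the smooth part,
    then soft-thresholding (the l1 prox) of the strictly lower entries."""
    grad = 2.0 * S @ L - 2.0 * np.diag(1.0 / np.diag(L))
    Lnew = L - eta * grad
    off = np.tril(Lnew, -1)
    off = np.sign(off) * np.maximum(np.abs(off) - eta * lam, 0.0)
    diag = np.clip(np.diag(Lnew), 1e-3, None)  # safeguard: keep diagonal positive
    return np.diag(diag) + off                 # result stays lower triangular

def fit_cholesky(S, lam=0.1, eta=0.01, iters=200):
    """Minimize the illustrative RC score over lower-triangular L for a
    fixed permutation, i.e. with S already permuted by P."""
    p = S.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(S + 1e-6 * np.eye(p)))
    for _ in range(iters):
        L = prox_grad_step(L, S, lam, eta)
    return L
```

In the full algorithm, an outer simulated annealing loop would propose permutation moves (e.g. swaps of two positions) and accept or reject them based on the change in this inner minimum.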
The second method we propose is the distributed annealing on regularized likelihood score (DARLS) algorithm, which generalizes ARCS to learn a flexible family of DAGs from distributed data. To the best of our knowledge, it is the first method that uses distributed optimization to learn causal structures from data stored on different machines. DARLS searches over the space of topological sorts with a simulated annealing strategy for a high-scoring causal graph, where the optimal graphical structure compatible with a sort is found by a distributed optimization method. We show that the sequence of estimates generated by this distributed optimization converges to a global optimizer of the overall score computed on all data across the local machines. Additionally, we propose generalized linear DAG models, in which the conditional distributions of a Bayesian network are given by generalized linear models (GLMs) with canonical links. GLMs form a flexible family of distributions that accommodate various types of data, which greatly increases the applicability of our DAG models. In simulation studies, DARLS operating on distributed data demonstrated performance competitive with existing methods that use the pooled data across local machines. It also exhibited higher predictive power than other methods in a real-world application modeling protein-DNA binding networks with ChIP-sequencing data.
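DARLS's actual distributed optimization is more involved than plain gradient descent, but the key principle can be illustrated simply: for a GLM with canonical link, the full-data gradient of the negative log-likelihood is the sum of per-machine gradients, so each descent step can use all the data without moving it off the local machines. The sketch below (with hypothetical function names) fits a single node's conditional as a logistic GLM from sharded data under this assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_gradient(X, y, beta):
    """Gradient of the logistic negative log-likelihood on one machine's
    shard (logistic regression is a GLM with the canonical logit link)."""
    return X.T @ (sigmoid(X @ beta) - y)

def distributed_fit(shards, p, eta=0.5, iters=300):
    """Gradient descent for one node's GLM conditional: each machine
    computes its local gradient, and summing them gives the exact
    full-data gradient, so raw data never leaves the local machines."""
    n_total = sum(len(y) for _, y in shards)
    beta = np.zeros(p)
    for _ in range(iters):
        full_grad = sum(local_gradient(X, y, beta) for X, y in shards)
        beta -= eta * full_grad / n_total
    return beta
```

In the structure learning setting, one such regularized fit would be run per node, with the candidate parent set determined by the current topological sort proposed by the annealing search.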