Bayesian networks, with structure given by a directed acyclic graph (DAG), are a popular class of graphical models. In this dissertation, we develop two structure learning methods for Bayesian networks.
First, we propose a score-based algorithm for discrete data that can incorporate experimental interventions for causal learning. Learning Bayesian networks from discrete or categorical data is particularly challenging, due to the large parameter space and the difficulty in searching for a sparse structure. In this dissertation, we develop a maximum penalized likelihood method to tackle this problem. Instead of the commonly used multinomial distribution, we model the conditional distribution of a node given its parents by multi-logit regression, in which an edge is parameterized by a set of coefficient vectors with dummy variables encoding the levels of a node. To obtain a sparse DAG, a group norm penalty is employed, and a blockwise coordinate descent algorithm is developed to maximize the penalized likelihood subject to the acyclicity constraint of a DAG. When interventional data are available, our method constructs a causal network, in which a directed edge represents a causal relation. We apply our method to various simulated and real data sets. The results show that our method is very competitive with many existing methods in DAG estimation from both interventional and high-dimensional observational data. In addition, an R package implementing this algorithm, discretecdAlgorithm, is published on CRAN with a user-friendly interface.
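As a minimal sketch of the objective described above, the following Python code evaluates a group-norm penalized negative log-likelihood for a single node under a multi-logit model. The function names, the data layout, and the symmetric parameterization (one coefficient column per level of the node) are assumptions made for illustration; this is not the interface of discretecdAlgorithm, and it omits the acyclicity constraint and the treatment of interventional samples.

```python
import math

def group_norm(coef_matrix):
    """Euclidean norm of all coefficients attached to one candidate edge."""
    return math.sqrt(sum(c * c for row in coef_matrix for c in row))

def penalized_neg_loglik(x_groups, coef_groups, intercept, y, lam):
    """Penalized negative log-likelihood of one node (illustrative only).

    x_groups[k][i] : dummy-coded levels of candidate parent k for sample i
    coef_groups[k] : d_k x r coefficient matrix for the edge k -> node
    intercept      : length-r list, one entry per level of the node
    y[i]           : observed level (0..r-1) of the node for sample i
    lam            : weight of the group norm penalty
    """
    r = len(intercept)
    nll = 0.0
    for i, level in enumerate(y):
        # Linear predictor for each level of the node.
        logits = list(intercept)
        for x_k, b_k in zip(x_groups, coef_groups):
            for d, x in enumerate(x_k[i]):
                for l in range(r):
                    logits[l] += x * b_k[d][l]
        # Numerically stable log of the softmax normalizing constant.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
        nll -= logits[level] - log_norm
    # One norm per edge: a whole coefficient group is shrunk together.
    penalty = lam * sum(group_norm(b) for b in coef_groups)
    return nll + penalty
```

Because the penalty is applied to the norm of the whole coefficient group rather than to individual coefficients, driving a group to zero removes the corresponding edge entirely, which is what yields a sparse graph.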
The second method we propose is a framework for fast learning of extremely large Bayesian networks.
The size of the DAG space grows super-exponentially in the number of nodes, and it is challenging to develop an efficient structure learning method for massive networks. We propose a three-step divide-and-conquer framework for Gaussian observational data:
1) partition the full DAG into several sub-networks (P-step),
2) estimate each disconnected sub-network individually (E-step),
3) fuse sub-networks by adding edges between them (F-step).
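The control flow of the three steps can be sketched as follows. This is only an illustrative skeleton: the function name and the three callables `partition`, `estimate`, and `fuse` are hypothetical placeholders for whatever procedures are plugged into each step, and edges are assumed to be represented as (parent, child) pairs.

```python
def learn_by_partition(nodes, partition, estimate, fuse):
    """Divide-and-conquer skeleton with pluggable P-, E-, and F-steps."""
    # P-step: split the node set into disjoint blocks.
    blocks = partition(nodes)
    # E-step: learn a sub-network on each block independently.
    within_edges = set()
    for block in blocks:
        within_edges |= set(estimate(block))
    # F-step: combine the sub-networks, possibly adding edges between blocks.
    return fuse(blocks, within_edges)
```

Since the E-step treats each block independently, the per-block calls could in principle run in parallel.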
To partition the full DAG into sub-networks, a modified hierarchical clustering with average linkage is proposed to automatically choose the number of sub-networks. For the estimation step, users can apply any proper structure learning algorithm; in our implementation, we use the Concave penalized Coordinate Descent with reparameterization (CCDr) algorithm (Aragam and Zhou, 2015) as an example. Finally, we develop a hybrid fusion step that uses both conditional independence tests and a penalized likelihood scoring function to recover the structure of the full DAG. The simulation results show that our three-step framework significantly improves both speed and accuracy compared to estimating the DAG as a whole with the same algorithm used in the estimation step.
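To illustrate the partition step, the sketch below implements plain agglomerative clustering with average linkage on a node-by-node dissimilarity matrix. It is not the modified procedure proposed here: the fixed `threshold` stopping rule is a stand-in assumption for the automatic choice of the number of sub-networks, and the dissimilarity (for example, one minus the absolute correlation) is likewise assumed.

```python
def average_linkage_partition(dist, threshold):
    """Merge clusters by average linkage until the closest pair is too far apart.

    dist      : symmetric matrix (list of lists) of pairwise dissimilarities
    threshold : hypothetical stopping rule; merging stops once the smallest
                average dissimilarity between two clusters exceeds it
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]
```

Each returned cluster would then be passed to the estimation step as one sub-network.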