Structural learning of Gaussian directed acyclic graphs (DAGs), or Bayesian networks, has been studied extensively under the assumption that data are independent. However, in real applications such as biology and the social sciences, observations generated from a Bayesian network model are often mutually dependent, and their dependence can be modeled by a second network.
In this dissertation, we generalize the existing Gaussian DAG framework by proposing a new Gaussian DAG model for dependent data, which assumes the observations are correlated according to a given undirected network. Under this model, the dependent observations jointly follow a matrix normal distribution whose covariance is the Kronecker product of two positive definite matrices. The Cholesky factor of one matrix represents the DAG structure in the feature space, while the other matrix encodes the conditional independencies among the observations. We show that the proposed model also satisfies the desired score-equivalence property under common likelihood-based score functions.
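To make the covariance structure concrete, the following is a minimal NumPy sketch of the matrix normal model described above, using small illustrative sizes and hypothetical covariance matrices (`Sigma_obs` for the dependence among observations, `Sigma_feat` for the feature space; neither comes from the dissertation's experiments). It verifies numerically that a draw of the form `L_obs @ Z @ L_feat.T` has vectorized covariance equal to the Kronecker product of the two matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3  # n observations, p features (illustrative sizes)

# Hypothetical positive definite matrices: Sigma_obs encodes dependence
# among observations; the Cholesky factor of Sigma_feat would carry the
# DAG structure in the feature space under the model described above.
Sigma_obs = np.eye(n) + 0.3 * np.ones((n, n))
Sigma_feat = np.diag([1.0, 2.0, 3.0])

L_obs = np.linalg.cholesky(Sigma_obs)
L_feat = np.linalg.cholesky(Sigma_feat)

# Matrix normal draw: Y = L_obs Z L_feat^T with Z having i.i.d. N(0,1) entries,
# so cov(vec(Y)) = Sigma_feat kron Sigma_obs (column-stacking vec convention).
Z = rng.standard_normal((n, p))
Y = L_obs @ Z @ L_feat.T

# Identity check: vec(A Z B^T) = (B kron A) vec(Z).
vec = lambda M: M.reshape(-1, order="F")
print(np.allclose(vec(Y), np.kron(L_feat, L_obs) @ vec(Z)))  # True
```

The Kronecker ordering depends on the vectorization convention; with column-stacking `vec`, the feature covariance appears on the left factor.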
Based on the proposed model, we develop a block coordinate descent algorithm to estimate the DAG structure given a topological ordering of the vertices. The algorithm jointly estimates a sparse Bayesian network and the correlations among observations by optimizing a scoring function based on penalized likelihood. It is fast and scales to networks with thousands of nodes. We also establish finite-sample error bounds and large-sample consistency of the estimators. In particular, we show that under mild conditions, the proposed method produces consistent estimators for the DAG structure and the sample covariances after a single iteration. Extensive numerical experiments further demonstrate that, by jointly estimating the DAG structure and the sample correlation, our method achieves much higher accuracy in structure learning than competing algorithms. When the node ordering is unknown, experiments on synthetic and real data show that our algorithm can still estimate the correlations between samples, with which we can de-correlate the dependent data and thereby significantly improve the performance of classical DAG learning methods.
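The de-correlation step mentioned above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes the observation covariance `Sigma_obs` (or an estimate of it) is available, generates dependent data from the matrix normal model, and whitens the rows by solving against the Cholesky factor, after which the observations behave as i.i.d. samples suitable for a classical DAG learner.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 3  # illustrative sizes

# Hypothetical dependence structure among observations (assumed known,
# or estimated as described above).
Sigma_obs = np.eye(n) + 0.3 * np.ones((n, n))
L_obs = np.linalg.cholesky(Sigma_obs)

# Dependent data generated as in the matrix normal model: Y = L_obs Z L_feat^T.
Sigma_feat = np.diag([1.0, 2.0, 3.0])
L_feat = np.linalg.cholesky(Sigma_feat)
Z = rng.standard_normal((n, p))
Y = L_obs @ Z @ L_feat.T

# De-correlate: left-multiply by L_obs^{-1}, so the rows of Y_dec are
# independent N(0, Sigma_feat) draws; a classical DAG learning method can
# then be applied to Y_dec as if the samples were i.i.d.
Y_dec = np.linalg.solve(L_obs, Y)
print(np.allclose(Y_dec, Z @ L_feat.T))  # True
```

Left-multiplying by the inverse Cholesky factor removes exactly the row-wise (between-observation) correlation while leaving the feature-space covariance, and hence the DAG structure, untouched.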