Structure Learning of DAGs from Observational Data with Multivariate Spatial Processes and with Non-invertible Functional Relationships
- Author(s): Wang, Bingling
- Advisor(s): Banerjee, Sudipto; Zhou, Qing; et al.
Directed acyclic graphs (DAGs) are widely used to model relations and processes. Learning DAG structure from observational data is a challenging problem. In this dissertation, we develop novel methods for learning causal DAGs.
First, we learn the DAG structure among predefined outcomes using a Bayesian hierarchical model, where the outcomes are "mixed" in the sense that some may be continuous, some binary, and others counts. We focus on multivariate point-referenced data, where each location yields measurements on multiple variables, and build a joint model hierarchically from conditional distributions, with different spatial processes embedded in the conditional distributions. Comparisons across different spatial models, in both a simulation study and a data analysis, show that incorporating spatial processes greatly improves performance.
The second problem we address is determining the causal relation in the bivariate case, i.e., the causal direction between X and Y. We restrict our attention to non-invertible causal relations between variables and develop a novel testing method for deciding the causal direction. We propose a bivariate nonlinear structural equation model (SEM) in which the function f is non-invertible, and identify the causal direction by testing for the non-invertibility of f. A piecewise linear function with two pieces can capture the change in the functional relationship and is used to approximate the non-invertible function f.
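To make the two-piece approximation concrete, the following is a minimal sketch and not the testing procedure developed in the dissertation. It assumes a continuous two-piece form y ≈ a + b·x + c·max(x − k, 0), chooses the knot k by a grid search over quantiles of x, and treats a sign change between the left and right slopes as informal evidence of non-invertibility. The function names, the simulated data, and the choice f(x) = |x| are all hypothetical.

```python
import numpy as np


def fit_two_piece(x, y, n_knots=50):
    """Least-squares fit of y ~ a + b*x + c*max(x - k, 0) over a grid of knots k.

    Hypothetical helper for illustration; returns the best knot, both slopes,
    and the residual sum of squares.
    """
    best = None
    # Candidate knots are interior quantiles of x so that each piece has data.
    for k in np.quantile(x, np.linspace(0.1, 0.9, n_knots)):
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - k, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {
                "knot": float(k),
                "left_slope": float(coef[1]),
                "right_slope": float(coef[1] + coef[2]),
                "rss": rss,
            }
    return best


def slopes_change_sign(fit):
    """A two-piece linear function is non-monotone iff its slopes differ in sign."""
    return fit["left_slope"] * fit["right_slope"] < 0


# Simulated example (assumption): X causes Y through the non-invertible f(x) = |x|.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.abs(x) + 0.1 * rng.normal(size=2000)

fit_xy = fit_two_piece(x, y)
print(fit_xy, slopes_change_sign(fit_xy))
```

The slope comparison here ignores sampling variability; a formal decision about the causal direction would require a proper test statistic, which this sketch does not attempt to reproduce.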
Causal DAGs are usually not identifiable from observational data, and most existing methods estimate a DAG only up to its equivalence class. When there are more than two variables, instead of applying the bivariate non-invertible causal model directly to all of the given variables, we avoid an exhaustive search over all possible causal structures by combining current linear structure learning algorithms with our non-invertible causal learning algorithm. In our implementation, an initial graph is first constructed using a structure learning algorithm and then updated by our non-invertible nonlinear causal learning (NNCL) algorithm. Extensive numerical comparisons show that our algorithms outperform existing DAG learning methods in identifying causal graphical structures. We illustrate the practical application of our methods using two real-world data sets: a flow cytometry data set and a ChIP-Seq gene expression data set.