Network Reconstruction of Dynamic Biological Systems
- Author(s): Asadi, Behrang;
- et al.
Inference of network topology from experimental data is a central endeavor in biology, since knowledge of the underlying signaling mechanisms is a requirement for understanding biological phenomena. The development of methods to reconstruct biological networks, among the most important tools in bioinformatics, has attracted remarkable attention in the current decade. Integration of different data types can lead to remarkable improvements in our ability to identify the connectivity of different segments of networks and to predict events within a cellular system. Several recent studies have used data integration to reconstruct biochemical networks and to build predictive models from large-scale datasets. In this dissertation we first prescribe directions for reconstructing biological networks based on the properties of the data and on priorities in terms of network reconstruction performance. We use experimentally measured and synthetic data sets to compare three popular methods, principal component regression (PCR), linear matrix inequalities (LMI), and the Least Absolute Shrinkage and Selection Operator (LASSO), in terms of root-mean-squared error (RMSE), average fractional error in the value of the coefficients, accuracy, sensitivity, specificity, and the geometric mean of sensitivity and specificity. This comparison enables us to establish criteria for selecting an appropriate approach for network reconstruction based on a priori properties of the experimental data.
Reconstruction of biological and biochemical networks from large biological datasets is challenging when the data in question are dynamic. To address this challenge, we also developed a new method, called Doubly Penalized Least Absolute Shrinkage and Selection Operator (DPLASSO), for reconstruction of dynamic biological networks. DPLASSO consists of two components: statistical significance testing of model coefficients and penalized/constrained optimization. Partial least squares regression with statistical significance testing acts as a supervisory-level filter to extract the most informative components of the network from a dataset (Layer 1). Then, LASSO with extra weights on the smaller parameters identified in the first layer is employed to retain the main predictors and to set the smallest coefficients to zero (Layer 2). We illustrate that DPLASSO outperforms LASSO in terms of sensitivity, specificity, and accuracy.
Most biological systems are nonlinear; a network model expressed in linear form may therefore fail to represent the real structure of the network or to predict its response as accurately as a proper nonlinear model does. Accordingly, as another contribution, we introduce a novel method to reconstruct nonlinear biological networks. In this method, we use a quadratic nonlinear model as the representation of the second-order Taylor series expansion of a nonlinear system around an arbitrary point of interest. We apply LASSO to shrink some of the small coefficients to zero. Statistical significance testing (t-test) then completes the parameter (network link) selection. We demonstrate that the proposed approach leads to considerable improvements in predicting the response of the system and to fair improvements in the accuracy and sensitivity of the identified network.
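The nonlinear approach described above (quadratic model, LASSO shrinkage, then significance testing) can be illustrated with a minimal sketch. It is not the dissertation's implementation, which the abstract does not show. The function names, the synthetic data, the penalty weight `alpha`, and the cutoff `t_crit` are all assumptions made for illustration, and a simple threshold on the absolute t-statistic stands in for a full t-test with p-values.

```python
import numpy as np
from itertools import combinations_with_replacement


def quadratic_features(X):
    """Return [x_i] followed by [x_i * x_j for i <= j] (second-order terms)."""
    n = X.shape[1]
    pairs = list(combinations_with_replacement(range(n), 2))
    quad = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, quad])


def lasso_cd(A, y, alpha, n_iter=500):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / (2 m)) * ||y - A w||^2 + alpha * ||w||_1.
    """
    m, p = A.shape
    w = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / m
    r = y.astype(float).copy()           # residual y - A w (w starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += A[:, j] * w[j]          # remove feature j from the fit
            rho = A[:, j] @ r / m
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= A[:, j] * w[j]          # add the updated feature j back
    return w


def t_filter(A, y, support, t_crit=3.0):
    """Refit ordinary least squares on the LASSO support; keep |t| > t_crit.

    Assumption: a fixed cutoff on |t| replaces a p-value based t-test.
    """
    idx = np.flatnonzero(support)
    As = A[:, idx]
    beta, *_ = np.linalg.lstsq(As, y, rcond=None)
    resid = y - As @ beta
    dof = As.shape[0] - As.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(As.T @ As)))
    keep = np.zeros(A.shape[1], dtype=bool)
    keep[idx[np.abs(beta / se) > t_crit]] = True
    return keep


# Hypothetical synthetic system: one linear link and one interaction link.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + 0.05 * rng.standard_normal(200)

A = quadratic_features(X)                # 3 linear + 6 quadratic columns
w = lasso_cd(A, y, alpha=0.1)            # shrink small coefficients to zero
keep = t_filter(A, y, support=(w != 0))  # significance filter on survivors
print("selected columns:", np.flatnonzero(keep))
```

In this sketch, column 0 is the linear term in the first input and column 7 is the product of the second and third inputs, so those are the links the procedure is meant to recover. The LASSO step is written out with NumPy to keep the example self-contained; a library solver could be substituted.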