Genes and Pathways for Plasma Lipid Traits via Multi-tissue Multi-omics Systems Analysis

Genome-wide association studies (GWAS) have implicated ∼380 genetic loci for plasma lipid regulation. However, these loci only explain 17-27% of the trait variance and a comprehensive understanding of the molecular mechanisms has not been achieved. In this study, we utilized an integrative genomics approach leveraging diverse genomic data from human populations to investigate whether genetic variants associated with various plasma lipid traits, namely total cholesterol (TC), high and low density lipoprotein cholesterol (HDL and LDL), and triglycerides (TG), from GWAS were concentrated on specific parts of tissue-specific gene regulatory networks. In addition to the expected lipid metabolism pathways, gene subnetworks involved in ‘interferon signaling’, ‘autoimmune/immune activation’, ‘visual transduction’, and ‘protein catabolism’ were significantly associated with all lipid traits. Additionally, we detected trait-specific subnetworks, including cadherin-associated subnetworks for LDL, glutathione metabolism for HDL, valine, leucine and isoleucine biosynthesis for TC, and insulin signaling and complement pathways for TG. Finally, utilizing gene-gene relations revealed by tissue-specific gene regulatory networks, we detected both known (e.g. APOH, APOA4, and ABCA1) and novel (e.g. F2 in adipose tissue) key regulator genes in these lipid-associated subnetworks. Knockdown of F2 gene (Coagulation Factor II, Thrombin) in 3T3-L1 adipocytes reduced gene expression of Abcb11, Apoa5, Apof, Gc, Fabp, Hrg, Proc, and Cd36, several of which are important in lipoprotein transport and fatty acid uptake, providing evidence for a link between adipose thrombin and plasma lipid regulation. Our results shed light on the complex mechanisms underlying lipid metabolism and highlight potential novel targets for lipid regulation and lipid-associated diseases.

enables us to derive a comprehensive view of the complex and novel mechanisms underlying plasma lipid 87

metabolism. 88
In addition to eQTLs and distance-based SNP-gene mapping approaches, we integrated functional data 136 sets from the Regulome database (20), which annotates SNPs in regulatory elements in the Homo sapiens 137 genome based on the results from the ENCODE studies (50). 138 Nine unique combinations of SNP-gene mapping 139 Using the above three mapping approaches, we derived nine unique sets of SNP-gene mapping. These are: 140 eSNP adipose, eSNP liver, eSNP blood, eSNP brain, eSNP HAEC, eSNP all (i.e., combining all the 141 tissue-specific eSNPs above), Distance (chromosomal distance-based mapping), Regulome (ENCODE-142 based mapping), and Combined (combining all the above methods). 143

Removal of SNPs in linkage disequilibrium 144
We observed a high degree of linkage disequilibrium (LD) in the eQTL, Regulome, and distance-based 145 SNPs, and this LD structure may cause artifacts and biases in the downstream analysis. For this reason, 146 we devised an algorithm to remove SNPs in LD while preferentially keeping those with a strong 147 statistical association with lipid traits. Technical details are available in Supplementary Methods. We 148 chose a LD cutoff (R 2 < 0.5) to remove redundant SNPs in high LD. 149

Marker Set Enrichment Analysis (MSEA) 150
We applied a modified MSEA method (24, 51) to find pathways/co-expressed modules associated with 151 lipid traits (Supplemental Methods). False discovery rates (FDR) were estimated with the method by 152 Benjamini and Hochberg (52). Pathways or co-expression modules with FDR < 10% were considered 153 statistically significant. MSEA were applied to both the GLGC GWAS dataset and the MetaboChip 154 dataset. The combined FDR from these two datasets was expected to be < 1% (10% * 10% = 1%). 155 Comparison between MSEA and other computational method 156 enrichment analysis approach (iGSEA) (53). In the iGSEA analysis, we generated gene sets using the 158 same canonical pathways and co-expression modules in MSEA. The SNPs were mapped to genes using 159 the default settings of iGSEA. For each given gene set, significance proportion-based enrichment score 160 was calculated to estimate the enrichment of genotype-phenotype association. Then, iGSEA performed 161 label permutations to calculate nominal P-values to assess the significance of the pathway-based 162 enrichment score and FDR to correct multiple testing, with FDR < 0.25 (default setting) regarded as 163 significant pathways. Considering that MSEA and iGSEA were independent, the combined FDR from 164 these two methods of analysis was expected to be < 5% (10% x 25% = 2.5%). 165

Construction of independent supersets and confirmation of lipid association 166
Because the pathways or co-expression modules were collected from multiple sources, there were 167 overlapping or nested structures among the gene sets. To make the results more meaningful, we 168 constructed relatively independent supersets that captured the core genes from groups of redundant 169 pathways and co-expression modules (Supplemental Methods). After merging, we annotated each 170 superset based on function enrichment analysis of the known pathways from the Gene Ontology and 171 KEGG databases (P < 0.05 in Fisher's exact test after Bonferroni correction). The supersets were given a 172 second round of MSEA to confirm their significant associated with lipids using P < 0.05 after Bonferroni 173 correction as the cutoff. 174

Key driver analysis (KDA) 175
We adopted a previously developed KDA algorithm (54-56) of gene-gene interaction networks to the 176 lipid-associated supersets in order to identify the key regulatory genes (Figure 1). In the study, we 177 included Bayesian regulatory networks from diverse tissues, including adipose tissue, liver, blood, brain, 178 kidney and muscle (31-39). A key driver was defined as a gene that is directionally connected to a large 179 number of genes from a lipid superset, compared to the expected number for a randomly selected gene 180 within the Bayesian network (details in Supplemental Methods). The MSEA, merging, and KDA were 181 performed using R. 182

3T3-L1 cell culture and transient transfection with F2 siRNA 190
The mouse preadipocytes 3T3-L1 cells were obtained from ATCC and maintained and differentiated to 191

Identification of pathways and gene co-expression modules associated with lipid traits 223
To asses biological pathway enrichment for the four lipid traits with GLGC GWAS, we curated a 224 total of 4532 gene sets including 2705 tissue-specific co-expression modules (i.e., highly co-regulated 225 genes based on tissue gene expression data) and 1827 canonical pathways from Reactome, Biocarta and 226 the Kyoto Encyclopedia of Genes and Genomes (KEGG). These gene sets were constructed as data-and 227 knowledge-driven functional units of genes. Four predefined positive control gene sets for HDL, LDL, 228 TC, and TG were also created based on candidate genes curated from the GWAS catalog (58). To map 229 potential functional SNPs to genes in each gene set, tissue-specific eQTLs, ENCODE functional 230 genomics information, and chromosomal distance-based mapping were used (details in Methods). Tissue-231 specific eQTL sets were obtained from the GTEx database from studies on human adipose tissue, liver, 232 brain, blood, and human aortic endothelial cells (HAEC), and a total of nine SNP-gene mapping methods 233 were created. The liver and adipose tissues have established roles in lipid regulation, whereas the other 234 tissues are included for comparison. 235 Integrating the datasets mentioned above using MSEA, we identified 65, 86, 90, and 92 gene sets 236 whose functional genetic polymorphisms showed significant association with HDL, LDL, TC, and TG, 237 respectively, in GLGC GWAS (FDR < 10%; Supplemental Table S1). The predefined positive controls 238 for the four lipid traits were among the top signals for their corresponding traits (Table 1), indicating that 239 our MSEA method is sensitive in detecting true lipid trait-associated processes. Compared with other 240 tissues, more pathways were captured when using liver and adipose eSNPs to map GWAS SNPs to genes 241 (Supplemental Table S1). For example, 56 out of the 86 LDL-associated pathways were found when 242 liver and adipose eSNPs were used in our analysis. These results confirmed the general notion that liver 243 and adipose tissue play critical roles in regulating plasma lipids, leading us to focus the bulk of our 244 analysis on these two tissues, with the remaining tissues serving as supplement. 245 represented the expected lipid metabolic pathways as well as those that are less known to be associated 247 with lipids, such as 'antigen processing and presentation', 'cell adhesion molecules (CAMs)', 'visual 248 phototransduction', and 'IL-5 signaling pathway' (summary in Table 1; details in Supplemental Table  249 S1). We broadly classified the common gene sets detected into 'positive controls', 'lipid metabolism', 250 'interferon signaling', 'autoimmune/immune activation', 'visual transduction', and 'protein catabolism' 251 (Table 1). 252 Beside the common gene sets described above, we also detected 18, 5, 6, and 17 trait-specific 253 pathways/modules for HDL, LDL, TC, and TG, respectively ( Table 2; Supplement Table S1), 254 suggesting trait-specific regulatory mechanisms. Among the 18 pathways for HDL were 'cation-coupled 255 chloride transporters', 'glycerolipid metabolism' and 'negative regulators of RIG-I/MDA5 signaling' 256 across analyses using different tissue eSNP mapping methods, 'alcohol metabolism' from brain-based 257 analysis, 'packaging of telomere ends' in adipose tissue, 'glutathione metabolism' in liver, and 258 'cobalamin metabolism' and 'taurine and hypotaurine metabolism' in both adipose and liver-based 259 analyses. LDL-specific pathways included the 'platelet sensitization by LDL' pathway and a liver co-260 expression module related to cadherin. TC-specific pathways included 'valine, leucine and isoleucine 261 biosynthesis' across tissues and 'wound healing' in the brain-based analysis. When looking at the TG-262 specific pathways, gene sets associated with 'cellular junctions' were consistent across tissues whereas 263 'insulin signaling' and complement pathways were exclusively seen in adipose tissue-based analysis. 264

Replication of lipid-associated pathways using additional dataset and method 265
To replicate our results from the analysis of GLGC GWAS datasets, we utilized an additional lipid 266 genetic association dataset based on a MetaboChip lipid association study (15) which involved 267 individuals independent of those included in GLGC. The gene sets detected using this independent dataset 268 highly overlapped with those from the GLGC dataset (Table 1

Construction of non-overlapping gene supersets for lipid traits 273
As the knowledge-based pathways and data-driven co-expression modules used in our analysis can 274 converge on similar functional gene units, some of the lipid-associated gene sets have redundancies. We 275 therefore merged overlapping pathways to derive independent, non-overlapping gene sets associated lipid 276 traits. For the 39 shared pathways/co-expression modules across the four lipid traits described earlier, we 277 merged and functionally categorized them into five independent supersets ( Table 1; Table 3). For the 278 significant gene sets for each lipid trait, we merged them into 17, 16, 18, and 14 supersets for HDL, LDL, 279 TC, and TG, respectively (Table 3), and confirmed that the merged supersets still showed significant 280 association with the corresponding lipid traits in a second round of MSEA (p < 0.05 after Bonferroni 281 correction for the number of supersets tested; Table 3). 282

Identification of central regulatory genes in the lipid-associated supersets 283
Subsequently, we performed a key driver analysis (KDA; Figure 1) to identify potential regulatory genes 284 or key drivers (KDs) that may regulate genes associated with each lipid trait using Bayesian networks 285 constructed from genetic and gene expression datasets of multiple tissues (detailed in Methods; full KD 286 list in Supplemental Table S3). The top adipose and liver KDs for the shared supersets of all four lipid 287 traits and the representative Bayesian subnetworks are shown in Figure 2. 288 In adipose tissue (Figure 2A), the top KDs for the 'lipid metabolism' subnetwork include well-289 known lipoproteins and ATP-binding cassette (ABC) family members that are responsible for lipid 290 transport, such as APOF, APOA5 and ABCB11. We also found several KDs that are less known to be 291 associated with lipid metabolism, particularly F2 (coagulation Factor II or thrombin). For the 292 'autoimmune/immune activation' subnetwork, CD86, HCK, and HLA-DMB were identified as KDs. 293 signaling' subnetwork. Moreover, the SYK gene is a shared KD between 'lipid metabolism' and 295 'autoimmune/immune activation'. 296 In the liver (Figure 2B), the top liver KDs for the 'lipid metabolism' subnetwork are enzymes 297 involved in lipid and cholesterol biosynthesis and metabolism, such as FADS1 (fatty acid desaturase 1), 298 FDFT1 (farnesyl-diphosphate farnesyltransferase 1), HMGCS1 (3-hydroxy-3-methylglutaryl-CoA 299 synthase 1), and DHCR7 (7-dehydrocholesterol reductase). We also identified more KDs for the 300 'interferon signaling' subnetwork in the liver compared to the adipose tissue, with MX1, MX2, ISG15, 301 IFI44, and EPSTI1 being central to the subnetwork. Similar to the adipose network, PSMB9 and HLA-302 DMB were also identified as KDs for 'protein catabolism' and 'autoimmune/immune activation' 303 subnetworks in liver, respectively. We did not detect key driver genes for the 'visual transduction' 304 subnetwork in either tissue, possibly as a result that the networks of liver and adipose tissues did not 305 capture gene-gene interactions important for this subnetwork. 306 In addition to the KDs for the subnetworks shared across lipid traits as discussed above, we 307 identified tissue-specific KDs for individual lipid traits (Supplemental Table S3). In adipose, PANK1 308 and H2B histone family members were specific for the HDL subnetworks ( Figure 3A); HIPK2 and FAU 309 were top KDs for the LDL subnetworks ( Figure 3B); genes associated with blood coagulation such as 310 KNG1 and FGL1 were KDs for the TC and TG subnetworks (Figure 3C-3D). Interestingly, genes related 311 to insulin resistance, PPARG and FASN, were KDs for both HDL and TG subnetworks. Similarly, trait-312 specific KDs and subnetworks were also detected in the liver; 37 KDs were identified for the TG 313 subnetwork including ALDH3B1 and ORM2, whereas AHSG, FETUB, ITIH1, HP, and SERPINC1 were 314 KDs found in the LDL subnetwork. We note that most of the KDs are themselves not necessarily GWAS 315  Taking into account that the F2 gene is surrounded by various significant GWAS hits within its 322 subnetwork, we aimed to validate the role of the F2 gene subnetwork in lipid regulation through a siRNA-323 mediated knockdown experiment in 3T3-L1 adipocytes. We found that F2 gene expression was low in 324 preadipocytes, but gradually increased according to adipogenesis. In fully differentiated adipocytes at day 325 10, F2 gene expression level was higher than preadipocytes by 12-fold (Supplemental Figure S3). adipocytes was observed compared with scrambled siRNA control ( Figure 4B). Next, we measured 335 expression levels of genes related with adipogenesis (Pparr, Cepba, Srepb1, Fas), lipolysis (Lipe), fatty 336 acid transport (Cd36, Fabp4), and other adipokines following F2 siRNA treatment. We found no change 337 in the expression of these tested genes, with the exception of Cd36, which encodes fatty acid translocase 338 facilitating fatty acid uptake. Cd36 expression was decreased by 15 % (p < 0.05) compared with control 339 (Figure 4C). The decreases in Cd36 and lipoprotein transport genes after F2 knockdown suggest that 340 cholesterol and fatty acid transport/uptake by adipocytes is compromised, which could contribute to 341 alterations in circulating lipid levels. 342 Epidemiological studies consistently show that plasma lipids are closely associated with human 344 complex diseases. For example, high TC and LDL levels are associated with increased risk of 345 cardiovascular disease (CVD). Here, we examined the association between the lipid subnetworks 346 identified in our study and four human complex diseases, namely, Alzheimer's disease, CVD, T2D, and 347 cancer (Materials and Methods). We found that the gene supersets identified for each lipid traits were 348 significantly enriched for GWAS candidate genes reported by GWAS catalog for the four diseases at 349 Bonferroni-corrected p < 0.05 (Figure 5; Supplemental Table S4). The superset 'lipid metabolism', 350 which was shared across lipid traits, was associated with Alzheimer's disease and CVD. When trait-351 specific subnetworks were considered, those associated with TC, LDL, and TG had more supersets 352 associated with CVD compared to those associated with HDL, a finding consistent with recent reports (15, 353 65, 66). In addition, supersets of each lipid trait, except HDL, were also found to be significantly 354 associated with cancer, whereas supersets associated with HDL, LDL, and TG but not TC, were linked to 355

Discussion 365
To gain comprehensive insights into the molecular mechanisms of lipid traits that are important for 366 numerous common complex diseases, we leveraged the large volume of genomic datasets and performed 367 a data-driven multi-omics study combining genetic association signals from large lipid GWAS, tissue-368 specific eQTLs, ENCODE functional data, known biological pathways, and gene regulatory networks. 369 We identified diverse sets of biological processes, guided by their tissue-specific gene-gene interactions,  We identified shared pathways associated with all four lipid traits, including 'lipid metabolism' 377 and 'autoimmune/immune activation', which have been consistently linked to lipid phenotypes, as well as 378 additional pathways such as 'interferon signaling', 'protein catabolism', and 'visual transduction'. 379 Interferon factors have previously been linked to lipid storage attenuation and differentiation in human 380 adipocytes (67). Protein catabolism has only recently been identified to be important in regulating lipid 381 metabolism through the PSMD9 protein, which had no previously known function but was shown to 382 cause significant alterations in lipid abundance in both a gain of function and loss of function study in 383 mice (68). The 'visual transduction' superset contains retinol-binding proteins, which are carrier proteins 384 involved retinol transport, and play key roles in gene expression regulation and developmental processes 385 (69). 'visual transduction' also shares lipoprotein genes with 'lipid metabolism', suggesting that retinol-386 related signal transduction is intimately linked to lipoprotein transport and hence plasma lipid levels. 387 most TG-specific pathways were found to be significant when adipose eSNPs were used, and complement 389 and insulin signaling pathways in the adipose tissue were specific for TG. This is in line with adipose 390 tissue functioning as the major storage site for TG and the regulatory role of immune system and insulin 391 signaling in adipocyte functions and fat storage (70). We also found five HDL-specific pathways, most of 392 which are associated with glucose, lipid, and amino acid metabolism, and were signals derived from liver 393 eSNPs. As HDL acts as the major vehicle for transporting cholesterol to the liver for excretion and 394 catabolism, the critical role of liver as well as the connections between major metabolic pathways in HDL 395 regulation is recapitulated by our analysis. Interestingly, the TC-specific pathways can be only found 396 when brain eSNPs are used. While the brain accounts for 2% of body weight, it contains 23% of TC in the 397 body (71) and deregulated cholesterol trafficking appears to be involved in the pathogenesis of 398 neurodegenerative diseases, such as Parkinson's and Alzheimer's disease (72). These tissue-and trait-399 specific pathways or processes support the unique features of each lipid species and point to tissue-400 specific targeting strategies to modulate levels of individual lipid traits and the associated diseases. 401 In addition to detecting trait-and tissue-specific causal pathways for the lipid traits, our study 402 attempted to delineate the interactions between lipid genes and pathways through gene network analysis. 403 Indeed, the tissue-specific gene networks revealed in our study highlight the regulatory connections 404 between lipid genes and pathways, and thus put individual genes in a broader context. The identification 405 of KDs in a network is essential for uncovering key regulatory components and for identifying drug 406 targets and biomarkers for complex diseases (24, 73). Here we adopted data-driven Bayesian gene 407 regulatory networks that combine various genomic data (54) to detect the central genes in plasma lipid 408 regulation. The power of this data-driven objective approach has been demonstrated recently (24, 55, 62, 409 63, 74, 75) and is again supported in this study by the fact that many KDs detected are known regulators surprising. However, the intimate relationship between a coagulation gene F2 and lipid regulation 420 predicted by our analysis is intriguing (Figure 4). We found that the partner genes in the adipose F2 421 subnetwork tend to be differentially expressed after In addition to shared KDs such as F2 for different lipids, it may be also of value to focus on the 427 trait-specific KDs as numerous studies have revealed these lipid phenotypes play different roles in many 428 human diseases. For example, LDL and TC are important risk factors for CVD (79) and TG has been 429 linked to T2D (80), while the role of HDL in CVD has been controversial (81). We detected 37 genes as 430 TG-specific KDs in liver regulatory subnetworks. Among these, CP (ceruloplasmin) and ALDH3B1 431 (aldehyde dehydrogenase 3 family, member B1) were clinically confirmed to be associated with T2D (82, 432 83) while most of the other genes such as DHODH and ANXA4 were less known to be associated with TG 433 and thus may serve as novel targets. In adipose tissue, genes important for insulin resistance and diabetes 434 such as PPARG and FASN were found to be KDs for TG, further supporting the connection between TG 435 and diabetes. Additionally, FASN has been implicated as a KD in numerous studies for non-alcoholic 436

disorders. 438
We acknowledge some potential limitations to our study. First, the GWAS datasets utilized are 439 not the most recently conducted and therefore provides the possibility of not capturing the full array of 440 unknown biology. However, despite this our results are consistent with the biology found more recently 441 including overlapping signals in pathways for chylomicron-mediated lipid transport and lipoprotein 442 metabolism (85) as well as more novel findings such as visual transduction pathways. In addition, one of 443 our key drivers KLKB1, which was not found to be a GWAS hit in the dataset we utilized, has since been 444 found to pass the genome wide significance threshold in more recent larger GWAS and is a hit on 445 apolipoprotein A-IV concentrations, which is a major component of HDL and chylomicron particles 446 important in reverse cholesterol transport (86). This further exemplifies the robustness of our integrative 447 network approach to find key genes important to disease pathogenesis even when smaller GWAS were 448

utilized. 449
In summary, we used an integrative genomics framework to leverage a multitude of genetic and 450 genomic datasets from human studies to unravel the underlying regulatory processes involved in lipid 451 phenotypes. We not only detected shared processes and gene regulatory networks among different lipid 452 traits, but also provide comprehensive insight into trait-specific pathways and networks. The results 453 suggest there are both shared and distinct mechanisms underlying very closely related lipid phenotypes. 454 The tissue-specific networks and KDs identified in our study shed light on molecular mechanisms 455 involved in lipid homeostasis. If validated in additional population genetic and mechanistic studies, these 456 molecular processes and genes can be used as novel targets for the treatment of lipid-associated disorders 457 such as CVD, T2D, Alzheimer's disease and cancers. 458

Data Availability 461
All genomic data utilized in the analysis were previously published and were downloaded from public 462 data repositories. All experimental data were presented in the current manuscript. Mergeomics code is   YES YES Note: *: The trait columns represent in which methods the MSEA of the pathways is significant with FDR 710 < 10%. Number 1 to 9 represent: adipose eSNP (1), blood eSNP (2), brain eSNP (3), human aortic 711 endothelial cells (HAEC) eSNP (4), liver eSNP (5), all eSNP (6), Distance (7), Regulome (8), and 712 Combined (9), respectively. The Metabochip and iGSEA columns tell whether the gene set can also be 713 detected as statistically significant in the analysis.  Note: a The method column represents in which methods the MSEA of the pathways is significant with 739 Bonferroni-adjusted P<0.05. Number 1 to 9 represent: adipose eSNP, blood eSNP, brain eSNP, haec 740 eSNP, liver eSNP, all eSNP, Distance, Regulome, and Combined, respectively.