Identifying gene regulatory networks in schizophrenia

The imaging genetics approach to studying the genetic basis of disease leverages the individual strengths of both neuroimaging and genetic studies by visualizing and quantifying the brain activation patterns in the context of genetic background. Brain imaging as an intermediate phenotype can help clarify the functional link among genes, the molecular networks in which they participate, and brain circuitry and function. Integrating genetic data from a genome-wide association study (GWAS) with brain imaging as a quantitative trait (QT) phenotype can increase the statistical power to identify risk genes. A QT analysis using brain imaging (DLPFC activation during a working memory task) as a quantitative trait has identified unanticipated risk genes for schizophrenia. Several of these genes (RSRC1, ARHGAP18, ROBO1-ROBO2, GPC1, TNIK, and CTXN3-SLC12A2) have functions related to progenitor cell proliferation, migration, and differentiation, cytoskeleton reorganization, axonal connectivity, and development of forebrain structures. These genes, however, do not function in isolation but rather through gene regulatory networks. To obtain a deeper understanding of how the GWAS-identified genes participate in larger gene regulatory networks, we measured correlations among transcript levels in the mouse and human postmortem tissue and performed a gene set enrichment analysis (GSEA) that identified several microRNAs associated with schizophrenia (448, 218, 137). The results of such computational approaches can be further validated in animal experiments in which the networks are experimentally studied and perturbed with specific compounds. Glypican 1 and FGF17 mouse models, for example, can be used to study such gene regulatory networks. The model demonstrates epistatic interactions between FGF and glypican on brain development and may be a useful model of negative symptom schizophrenia.


Introduction
Genome-wide association studies (GWAS) to date have not provided a genetic "smoking gun" for schizophrenia (Harrison and Weinberger, 2005). Given the known genetic components in the disorder, we hypothesize that schizophrenia, and indeed most complex psychiatric disorders, arise through small contributions from many polymorphic loci rather than through disruption of single genes or pathways. Certainly, GWAS results from schizophrenia, bipolar disorder, and major depression studies, in which many loci are found, each with small effect (e.g., Committee, 2009; Moskvina et al., 2008), support such a view. In this sense, we see psychiatric disorders as system-level disruptions of what are large, complex, nonlinear networks of gene, protein, and cell interactions. While GWAS results may tell us which genes in this network happen to have disease-associated polymorphisms of reasonably high frequency in the human population, they give us only tiny glimpses of the underlying functional network itself.
Moreover, GWAS techniques create statistical challenges producing anywhere from 100,000 to more than 5,000,000 genotypes per subject (Potkin et al., 2009d). Classical statistical analytical techniques are not designed for situations where the number of variables so grossly outnumbers the number of subjects. In addition to the problem of multiple testing/multiple hypotheses, there are several other important issues that are the current focus of interest in statistical genetics. Examples of active statistical research are as follows: (1) how to analyze "genes" (or chromosomal regions) rather than SNPs, given our primary interest in mapping putative functional elements of the genome rather than simple point variations; (2) how to address gene × gene epistatic interactions (Brzustowicz, 2008;Chapman and Clayton, 2007;Evans et al., 2006;Jiang et al., 2009;Moore, 2008), or gene × environment epigenetic (Clayton and McKeigue, 2001;Glazier et al., 2002;Hoffmann et al., 2009;Lander and Kruglyak, 1995) interactions; or (3) even how to validate a causative or regulatory network (Barabasi, 2007;Hidalgo et al., 2009). Identification and application of the networks is the focus of this paper, using schizophrenia as an example.
Using brain imaging as a quantitative trait greatly increases the statistical power of GWAS (as in Potkin et al., 2009d). Neuroimaging as a quantitative trait may identify dimensions of brain function that are more closely related to susceptibility genes than are more subjective assessments of clinical symptoms or features (e.g. Gottesman and Gould, 2003). While imaging studies by themselves reveal many aspects of function and dysfunction in neuropsychiatric disorders, their explanatory power may be limited by not considering the genetic basis of brain structure and function, as both are clearly heritable (Kennedy et al., 2003). Integrating genotypic information with brain imaging results can help identify the function of candidate genes at the level of brain function (e.g. Meyer-Lindenberg and Weinberger, 2006).
However, the availability of high-throughput genotyping technologies and genomic resources such as HapMap (www.hapmap.org) has made it possible to survey SNP markers throughout the entire genome and increase the probability of discovering important unanticipated genetic influences. This allows imaging genetics to perform gene discovery, that is, the identification of new "candidate" genes related to brain function that would not be discovered by traditional candidate gene approaches (e.g. Papassotiropoulos et al., 2006; Potkin et al., 2009b, c, d, e; Shen et al., 2009).
Our imaging genetics GWAS approach uses brain imaging as a quantitative trait (QT) and determines which genes affect the QT, employing a reverse strategy compared to a candidate gene approach.
We do not test a priori hypotheses regarding genetic effects on brain function based on current physiological or pathophysiological knowledge since a major limitation of the candidate gene approach is precisely that we know our current understanding of physiology or pathophysiology is woefully incomplete (Meyer-Lindenberg and Weinberger, 2006;Roffman et al., 2006). Imaging Genetics can visualize brain activation patterns in the context of a whole genome background, thereby synergizing the strengths of each individual approach (Potkin et al., 2009b, c, e) and ultimately representing a strategy for risk gene discovery. Fig. 1 depicts our approach: In the pathway from SNP to disease, the human GWAS results can reveal a link between the SNP or gene and the phenotype but do not illuminate the causative networks of interacting mechanisms. Small but significant differences in allelic frequency in cases and controls are best not interpreted as a particular gene being the causative factor but instead should be considered to implicate a larger network of genes. A systems biology approach can improve our understanding of the implications of these GWAS findings of relatively small effects within these interacting mechanisms.
As an illustration of this approach, we have used activation in the dorsolateral prefrontal cortex (DLPFC) of the middle frontal gyrus, measured using fMRI during a working memory task, as a QT phenotype to identify genes related to schizophrenia that were not anticipated a priori (Potkin et al., 2009b; Potkin et al., 2009c). Employing this approach, we have identified several genes related to brain development and stress that had never before been associated with schizophrenia. This is in line with the well-known common variant-common disease (CVCD) hypothesis that schizophrenia arises through small contributions from many polymorphic loci rather than through disruption of single genes. Building on these results, we present several computational biology approaches that provide initial steps in identifying putative gene regulatory networks.
Gene regulatory networks can potentially be inferred from expression profiles, the locations of regulatory motifs, and interactions between regulatory targets and MicroRNA (miRNA). A number of methods have been proposed to infer gene regulatory networks from large-scale gene expression data (Eisen et al., 1998). A basic assumption underlying all these methods is that genes interacting together are correlated in their gene expression (positive or negative). Therefore, correlation in gene expression can be used as a measure for inferring gene interactions, using methods such as Boolean network analysis, informatics-based approaches, linear regression, or Bayesian networks (Bansal et al., 2007).
An additional approach for constructing gene interaction networks takes advantage of sequence analysis by searching for the locations of regulatory motifs in the human genome. Previously, we have demonstrated the power of comparative genomics for discovering novel regulatory motifs and for identifying individual regulatory motif sites in the human genome (Xie et al., 2005;Xie et al., 2007). Recent availability of over 25 placental mammalian genomes significantly boosts our power for detecting motif sites in the human genome. While these genomes are closely related to each other and likely share basic regulatory motifs, they are carefully chosen to represent distinct branches of the mammalian evolutionary tree. As such, they are ideal for separating regulatory sequences from neutral sequences (Margulies et al., 2005).
The indirect interaction between regulators and targets of miRNAs can also be determined. MicroRNAs (miRNAs), another important class of regulators of gene expression, are endogenous ∼22-nucleotide RNAs that repress gene expression post-transcriptionally (Carthew, 2006). miRNAs are believed to regulate thousands of genes by virtue of base pairing to 3′ untranslated regions (3′UTRs). Individual miRNAs can each affect hundreds of genes. Many of the characterized miRNAs are involved in developmental regulation, including the timing and neuronal asymmetry in worm; brain morphogenesis in zebrafish; and dendritic spine development in mammals (Giraldez et al., 2005;Schratt et al., 2006). Based on a recent survey (Griffiths-Jones et al., 2008), we note that the human genome contains over 500 miRNA genes, many of which are highly or specifically expressed in neural tissues. The function of the brain-related miRNAs and the mechanisms underlying their transcriptional control are beginning to emerge and miRNA expression differences have been found in the frontal and temporal gyri of schizophrenia patients (Beveridge et al., 2008).
These statistical approaches, however, require validation by experimental models. For example, the function in schizophrenia of RSRC1 and ARHGAP18, which we identified through imaging genetics (Potkin et al., 2009b), is largely unknown, and consequently the modification of function that may be therapeutic is also unknown. However, perturbation of these candidate genes with compounds that affect gene expression in animal neurodevelopmental models can help us understand the gene regulatory networks. These in vivo models improve our understanding of the biological significance of the networks identified by the bioinformatic approaches.
We demonstrate the use of existing and novel algorithms to infer gene regulatory networks from heterogeneous data sets (e.g. literature, gene expression and genomic sequences) by employing a gene set-based approach for GWAS data instead of individual SNPs or genes (Subramanian et al., 2005). We focus on the specific problem of placing loci identified through GWAS into the context of meaningful networks, that is, the network of genes that directly or indirectly control, or are controlled by, schizophrenia risk genes, and on the implication of selected genes identified from these analyses in a mouse model. Improving our understanding of the dynamic networks that underlie these disorders is key to developing interventions that restore the network to its normal regulatory state.

Quantitative trait
The full details of the neuroimaging data collection and analysis are available in Potkin et al. (2009b,c). We used measures of blood oxygenation level-dependent (BOLD) fMRI signal in the dorsolateral prefrontal cortex (DLPFC) during the Sternberg Item Recognition Paradigm (SIRP), a heritable (Karlsgodt et al., 2007) working memory task, as a QT in a GWAS of schizophrenia subjects (n = 24 + 64) and healthy control subjects (n = 74) to identify genes related to schizophrenia (Potkin et al., 2009b; Potkin et al., 2009c). The brain imaging phenotype was chosen based on its relevance to the neuropsychiatric disorder, e.g. DLPFC activation in the case of schizophrenia. The BOLD signal obtained during the probe condition while holding 3 items in memory, contrasted with a one-item memorandum, was used as a quantitative phenotype in a GWAS.

GWAS analysis
Subjects were genotyped using the Illumina HumanHap370-Duo, providing 370,404 SNPs suitable for later analysis with the fMRI QT (Potkin et al., 2009b). All SNPs that passed quality control checks (Teo, 2008) were included in the GWAS analysis. The simplest model we applied was a general linear model (GLM) identifying the effects of SNP alleles or genotypes on the QT, thus determining how genetic variation can be related to phenotypes characterized by brain activation. With our model, we can determine the genes (SNPs) that affect brain activation (or structure) and then determine whether these genetic effects differ by diagnosis, simply by adding a term to the GLM (for example, a diagnosis-by-SNP interaction). This model can include other variables, for example, nuisance covariates such as the site from which the subject was recruited, their age or gender, etc. To guard against false positives, loci for further consideration were identified by at least 2 independent SNPs with P < 10⁻⁶, because the conjunction of these results is less likely than a single result alone. See Potkin et al. (2009d) for discussion of GWAS statistical significance thresholds. This is in keeping with the WTCCC and O'Donovan et al. (2009), which consider genome-wide thresholds of P < 10⁻⁵ and 10⁻⁷ as "moderately strong" and "strong" evidence for an association.
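As a sketch only, the two models described in this section might be written as follows. The notation is assumed here for illustration and is not taken from the original analysis: y_i is the DLPFC BOLD quantitative trait of subject i, g_i a numeric coding of the SNP genotype, d_i a diagnosis indicator, c_ik the nuisance covariates (site, age, gender), and the β and γ terms are regression coefficients.

```latex
\begin{align}
y_i &= \beta_0 + \beta_1 g_i + \varepsilon_i, \\
y_i &= \beta_0 + \beta_1 g_i + \beta_2 d_i + \beta_3 \,(g_i \times d_i)
       + \sum_k \gamma_k c_{ik} + \varepsilon_i .
\end{align}
```

Under this notation, testing β_1 = 0 in the first model asks whether the SNP affects the QT, and testing β_3 = 0 in the second asks whether that effect differs by diagnosis.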
The newly identified risk-producing genes may be involved in pathophysiological neuronal networks; their putative role within their larger genetic networks was initially determined by bioinformatics and computational biology methods.

Gene network identification
In order to obtain a deeper understanding of how these genes participate in larger gene regulatory networks, we applied two bioinformatics approaches: (1) correlations among transcript levels and (2) gene set enrichment analysis.

Correlation mapping
Gene interaction networks are inferred from correlations among gene expression (mRNA) data sets. We added prefrontal cortex gene expression data derived from 42 different inbred mouse strains of the BXD recombinant inbred panel (derived from progenitor B6 and D2 strains; Wang et al., 2003). Genes were clustered into expression networks based on the correlated variation among strains. Expression "neighbors" may represent genes that regulate one another's expression or are controlled by a common regulator. Such clustering is possible because there is sufficient variation in gene expression among the tested strains. The human microarray gene expression methods are fully described in Shao and Vawter (2008). Briefly, the gene expression values from DLPFC for the Stanley Microarray Cohort were obtained using a Codelink platform in the UCI Functional Genomics laboratory. RNA was Trizol-extracted from the DLPFC at the Stanley Institute. At UCI, samples from 105 subjects were received for analysis. High-quality arrays were obtained on 27 bipolar subjects, 30 schizophrenia subjects, and 29 controls. The raw expression values were background adjusted, regressed for pH and age, and used for downstream correlation analysis and over-representation analysis.
The full microarray data set is available from the Stanley Medical Research Institute upon request (www.stanleyresearch.org).
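The correlation mapping step described above can be illustrated with a minimal sketch. It assumes a plain Pearson correlation across strains and an arbitrary cutoff of |r| ≥ 0.8 for drawing a link; the gene names, expression values, and threshold are hypothetical and are not taken from the BXD analysis.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(expression, threshold=0.8):
    """Link two genes when |r| across strains meets the (assumed) threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Hypothetical expression values for five strains (not real BXD data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # unrelated pattern
}
edges = correlation_network(expression)
for pair, r in sorted(edges.items()):
    print(pair, round(r, 3))
```

Both positive and negative correlations produce links, in keeping with the assumption that interacting genes may be positively or negatively co-expressed.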

Gene set enrichment analysis (GSEA)
We adapted gene set enrichment analysis, originally developed for gene expression analysis (Subramanian et al., 2005), to discover candidate gene sets or pathways that likely contribute to schizophrenia. GSEA determines whether a group of genes is over-enriched with SNPs associated with a disease trait compared to the entire genome. It first ranks all genes in the genome according to their association with the quantitative trait or disease (in this case, the P-value of the SNP's effect on the QT in the 24-subject SCZ data (Potkin et al., 2009b) or of the interaction of diagnosis and SNP on the QT in the SIRP imaging genetics analysis (Potkin et al., 2009c)); it then tests whether a query gene set is enriched with low-rank genes (most significant P-values) using a Mann-Whitney U or a Kolmogorov-Smirnov (KS) test. Gene sets are defined based on prior biological knowledge (e.g., canonical pathways, chemical and genetic perturbations), primarily from the MSigDB data set, plus microRNA targets and transcription factor targets curated by us (referred to as C3 motif gene sets; Xie et al., 2005) and several clusters that were generated by the correlational analysis of the BxD data set and from the Novartis gene expression atlas (Su et al., 2004). Altogether we tested 9709 gene sets.
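The rank-based test at the core of this procedure can be sketched as below. This is a minimal illustration using the normal approximation to the Mann-Whitney U statistic and assuming no tied P-values; the function name, gene names, and P-values are hypothetical, and the sketch omits the step of summarizing SNP-level P-values into one value per gene.

```python
import math

def mann_whitney_z(gene_pvalues, gene_set):
    """Z-score for whether gene_set members rank nearer the top (small P)
    than the remaining genes. Assumes no tied P-values."""
    ranked = sorted(gene_pvalues, key=gene_pvalues.get)  # rank 1 = smallest P
    rank = {gene: i + 1 for i, gene in enumerate(ranked)}
    members = [g for g in set(gene_set) if g in rank]
    n1, n2 = len(members), len(ranked) - len(members)
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must be a non-empty proper subset")
    rank_sum = sum(rank[g] for g in members)
    expected = n1 * (n1 + n2 + 1) / 2            # mean rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation
    return (expected - rank_sum) / sd             # positive = enriched for small P

# Hypothetical genome of 20 genes with evenly spaced P-values.
pvals = {f"gene{i}": i / 21 for i in range(1, 21)}
top_set = {f"gene{i}" for i in range(1, 6)}       # the five most significant genes
bottom_set = {f"gene{i}" for i in range(16, 21)}  # the five least significant genes

z_top = mann_whitney_z(pvals, top_set)
z_bottom = mann_whitney_z(pvals, bottom_set)
print(round(z_top, 3), round(z_bottom, 3))
```

The sign convention is a design choice made here so that a large positive Z indicates a gene set concentrated among the most significant genes.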

Animal models
Animal models can be used to both explore and validate the computational biological approaches. For example, glypican-1 (GPC1) was one of the genes identified in the gene regulatory network based on the GWAS imaging genetics analyses and the BxD data (see Results section). A useful glypican mouse model exists (Aikawa et al., 2008;Ivins et al., 1997;Lander et al., 1996;Litwack et al., 1994). GPC1 encodes a cell surface heparan sulfate proteoglycan (HSPG), a molecule that can act as a co-receptor for growth factors and other signaling molecules, including FGFs, neuregulins, Wnts, BMPs, slits, and netrins (Lander et al., 1996;Selleck, 2006;Song and Filmus, 2002). In this analysis, we assessed the effects of knocking out the GPC1 gene on brain development and the epistatic interactions with FGF17. For full methods, see Jen et al. (2009).

fMRI
In our data, schizophrenics show more (BOLD) activation in the DLPFC than do healthy controls when matched for accuracy performance on the Sternberg Item Recognition Paradigm (SIRP), a working memory task, consistent with cortical inefficiency (Potkin et al., 2009a). The BOLD activation was used as the quantitative phenotype in the GWAS analyses (Fig. 2).
Two genes, RSRC1 and ARHGAP18, were identified that had not been previously associated with cognition or schizophrenia (Potkin et al., 2009b). These two genes, based on available annotation software (Ingenuity Pathways Analysis, SWISSPROT and dbSNP), have functions related to prenatal brain development and cell migration to forebrain structures. Their role in cortical development supports the neurodevelopmental hypothesis of schizophrenia. RSRC1 is a unique marker of progenitor cells that are found in the subventricular zone (SVZ) in the developing and postnatal forebrain. These SVZ progenitor cells give rise to EGFr-responsive progenitors which in the presence of TGF-alpha bind to ErbB EGF receptors (Fallon et al., 2000; Rakic and Zecevic, 2003). ARHGAP18 is part of the family of RhoGAP proteins that participate in cell proliferation, migration, intercellular adhesion, cytokinesis, differentiation, and apoptosis (Symons, 1996). ARHGAP18 gene products have been linked to RAS and EGFr-mediated proliferation of cells in general (Wells, 1999). Interestingly, both genes have functions in prenatal brain development, including neural stem cell proliferation in the SVZ and migration to forebrain structures including limbic, striatal, and amygdaloid circuitry.
In a second study using similar methodology with DLPFC activation as a quantitative trait in schizophrenia subjects and matched controls, six additional genes (or chromosomal regions) related to forebrain development and stress response, and affecting prefrontal efficiency, were also identified (ROBO1-ROBO2, TNIK, CTXN3-SLC12A2, POU3F2, TRAF3, and GPC1) (Potkin et al., 2009c). Several of these genes are involved in cortical development, especially in the forebrain in midline connections. GPC1 (glypican, slit receptor) and ROBO1-ROBO2 are involved in dorsal forebrain development, specifically neural precursor migration and axonal connectivity (e.g. midline crossing and guidance of neuron axons to prefrontal cortices including DLPFC). TNIK is highly expressed in the brain (Nonaka et al., 2008) and TNIK mRNA was shown to be upregulated in the DLPFC of schizophrenia patients (Glatt et al., 2005). A SNP in TNIK was in the top 12 hits associated with schizophrenia in the African-American sample case-control analysis from the Molecular Genetics of Schizophrenia (MGS) consortium. SLC12A2 is involved in regulation of GABA neurotransmission and is differentially expressed in schizophrenia (Dean et al., 2007). CTXN3 (cortexin) is highly enriched in the cortex and increases postnatally. CTXN3-SLC12A2 was found linked to schizophrenia in the Lewis et al. (2003) meta-analysis and lies within the chromosome 5 region implicated in cognitive dysfunction found in schizophrenia (Almasy et al., 2008). These findings are consistent with the previously described abnormal callosal morphometry and cortico-subcortical connectivity (Barch et al., 2001; Brett et al., 2002; D'Esposito et al., 1998; Johnson et al., 2006; McNab and Klingberg, 2008; Tura et al., 2008).
The discovery of novel associations between genes and risk for neuropsychiatric disorder offers a powerful impetus to postulate new biological mechanisms as well as support previous neurodevelopmental hypotheses.

Identification of gene networks
Correlation mapping
Fig. 3 shows the network inferred through this approach. Variation of gene expression in this data set can be viewed as a dynamic response to the perturbation of gene expression in a subset of genes in the network, induced by genetic variations in the coding or regulatory sequences of these genes. Thus, links in this network are likely to reflect functional interactions. Fig. 3 shows a large number of connections between schizophrenia candidate genes as well as some unexpected connections: DACT3, with 3 direct and 8 indirect (one-stop) connections to our candidate genes, encodes the orthologue of an amphibian regulator of Wnt signaling, a fact that may be significant given literature linking Wnt signaling to schizophrenia (Cotter et al., 1998; Miyaoka et al., 1999; Proitsi et al., 2008). NDFIP1 (Nedd4 family interacting protein 1) has 4 direct and 8 indirect connections. It is a Golgi protein that is ubiquitinated by the Nedd4 family of proteins (Harvey et al., 2002); the product of another gene on the list, PMEPA1, also interacts with Nedd4. DNER (delta/notch-like EGF repeat containing), with 4 direct and 8 indirect connections, regulates differentiation of glia through Notch signaling (Eiraku et al., 2005); loss of function of mouse Dner gives rise to impaired cerebellar function. MCPH1 (microcephalin 1) has 2 direct and 3 indirect connections; it is expressed in fetal brain, in migrating neurons of the developing forebrain, and on the ependymal and subventricular walls of the lateral ventricles. It is related to brain size in humans (mutations in it cause a form of primary microcephaly) and is positively selected for (Evans et al., 2005).
[Figure caption, partially recovered: the same accuracy performance is associated with significantly greater DLPFC activation in schizophrenia subjects than in healthy controls (Potkin et al., 2009a).]
GPC1 has a direct connection with Wnt and encodes a cell surface heparan sulfate proteoglycan, which acts as a co-receptor for growth factors and other signaling molecules (Lander et al., 1996; Selleck, 2006; Song and Filmus, 2002). TNIK and TRAF3 have been shown to interact with DISC1 in yeast two-hybrid experiments (Camargo et al., 2007) but did not arise in this mouse data set. DISC1 (disrupted in schizophrenia) has been strongly implicated in schizophrenia and plays a role in brain development. Therefore, we examined human gene expression data.

Human post-mortem expression data
Gene expression data were obtained by microarray in the DLPFC of 30 patients with schizophrenia, 27 with bipolar disorder, and 29 healthy controls. An interesting pattern of gene co-expression is observed in the table and differs by diagnosis. The gene expression values within subjects for TNIK were negatively correlated with DISC1 (r = −0.25) in schizophrenia but weakly positively correlated in bipolar disorder or healthy controls. Further support for the interaction between TNIK and DISC1 is provided by direct binding between TNIK and DISC1, which regulates AMPA receptor activity (Wang, 2010). In rat primary hippocampal neurons, knockdown of DISC1 leads to an increase in the TNIK protein level, suggesting that DISC1 negatively regulates the expression of TNIK (Wang, 2010), which is consistent with the negative gene expression correlation between DISC1 and TNIK. GPC1 was significantly correlated with FGF2 (r = 0.42) and showed a trend with FGF17 (r = 0.31) in schizophrenia, but to a lesser degree and in the opposite direction in bipolar disorder (r = −0.26 and −0.013, respectively); the corresponding values in controls were −0.24 and 0.08. The correlations for FGF2 with GPC1 in schizophrenic patients and controls are in opposite directions.
Fig. 3. Gene interaction network inferred from prefrontal cortex gene expression in 42 different inbred mouse strains.
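The distinction drawn above between a significant correlation and a trend can be illustrated with the standard t statistic for a Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)). The sketch below assumes that all 30 schizophrenia arrays entered each correlation and ignores the degrees of freedom used by the pH and age regression, so it is an approximation and not a reconstruction of the original test.

```python
import math

def correlation_t(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

n_scz = 30          # schizophrenia arrays; assumed to all enter each correlation
T_CRIT_28 = 2.048   # two-sided 5% critical value of Student's t with 28 df

t_fgf2 = correlation_t(0.42, n_scz)   # GPC1 with FGF2
t_fgf17 = correlation_t(0.31, n_scz)  # GPC1 with FGF17

print(round(t_fgf2, 3), t_fgf2 > T_CRIT_28)
print(round(t_fgf17, 3), t_fgf17 > T_CRIT_28)
```

Under these assumptions the FGF2 correlation exceeds the two-sided 5% critical value while the FGF17 correlation falls short of it.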

Gene set enrichment analysis
GSEA was applied to the previously described imaging genetics GWAS data sets based on differences between SZ and controls (Potkin et al., 2009c) and to an independent, smaller data set of schizophrenic subjects only (Potkin et al., 2009b). Table 2 shows the top 25 gene sets with Mann-Whitney test Z-score > 4.0 (P < 3 × 10⁻⁵) in both data sets, a threshold at which the random permuted gene sets returned no hits. There is significant overlap between the GSEA results from the two data sets, although individual genes identified from each data set are different, supporting the merit of using a systems- and network-based approach for testing the disease association. A number of interesting gene sets emerge from this analysis. For instance, several miRNA target gene sets (.mir 448, 218, 137) are highly enriched in both data sets, suggesting a potential role of miRNA perturbation in schizophrenia and meriting further investigation. There are nine total miRNA gene sets in the table that were over-enriched in the two imaging genetics GWAS data sets.
The traditional gene ontology categories were significant only once in the table (Biological Process, Nervous System Development), emphasizing the value of using GWAS with GSEA together with miRNA and gene expression data sets (Table 2).
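The Z-score threshold of 4.0 used for Table 2 can be related to its P-value and to the number of gene sets tested. The sketch below assumes a one-sided standard normal tail and treats the 9709 gene sets as independent, which they are not in practice, so the expected count is only a rough guide.

```python
import math

def upper_tail_p(z):
    """One-sided P-value P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_at_threshold = upper_tail_p(4.0)
n_gene_sets = 9709  # number of gene sets tested, as stated in the text
expected_null_hits = n_gene_sets * p_at_threshold

print(f"P(Z > 4.0) = {p_at_threshold:.3g}")
print(f"expected null hits = {expected_null_hits:.2f}")
```

The tail probability comes out near 3 × 10⁻⁵, and fewer than one gene set would be expected to pass by chance in a single data set, which is consistent with the permuted gene sets returning no hits.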

Glypican-1 and FGF17 combined mutant mouse models
Homozygous mutant GPC1 animals are anatomically grossly normal, but possess brains that are ∼15% smaller (and contain 18% fewer cells) than wild type, and display subtle cerebellar mispatterning (Jen et al., 2009). GPC1 heterozygotes have intermediate brain size. The GPC1-related decrease in brain size is significantly affected by FGF17 status; the strongest effects are observed with homozygous mutant FGF17 animals while heterozygotes have an intermediate effect, in total indicating an epistatic interaction (P < 0.005, t-test; see Fig. 4). The data show that the presence of either one or two copies of mutant alleles for either GPC1 or FGF17 progressively reduces brain size (P < 0.005, t-test). When animals are null for FGF17, the presence of mutant GPC1 alleles has no significant effect. Jen et al. (2009) assessed signaling pathways by Q-RT-PCR at embryonic day 5, the time at which brain size reduction in GPC1 mutant mice emerges. Levels of transcripts for markers of FGF signaling (Sprouty 1 and Sprouty 2) are reduced whereas markers of Hedgehog, Wnt, and BMP signaling are not. Additional support for the conclusion that GPC1 regulates FGF signaling comes from the lower levels of endogenous MAP kinase (Erk) activity found in homozygous GPC1 mutants.
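The logic of the epistasis finding can be illustrated as a difference of differences: the effect of losing GPC1 is compared between the FGF17 wild-type and FGF17 null backgrounds. The brain weights below are hypothetical placeholders chosen only to mimic the qualitative pattern described above; they are not measurements from Jen et al. (2009), and the sketch omits the statistical test.

```python
# Hypothetical mean brain weights (mg) by (GPC1, FGF17) genotype; illustrative only.
mean_weight = {
    ("+/+", "+/+"): 450.0,
    ("-/-", "+/+"): 385.0,
    ("+/+", "-/-"): 380.0,
    ("-/-", "-/-"): 378.0,
}

def gpc1_effect(fgf17_genotype):
    """Brain-weight change from GPC1 loss on a fixed FGF17 background."""
    return (mean_weight[("-/-", fgf17_genotype)]
            - mean_weight[("+/+", fgf17_genotype)])

effect_wt = gpc1_effect("+/+")    # GPC1 loss with FGF17 intact
effect_null = gpc1_effect("-/-")  # GPC1 loss with FGF17 absent
interaction = effect_null - effect_wt  # zero if the two genes acted additively

print(effect_wt, effect_null, interaction)
```

A nonzero interaction term of this kind, where the GPC1 effect largely vanishes on the FGF17 null background, is what distinguishes epistasis from simple additive effects.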

Discussion
Our approach to identifying gene regulatory networks that contribute to a risk of schizophrenia begins with the identification of new candidate risk genes using brain imaging as a QT in the context of a GWAS. We then applied computational biology methods to these GWAS candidates and to expression data sets in both humans and animals to more fully understand these candidates. This builds on the previous imaging genetics results to integrate across levels of inquiry in understanding genetic influences on system-level phenotypes, as denoted in Fig. 1. The GSEA method identified multiple miRNAs in both imaging genetics GWAS data sets. As miRNAs affect the expression of many genes, this provides support for the idea of widespread genetic networks underlying schizophrenia. Finally, our approach highlights the potential of the GPC1/FGF mouse as an animal model of some characteristics of schizophrenia, in part based on the human GWAS data. Initially, we asked whether commercial software (Ingenuity Pathways Analysis v7, Santa Clara CA) developed to search certain published networks (e.g. protein-protein interaction data; Stelzl et al., 2005) for functional connections among sets of genes might reveal any new relationships among the genes identified by imaging genetics GWAS (ARHGAP18, RSRC1, GPC1, ROBO2, ROBO1, CTXN3, SLC12A2, TRAF3, TNIK, POU3F2; Potkin et al., 2009b, c). The Ingenuity annotation output called attention to a few connections that are well established in the literature, for example, that the ligands for the ROBO1 and ROBO2 receptors, the SLITs (Killeen and Sybingco, 2008; Lopez-Bendito et al., 2007; Nguyen-Ba-Charvet and Chedotal, 2002), bind GPC1 (Ronca et al., 2001). The full implication of these results, however, required more innovative computational and physiological approaches. The correlational connections shown in Fig. 3 and Tables 1 and 2 were not available in any currently existing software, and the miRNA findings were not identified by the commercial software.
Our expression data provided support for an interaction between GPC1 and both FGF17 and FGF2, as well as between TNIK and DISC1. A recent set of studies found the kinase domain of TNIK binds to a small region on DISC1, a key gene consistently linked to schizophrenia risk (Wang, 2010). The potential importance of TNIK itself in schizophrenia has been highlighted by several independent studies, supporting its role as an emerging risk factor (Glatt et al., 2005; Shi et al., 2009). DISC1 has been shown to modulate TNIK kinase activity and together they function at the synapse to regulate synaptic composition and GLUR1 and AMPA activity (Wang, 2010), both hypothetically related to schizophrenia (Harrison and Weinberger, 2005).
Fig. 4. Epistasis: GPC1 phenotypes require FGF17. The effects of compound GPC1/FGF17 genotypes demonstrate an epistatic effect on brain weight. Bars indicate fresh brain weights of the compound mutants. The genotypes are indicated by + for wild type, − for knockout, and −/+ for heterozygote. The cerebellar morphometry is shown in corresponding Nissl-stained mid-sagittal sections. The red arrowheads mark the anterior-most lobe (lobe 1) and the fusion of lobes III and IV, a phenotype observed in FGF17−/− mice that disappears in GPC1−/− and FGF17−/− animals. GPC1 appears to be acting in a pathway upstream of FGF17 (Jen et al., 2009).
Whether the GPC1 brain size phenotype noted in the mouse model has any relationship to a role for GPC1 in schizophrenia is unknown. It is interesting, however, that clinical studies support a small but significant decrease in brain size in schizophrenia (on the order of 3%; Steen et al., 2006; Ward et al., 1996). In addition, although FGF17-deficient mice are behaviorally relatively normal, they show striking deficits in social recognition and affiliative interactions (Scearce-Levie et al., 2008), which is particularly intriguing given the social dysfunction characteristic of schizophrenia patients. Thus, even though FGF17 has never itself emerged by GWAS as an SZ candidate gene, its functional association with GPC1, and its mouse behavioral phenotype, strongly point to it being a potentially important component of a schizophrenia-related gene network. In support of this idea, we note that recent GWAS studies have identified FGFR2, a major FGF brain receptor, as strongly associated with schizophrenia (O'Donovan et al., 2009; Potkin et al., 2009c).
This work suggests that FGF17 and/or GPC1 mutant mice may develop into useful and valid models for the negative symptoms of schizophrenia (which are especially difficult to treat) and additionally provide a proof-of-concept model for therapeutic intervention. It is also of interest that the brain weights of heterozygous GPC1 mutants fall midway between wild-type and homozygous mutants, as it suggests a strong quantitative dependency of FGF signaling on GPC1 expression levels. We have observed alterations in FGF2 gene expression in the DLPFC of SZ patients (Shao and Vawter, 2008). Because most non-coding genetic polymorphisms probably exert their effects by altering levels of gene expression, such a strong dependency may explain why GPC1 is more prone to detection by GWAS than other genes in the networks in which it acts.
The miRNA findings based on the imaging genetics GWAS data are intriguing, as they suggest a mechanism for regulation of a network of genes. Our third top miRNA, 137, was one of the five major findings from a 52,156-subject GWAS reported by P. Gejman (Gejman, 2010). This is in accord with the underlying multiple-gene models of schizophrenia in general, and specifically with the potential importance of gene regulatory networks, as contrasted with single-gene effects, for complex illnesses. Understanding these networks in animal models offers potentially new opportunities for therapeutic modulation.
In summary, the integration of imaging genetics GWAS analyses in schizophrenia with computational biology methods and animal models identified several key findings that extended our initial imaging GWAS studies in schizophrenia to improved understanding of the system-level phenotype. Finally, the GPC1/FGF mouse, based in part on our imaging genetics human GWAS data and computational gene expression analyses, may provide a useful animal model of key characteristics of schizophrenia.
Table 2. Top 25 gene sets enriched with most significant P-value genes in two independent imaging genetics GWAS data sets: SCZ data set n = 24 (Potkin et al., 2009b) and SIRP data set n = 138 (Potkin et al., 2009c). Gene set codes: c2 refers to curated gene sets from canonical pathways and chemical and genetic perturbations; c3, to motif gene sets in MSigDB; c4, to computational gene sets; and c5, to GO gene sets. MicroRNA data sets are indicated by .mir. For more information see MSigDB (www.broad.mit.edu/gsea/msigdb/index.jsp).