Models of spliceosomal intron proliferation in the face of widespread ectopic expression

It is now certain that present-day organisms can acquire new spliceosomal introns in their genes. The proposed sources of spliceosomal introns are exons, transposons, and other introns, including spliceosomal and group II self-splicing introns. Spliceosomal introns themselves are thought to be the most likely source, because the inserted sequence would immediately be endowed with the essential set of intron recognition sequences, thereby avoiding the deleterious effects associated with incorrect splicing. The most obvious spliceosomal intron duplication pathways involve an RNA transcript intermediate. Therefore, for a spliceosomal intron to originate by duplication, either the source gene from which the novel intron is derived, or both that gene and the recipient gene that contains the novel intron, would need to be expressed in the germ line. Intron proliferation surveys indicate that genes containing putative intron duplicates do not always show detectable expression in the germ line, which casts doubt on the generality of the duplication model. However, inferring mechanisms of intron gain (or loss) from present-day gene expression profiles could be misleading if expression patterns were different at the time the introns arose. In fact, this may well be the case in most instances. Ectopic expression, i.e., the expression of genes at times and locations where the target gene is not known to have a function, is a much more common phenomenon than previously realized. We conclude with a speculation on a possible interplay between spliceosomal introns and ectopic expression at the origin of multicellularity.


'Introns early' and the late proliferation of spliceosomal introns
The debate on the origins and evolution of spliceosomal introns calls for two distinctions. The first distinction concerns the two uses of the term 'intron': introns as a theoretical construct, which should be distinguished from introns as they are eventually instantiated in the specific types of intervening sequences discovered so far (e.g., spliceosomal introns, group I and II introns, tRNA introns). As a theoretical construct (i.e., the notion of some sort of intragenic non-coding sequence that is spliced out of the RNA before translation into an amino acid sequence), introns are a postulate of the 'introns early' theory. The theory pushed the origin of the presently observed 'genes-in-pieces' structure of eukaryotic genes back to the RNA world (Doolittle, 1978). This claim rested on several assumptions. One is that the first self-replicating coding nucleic acid sequences (the so-called 'minigenes' and/or the larger molecules that would have resulted from their accretion) would necessarily have included some stretches with and some without coding information (Darnell, 1981; Doolittle, 1981; Darnell and Doolittle, 1986). A second assumption is that RNA-RNA processing already existed in the earliest RNA world (Darnell, 1981; Darnell and Doolittle, 1986). This notion has recently been strengthened by the discovery that mRNA splicing by the spliceosome is, in fact, an RNA-catalyzed reaction (Valadkhan, 2005), which confirms an earlier conjecture based on the apparent similarities between the splicing mechanisms of group II self-splicing introns and spliceosomal introns (Cavalier-Smith, 1991; Sharp, 2005). A third assumption, borrowed from the 'exon theory' of genes (Blake, 1978; Gilbert, 1978, 1987; Tonegawa et al., 1978), is that, once they originated, introns were co-opted as spacers, which increased the chances of illegitimate recombination between existing coding units.
The earliest introns would have been retained because they facilitated the generation of novel proteins from pre-existing functional modules (in a similar way to how exon shuffling fostered the assembly of mosaic proteins at the origin of the metazoans; Doolittle, 1978; Patthy, 1999; Liu and Grigoriev, 2004; Cohen-Gihon et al., 2005). One corollary of the 'introns early' theory is that today's intron-lacking (or intron-sparse) genomes have resulted from extensive intron loss, which implies that the eukaryotic mode of gene organization antedates the origin of prokaryotes (Darnell and Doolittle, 1986).
The 'introns early' theory was inspired by the discovery of the 'split-gene' structure (Berget et al., 1977; Chow et al., 1977). Efforts to falsify the theory accordingly focused on the empirical evidence gathered from spliceosomal introns. These efforts, which gave body to the 'introns late' theory (Cavalier-Smith, 1978; Palmer and Logsdon, 1991; Logsdon, 1998), failed to demonstrate a correspondence between the positions of introns and the boundaries of coding modules (identified according to a variety of alternative criteria) in ancient proteins, i.e., those whose origin preceded the eukaryote-prokaryote split (Logsdon, 1998). Moreover, neither spliceosomal introns nor traces of the splicing machinery have been found in the more than one hundred bacterial and archaeal genomes that have been completely sequenced (Lynch and Richardson, 2002).
The controversy between 'introns early' and 'introns late' has not been conclusively settled, largely because of the uncertainty associated with reconstructing the historical pathway of extremely old events (Collins and Penny, 2005). For example, a study of mosaic genes assembled by exon shuffling from symmetrical modules of class 1-1 (i.e., modules flanked by introns placed between the first and second nucleotides of a codon) at the origin of the metazoans, a relatively recent event compared with the origin of genes, found that much of the inferred original intron-module structure has disappeared in flies and worms (Bányai and Patthy, 2004). Moreover, there is no conclusive demonstration of why natural selection for a streamlined genome, such as that typically exhibited by prokaryotes, would not result in wholesale intron elimination (Mourier and Jeffares, 2003; Charlesworth and Barton, 2004). Extensive intron losses have accompanied genome compaction in arthropods and nematodes, and apparently in many extant lineages of unicellular eukaryotes, as shown by phylogenetic (Rogozin et al., 2003; Roy and Gilbert, 2005a) and phylogeny-independent (Archibald et al., 2002; Simpson et al., 2002; Anantharaman et al., 2002; Bányai and Patthy, 2004; Collins and Penny, 2005) criteria.
Theories of nuclear genome size variation postulate specific evolutionary forces for wholesale intron elimination. The 'near-optimal DNA' theory (Cavalier-Smith, 1978) emphasizes the structural role of non-genic DNA in providing the nuclear skeleton ('nucleoskeletal' DNA), such that genome size (together with the tightness of packing and degree of unfolding of the DNA) causally determines nuclear volume. Owing to metabolic and steric constraints, the ratio of nuclear to cytoplasmic volume remains constant as cell volume changes. Cell volume is a highly adaptive feature under the control of cell-cycle genes. Decreases in cell size caused by mutations in cell-cycle genes will, accordingly, result in positive selection for a corresponding decrease in nuclear volume, which would be optimally achieved by decreasing the amount of nuclear DNA, including introns (Cavalier-Smith, 2005).
Plausible molecular mechanisms that could yield extensive intron loss have been identified (Mourier and Jeffares, 2003). If the ancestors at the divergences between major eukaryotic kingdoms (including protists and multicellular eukaryotes) already exhibited an intron-dense genome structure (Roy and Gilbert, 2005a), spliceosomal introns would have arisen much earlier than those divergences. This conclusion is consistent with recent comparative analyses of many basal eukaryotic lineages showing that the last common ancestor of extant eukaryotes already had a spliceosome similar in complexity to that of today's eukaryotes (Anantharaman et al., 2002; Collins and Penny, 2005).
The discovery of spliceosomal introns of late origin (Giroux et al., 1994; Logsdon et al., 1995; Hankeln et al., 1997; Frugoli et al., 1998; Tarrío et al., 2003; Coghlan and Wolfe, 2004) does not disprove the 'introns early' theory; it only indicates that new spliceosomal introns continue to be born in evolution. Moreover, inferred contemporary rates of intron birth are too slow to have accrued present-day intron numbers (Roy and Gilbert, 2005b). Likewise, corroboration of the widespread notion that spliceosomal introns, together with the RNA components of the spliceosomal apparatus, descend from the pieces of a hypothetical fragmentation undergone by group II self-splicing introns after they invaded the nucleus from the mitochondria (Sharp, 1991) would only imply that spliceosomal introns are not the right objects with which to falsify Doolittle's (1978) theoretical constructs.
It could conversely be entertained that the current organization of the RNA fraction of the spliceosome into pieces, including the five small nuclear RNAs in addition to the spliceosomal intron itself, reflects the ancestral state from which contemporary one-piece group II self-splicing introns derived. Recent findings in the hyperthermophilic parasite Nanoarchaeum equitans strongly suggest that this might have been the case for modern tRNA introns (Randau et al., 2005). This organism builds functional tRNAs from pieces. Each piece has two parts: a tRNA module and a terminal sequence. The two pieces are joined through complementary base pairing between their terminal sequences, which gives rise to an intervening duplex. Taking into account the basal position of Nanoarchaeum within the archaea, it seems plausible that these intervening duplexes represent the ancestors from which modern one-piece tRNA introns evolved (Randau et al., 2005). Moreover, fusion events are estimated to be at least four times more common than fission events in the evolution of multi-domain protein genes (Kummerfeld and Teichmann, 2004).
The arguments we have advanced suggest that the mechanisms by which spliceosomal introns currently proliferate may not be the same as those that first caused spliceosomal introns to become integral parts of genes (Roy and Gilbert, 2005b). In this review, we focus on the current proliferation of introns. In particular, we explore what the discovery that ectopic expression (i.e., the expression of genes at times and locations where the target gene is not known to have a function) is widespread (Khaitovich et al., 2004; Yanai et al., 2004; Rodríguez-Trelles, 2004; Rodríguez-Trelles et al., 2005) implies for evaluating mechanisms of intron proliferation.

Intricacies of spliceosomal intron proliferation
It is now certain that living organisms can gain new spliceosomal introns in their genes. However, little is known about the frequencies and rates at which this happens. Do new introns proliferate steadily or episodically? What correlations, if any, exist between intron gain and intron loss? Intron proliferation (like many other genomic features, such as GC content and the ratio of deletion to insertion rates) is a nonhomogeneous, non-stationary process that varies both between lineages and over the course of a lineage's evolution. Comparative genomics studies have revealed large differences in intron numbers that do not reflect the phylogeny of the species (e.g., Bhattacharya et al., 2000; Wada et al., 2002; Rogozin et al., 2003; Castillo-Davis et al., 2004; Cho et al., 2004; Edvardsen et al., 2004; Nielsen et al., 2004; Roy and Gilbert, 2005b). The causes underlying this variation are poorly understood, but doubtless involve a complex interplay between natural selection and internal, bias-at-the-origin factors, none of which has been satisfactorily accounted for.
Natural selection influences intron proliferation, if only because the greater the number of introns, the more likely it is that mutations will occur that disrupt essential intron recognition sequences and yield non-functional alleles (Lynch, 2002; Sharp, 2005). Also, transcription of long introns represents a metabolic cost to the cell (Castillo-Davis et al., 2002). According to the near-optimal DNA theory (Cavalier-Smith, 1978; see above), intronic DNA is selected against when nucleoskeletal DNA, and thus the volume of the nucleus, decreases in response to reductions in cell size (Cavalier-Smith, 2005). Yet a growing body of data indicates that introns might also be favored by natural selection. Introns, once dismissed as 'junk' DNA because they are rapidly evolving untranslated sequences discovered at a time dominated by the 'central dogma', are increasingly regarded as an epitome of functionality (Mattick, 1994, 2003; Mattick and Gagen, 2001; Le Hir et al., 2003; Lynch and Kewalramani, 2003; Rodríguez-Trelles et al., 2003; Cavalier-Smith, 2005; Bompfünewerer et al., 2005).
This change of perception has been encouraged by three recent considerations: (i) proteins are not the end product of serial processing, but a branch in a parallel information network in which untranslated sequences might turn out to be central players (Mattick, 2003); (ii) in addition to its genic role, DNA performs regulatory and structural functions (e.g., a nucleoskeletal role; Cavalier-Smith, 2005); and (iii) unlike protein-encoded messages, non-coding information, such as that conveyed through the folding and interactions of nucleic acids, often does not require a high degree of primary sequence definition (Bergman and Kreitman, 2001; Rodríguez-Trelles et al., 2003).
There is an increasing sense that introns are highly plastic, dynamically co-optable entities, whose role can change repeatedly during the course of their existence and which are frequently multi-functional. Besides the constraints imposed by their own splicing and their role in exon shuffling (Gilbert, 1978; Patthy, 1999), introns are intimately associated with the regulation of gene expression through the growing number of recently uncovered couplings between splicing and transcription/translation (Kornblihtt et al., 2004; Maquat, 2004); carry cis-regulatory information (Le Hir et al., 2003; Rodríguez-Trelles et al., 2003; Pagani and Baralle, 2004); are suspected to be integral components of a trans-acting regulatory network that would act in parallel with proteins in the development of complex organisms (Mattick, 1994, 2003; Mattick and Gagen, 2001); and play a role in RNA editing (Herbert, 1996). In addition, introns promote transcriptome variation through alternative splicing (Boue et al., 2003; Kondrashov and Koonin, 2003; Modrek and Lee, 2003; Ast, 2004); provide coordinates for the identification of premature termination codons in nonsense-mediated decay (Lynch and Kewalramani, 2003); may be an important factor in the spatial distribution of nucleosomes (Csordas, 1989; Lauderdale and Stein, 1992; Baldi et al., 1996; Denisov et al., 1997; Levitsky et al., 2001; Vinogradov, 2005); and are likely significant components of nucleoskeletal DNA (Cavalier-Smith, 1978). Some of these functions rest upon intron sequence features, whereas others, like their role in nonsense-mediated decay, appear to depend only on the positional information left at exon-exon junctions after the introns are spliced out of the primary transcript. These and possibly other as yet undiscovered functions may have contributed to intron proliferation at different times and with varying intensities for different introns and lineages.
For instance, increases in cell size may create propitious conditions for intron proliferation because of the concomitant need to increase nucleoskeletal DNA (Cavalier-Smith, 2005). However, as noted above, the number of introns per gene is expected to have an upper threshold above which additional insertions are disfavored by natural selection (Lynch, 2002; Lynch and Kewalramani, 2003).
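The mutational cost of carrying introns, mentioned earlier in this section, can be made concrete with a back-of-the-envelope calculation. The sketch below is a toy illustration, not a model taken from the cited studies: it assumes that each intron contains a fixed number of nucleotides essential for its recognition, that each such nucleotide mutates independently at a per-generation rate `mu`, and that any change at one of them yields a non-functional allele. The function name and all parameter values are hypothetical.

```python
def null_allele_probability(n_introns: int, key_sites_per_intron: int, mu: float) -> float:
    """Probability that at least one essential intron-recognition nucleotide
    in a gene is mutated in one generation.

    Hypothetical simplified model: sites are independent, each mutates at
    rate `mu`, and any hit disrupts splicing of that intron.
    """
    if n_introns < 0 or key_sites_per_intron < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    total_sites = n_introns * key_sites_per_intron
    # Complement of the event "no essential site is mutated".
    return 1.0 - (1.0 - mu) ** total_sites


if __name__ == "__main__":
    # Hypothetical values: 25 essential nucleotides per intron and a
    # per-nucleotide mutation rate of 1e-8 per generation.
    for n in (0, 1, 5, 20):
        p = null_allele_probability(n, key_sites_per_intron=25, mu=1e-8)
        print(f"{n:>2} introns -> P(null allele per generation) = {p:.2e}")
```

For small mutation rates the result is close to `n_introns * key_sites_per_intron * mu`, so under these assumptions the hazard grows roughly linearly with the number of introns, which is one way to see why an upper threshold on introns per gene might be expected.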
Natural selection operates on the variation generated by mutation. Although mutations are random with respect to their effects on fitness, they are non-random in the sense that not all types of mutations are mechanistically equally probable (e.g., Rodríguez-Trelles et al., 1999, 2000a). Mutation biases can thus impose a direction on intron proliferation. Intron gain is predominantly viewed as a complex mutational process that involves features of the recipient sequence (the so-called proto-splice site, and possibly various other recognition sequences, such as those that modulate exon splicing; Dibb and Newman, 1989; Sadusky et al., 2004) as well as of the intron sources. The spatial distribution of proto-splice sites is strongly correlated with codon-usage biases, which could account in part for the excess of phase 0 introns (i.e., introns between codons) over phase 1 and phase 2 introns (Ruvinsky et al., 2005). Introns are rarely found very near one another within genes, possibly because of spatial constraints, such as the room needed for splicing, and/or other restrictions. The probability of an intron insertion at any particular target site will therefore depend on its distance to the nearest intron, which may change over evolutionary time. The extent to which such mechanical constraints, in conjunction with variable combinations of the selective forces discussed in the previous paragraphs, can result in convergent intron distributions across independent lineages remains unknown (Fedorov et al., 2002; Rogozin et al., 2003; Tarrío et al., 2003; Sverdlov et al., 2005).
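Intron phase, and the related notion of symmetrical (e.g., class 1-1) modules, can be computed directly from intron positions. The sketch below assumes that each intron's position is given as the number of coding nucleotides upstream of it in the spliced coding sequence; the function names and example positions are hypothetical.

```python
from typing import List, Tuple


def intron_phase(cds_nt_before_intron: int) -> int:
    """Phase of an intron given the number of coding nucleotides upstream of it.

    Phase 0: the intron lies between codons.
    Phase 1: the intron lies after the first nucleotide of a codon.
    Phase 2: the intron lies after the second nucleotide of a codon.
    """
    if cds_nt_before_intron < 0:
        raise ValueError("position must be non-negative")
    return cds_nt_before_intron % 3


def module_classes(intron_positions: List[int]) -> List[Tuple[int, int]]:
    """Class (5' flanking phase, 3' flanking phase) of every exon bounded by
    introns on both sides, e.g. (1, 1) for a class 1-1 module."""
    phases = [intron_phase(p) for p in sorted(intron_positions)]
    return list(zip(phases, phases[1:]))


def is_symmetric(module_class: Tuple[int, int]) -> bool:
    """Equal flanking phases imply an exon length divisible by 3, so the
    module can be duplicated or shuffled without shifting the reading frame."""
    return module_class[0] == module_class[1]


if __name__ == "__main__":
    positions = [4, 94, 150]  # hypothetical intron positions (coding nt upstream)
    print([intron_phase(p) for p in positions])
    for cls in module_classes(positions):
        print(cls, "symmetric" if is_symmetric(cls) else "asymmetric")
```

With these made-up positions the introns have phases 1, 1 and 0, so the first internal exon is a symmetrical class 1-1 module and the second is an asymmetrical class 1-0 module.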
Spliceosomal introns can arise from exons (Rogers, 1989), from transposons (Crick, 1979), or from other introns, including spliceosomal (Sharp, 1985) and group II self-splicing introns (Rogers, 1989). Each intron source might result in a distinctive mode of intron proliferation. Because of their invasive properties, transposons may be favored as natural agents of bursts of intron spread, which may account for the disparate phylogenetic patterns of intron density observed (Purugganan, 1993; Rzhetsky and Ayala, 1999; Fedorov et al., 2003). However, the heterogeneity of phylogenetic patterns could also be associated with other intron sources, such as spliceosomal introns themselves, if duplication rates depend on the efficiency of the biochemical processes involved (e.g., mutations altering the kinetics of reverse-splicing and/or reverse-transcription; see below). Ascertaining which, if any, of these intron sources is most important, or whether their relative significance varies during the course of evolution, will have a profound impact on our understanding of the evolution of gene structure.

Germ line gene expression in models of spliceosomal intron duplication
Spliceosomal introns themselves have long been favored as the most likely source of new spliceosomal introns (Sharp, 1985; Hankeln et al., 1997; Logsdon, 1998; Tarrío et al., 1998; Coghlan and Wolfe, 2004). This 'intron duplication' or 'intron transposition' model is appealing because it ensures that the inserted sequence is immediately endowed with the essential recognition sequences, which would prevent deleterious effects due to incorrect splicing (Sharp, 1985; Palmer and Logsdon, 1991; Lynch and Richardson, 2002). Three different mechanisms by which spliceosomal introns could beget new introns have been proposed (Fig. 1): (i) reverse-splicing of spliced introns into a new site in the same or a different mRNA, which is subsequently reverse-transcribed to a cDNA that recombines with the genome (Sharp, 1985); (ii) reverse-transcription of spliced introns, which then reinsert themselves into the nuclear genome (Lynch, 2002); and (iii) conversion of intron-less genes by reverse-transcription of unspliced transcripts of their intron-containing paralogs (Hankeln et al., 1997).
Of these three mechanisms, only the first two can originate novel intron positions; the third can only produce a new intron at the same position as the source intron (Coghlan and Wolfe, 2004). Moreover, the first pathway offers the extra benefit of a guidance mechanism by which introns can be preferentially inserted into regions containing exon splicing enhancers (Lynch and Richardson, 2002). To be vertically transmitted, an intron gain must occur in a germ line cell (Coghlan and Wolfe, 2004; Roy, 2004). All three intron duplication pathways necessarily involve an RNA transcript intermediate. Therefore, for a spliceosomal intron to originate by duplication, either the source gene from which the intron is derived or, if the new intron originates via a reverse-splicing mechanism, both that gene and the recipient gene containing the novel intron need to be expressed in the germ line (Logsdon, 1998; Logsdon et al., 1998; Coghlan and Wolfe, 2004; Roy, 2004).
Fig. 1. Mechanisms of spliceosomal intron duplication. (1) Reverse-splicing (RS) of spliced introns into a new site in the same (mRNA1) or a different (mRNA2) germ line-expressed (GLE) mRNA, which is subsequently reverse-transcribed (RT) to a cDNA that recombines with the genome (Sharp, 1985); (2) reverse-transcription (RT) of spliced introns, which reinsert themselves into the nuclear genome (Lynch, 2002); and (3) conversion of intron-less genes by reverse-transcription of unspliced transcripts of their intron-containing paralogs (Hankeln et al., 1997).
Detection of intron duplication events usually starts with the identification of newly gained introns on the basis of their restricted phylogenetic distribution. The sequence of each new intron is then compared with the remaining intron sequences available for the same genome in search of similarity. Only three studies have obtained positive results (Hankeln et al., 1997; Tarrío et al., 1998; Coghlan and Wolfe, 2004), and only the most recent of them examined expression data (Coghlan and Wolfe, 2004). That study identified 122 recent gains, defined as introns present in either Caenorhabditis elegans or C. briggsae but absent from two independent, distantly related outgroups (the parasitic nematode Brugia malayi and the arthropod-vertebrate clade consisting of fly, mosquito, human, and mouse). This definition of recent intron gain hinges on the assumption that one intron gain is more likely than three independent intron losses (in the other Caenorhabditis species, in B. malayi, and in the arthropod-vertebrate clade). This may be too liberal an assumption, particularly if we take into account that rates of intron loss appear to be much greater, perhaps by an order of magnitude, than rates of intron gain (Krzywinski and Besansky, 2002; Lynch, 2002; Wada et al., 2002; Cho et al., 2004; Kiontke et al., 2004; Roy and Gilbert, 2005b). Putative novel introns are on average longer than control introns (Coghlan and Wolfe, 2004); this, however, may be because nematodes preferentially lose shorter introns (Cho et al., 2004). Moreover, one intron gain considered certain by Coghlan and Wolfe (2004) may instead represent the insertion of a palindromic element into a pre-existing intron (Roy, 2004). Be that as it may, of the 122 presumptive novel introns, 28 exhibit significant sequence identity to other introns of the same genome, which strongly suggests that they arose by duplication of older introns (Coghlan and Wolfe, 2004).
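Why high loss rates weaken the 'one gain beats three losses' assumption can be seen with a toy likelihood comparison. The sketch below is a deliberate simplification, not the procedure of Coghlan and Wolfe (2004): it treats the four lineages as independent branches of a star tree, gives each branch a hypothetical probability of a single gain (`gain_p`) or a single loss (`loss_p`), ignores multiple events per branch, and ignores the prior probability of the ancestral state.

```python
def gain_scenario(gain_p: float, n_other: int = 3) -> float:
    """P(pattern | intron absent ancestrally): one gain on the intron-bearing
    branch and no gain on each of the `n_other` remaining branches."""
    return gain_p * (1.0 - gain_p) ** n_other


def loss_scenario(loss_p: float, n_other: int = 3) -> float:
    """P(pattern | intron present ancestrally): retained on the intron-bearing
    branch and independently lost on each of the `n_other` remaining branches."""
    return (1.0 - loss_p) * loss_p ** n_other


def loss_to_gain_ratio(gain_p: float, loss_p: float, n_other: int = 3) -> float:
    """Ratio above 1 means the multiple-loss explanation is the more likely one."""
    if not 0.0 < gain_p < 1.0:
        raise ValueError("gain_p must lie strictly between 0 and 1")
    return loss_scenario(loss_p, n_other) / gain_scenario(gain_p, n_other)


if __name__ == "__main__":
    gain_p = 0.01  # hypothetical per-branch gain probability
    for loss_p in (0.05, 0.1, 0.2, 0.3):  # hypothetical: 5 to 30 times gain_p
        ratio = loss_to_gain_ratio(gain_p, loss_p)
        print(f"loss_p = {loss_p:.2f}: loss/gain likelihood ratio = {ratio:.3f}")
```

Under these made-up values the single-gain explanation is favored while losses are rare in absolute terms, but the ratio rises roughly with the cube of the loss probability and exceeds 1 once per-branch losses become common enough.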
Both the novel introns and the older introns they match are preferentially located in genes with detectable germ line expression (Coghlan and Wolfe, 2004), but the correlation is not perfect, which casts doubt on the intron duplication model (Logsdon, 1998; Logsdon et al., 1998; Coghlan and Wolfe, 2004; Roy, 2004). Should a perfect correlation be expected? In our view, not necessarily, if the following two conditions hold. (i) There is a stochastic component to gene expression. The conventional notion of the 'expression level of a gene' as a feature of a cell type or tissue is an artefact prompted by studying gene expression with methods that require large cell populations (Paldi, 2003; Kurakin, 2005). Modern single-cell approaches indicate that the activity of a gene can vary significantly among cells of the same tissue, simply because gene expression intrinsically relies upon random encounters between finite, and often small, numbers of diffusible molecules (Sternberg and Félix, 1997; Paldi, 2003; Kurakin, 2005; Kaern et al., 2005; Theise, 2005). The probabilistic character of gene activation makes it conceivable that RNA-mediated intron duplication could occur via stochastically produced transcripts in germ line cells in which the average steady-state expression of the encoding genes is too low to be detected. (ii) There is gene expression turnover, which in the long term is likely to be of greater consequence than (i). Evolutionary transcriptomics studies (Khaitovich et al., 2004; Yanai et al., 2004; reviewed in Rodríguez-Trelles et al., 2005) suggest that present-day gene expression profiles may carry only limited information about the expression profiles of the recent past.
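Condition (i) can be illustrated with the simplest stochastic model of transcript numbers. Assuming, purely for illustration, that the number of transcripts of a gene per cell is Poisson-distributed with a low mean λ, the fraction of cells carrying at least one transcript is 1 - exp(-λ). A gene whose population-average expression falls below an assay's detection threshold may therefore still be transcribed in a sizeable number of germ line cells. The mean, the cell count, and the function names below are hypothetical, and real transcript counts are often overdispersed relative to a Poisson model.

```python
import math
import random


def fraction_transcribing(mean_transcripts_per_cell: float) -> float:
    """Expected fraction of cells with at least one transcript (Poisson model)."""
    if mean_transcripts_per_cell < 0:
        raise ValueError("mean must be non-negative")
    return 1.0 - math.exp(-mean_transcripts_per_cell)


def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson variate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_cells(lam: float, n_cells: int, seed: int = 1) -> int:
    """Number of simulated cells (out of n_cells) carrying at least one transcript."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_cells) if poisson_draw(lam, rng) > 0)


if __name__ == "__main__":
    lam = 0.02        # hypothetical mean transcripts per germ line cell
    n_cells = 10_000  # hypothetical number of germ line cells
    print(f"expected fraction transcribing: {fraction_transcribing(lam):.4f}")
    print(f"simulated cells transcribing: {simulate_cells(lam, n_cells)} of {n_cells}")
```

Under these assumptions roughly 2% of cells, i.e. about two hundred of ten thousand, would carry a transcript that could serve as an intron duplication intermediate, even though the average level is very low.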

Widespread ectopic expression and the proliferation of Xdh introns
The requirement of germ line gene expression in models of intron duplication stems from a long-standing regulatory paradigm, which holds that gene expression profiles are controlled down to the last detail (Carroll et al., 2001; Davidson, 2001; Wilkins, 2002). Under this scheme, ectopic expression, i.e., the expression of genes at times and locations where the target gene is not known to have a function, would be mostly deleterious. This paradigm has been challenged by molecular geneticists, who have shown that any gene may be transcribed in any cell type (Humphries et al., 1976; Weintraub and Groudine, 1976; Chelly et al., 1989), and by evolutionists, who have shown that enzyme expression profiles are highly variable, even among closely related species (Dickinson, 1980; see Rodríguez-Trelles et al., 2005).
Evolutionary transcriptomics studies have shown that (i) a substantial fraction of gene expression differences between species is adaptively neutral or nearly neutral (Khaitovich et al., 2004), and (ii) for any given species and tissue, it is frequently not possible to predict whether a gene will be transcriptionally active on the basis of its expression status in the same tissue of related species (Yanai et al., 2004). These findings indicate that ectopic expression is widespread, an interpretation consistent with (i) the properties of cis-regulatory sequences, which are typically short and can thus easily arise, or be dismantled, by mutation at random locations throughout the genome, and (ii) cross-talk conflicts between genes, which arise because unrelated promoters often carry cis-regulatory sequences for the same transcription factor (see Rodríguez-Trelles et al., 2005). Apparently, many genes can change their transcriptional status erratically during the course of evolution without major functional consequences. A corollary of this conclusion is that the present-day germ line expression status ('on' or 'off') of genes containing newborn introns might be irrelevant for evaluating intron duplication models.
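The claim that short cis-regulatory sequences readily arise by chance can be checked with simple arithmetic. In a random sequence of length L with equal base frequencies, a specific non-palindromic k-nucleotide motif is expected to occur about (L - k + 1)/4^k times per strand. The sketch below computes this expectation and compares it with counts in simulated sequences; the 6-nucleotide motif, the 1 kb window, the uniform base composition, and the function names are hypothetical simplifications.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_hits(seq_len: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected exact matches to one fixed, non-palindromic motif in a random
    sequence with equal base frequencies (overlapping matches allowed)."""
    per_strand = max(seq_len - motif_len + 1, 0) / 4 ** motif_len
    return 2 * per_strand if both_strands else per_strand


def count_hits(seq: str, motif: str) -> int:
    """Count overlapping exact matches to `motif` on both strands of an
    upper-case DNA sequence; a palindromic motif is counted once per site."""
    rev_comp = motif.translate(COMPLEMENT)[::-1]
    patterns = {motif, rev_comp}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in patterns)


if __name__ == "__main__":
    rng = random.Random(42)
    motif = "GATTAC"  # hypothetical 6-nt binding site (not palindromic)
    window = 1000     # hypothetical promoter-sized window
    print(f"expected hits per window: {expected_hits(window, len(motif)):.3f}")
    trials = [
        count_hits("".join(rng.choice("ACGT") for _ in range(window)), motif)
        for _ in range(500)
    ]
    print(f"mean simulated hits: {sum(trials) / len(trials):.3f}")
```

Under these assumptions a single fixed 6-nucleotide site is expected roughly once in every two 1-kb windows by chance alone, which illustrates how easily such elements can appear, or be lost, through point mutation.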
As an example of how present-day expression status can mislead, consider the case of the Xdh (xanthine dehydrogenase) gene. In D. sucinea and D. capricorni, two closely related species of the Drosophila willistoni group, the Xdh gene carries two short introns referred to as introns A and B (Tarrío et al., 1998). Introns A and B are most likely novel introns (no older than about 40 My, the approximate age of the sucinea-capricorni lineage) because they are absent from all 16 increasingly distantly related lineages of animals and fungi examined (those listed in Tarrío et al., 2003, plus Apis mellifera and Tribolium castaneum). The two introns exhibit significant sequence similarity to an older intron located nearby within the Xdh gene, which indicates that they originated by duplication (Tarrío et al., 1998). Retention of similarity between the old and new introns might have been facilitated by the slower evolution of the Xdh region in D. sucinea and D. capricorni compared with other willistoni species (Rodríguez-Trelles et al., 2000b). The expression status of Xdh in the germ line of these species is unknown. However, Dickinson (1980) detected Xdh activity in the ovaries of D. adiastola and D. ornata, two members of the Hawaiian Drosophila adiastola subgroup, using protein electrophoresis. His failure to detect Xdh expression in the remaining 25 Hawaiian species of his survey, including additional adiastola subgroup representatives, was taken as an indication that these Xdh activities were ectopic. We note, however, that even if Xdh is currently inactive in the germ lines of D. sucinea and D. capricorni, it could well have been active at the time introns A and B arose. Widespread ectopic expression might thus account for the imperfect match between germ line-expressed genes and genes carrying new introns that still resemble their parental introns (Coghlan and Wolfe, 2004). The same argument applies, of course, to any model of intron gain (or loss) that requires an RNA transcript intermediate.
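Similarity between a new intron and a putative parental intron is normally assessed with alignment-based tools. As a self-contained stand-in (a hypothetical illustration, not the method used by Tarrío et al., 1998), the sketch below scores two intron sequences by the Jaccard index of their shared k-mers and estimates how often a score at least as high arises when one sequence is randomly shuffled, which preserves base composition but destroys order. The sequences, k, the number of shuffles, and the function names are placeholders.

```python
import random
from typing import Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 4) -> float:
    """Fraction of distinct k-mers shared between two sequences."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def shuffle_p_value(a: str, b: str, k: int = 4, n_shuffles: int = 999, seed: int = 0) -> float:
    """Empirical P(score >= observed) when `b` is shuffled, using (r + 1)/(n + 1)."""
    rng = random.Random(seed)
    observed = jaccard(a, b, k)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if jaccard(a, "".join(letters), k) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    parent = "".join(rng.choice("ACGT") for _ in range(120))  # hypothetical old intron
    copy = list(parent)
    for pos in rng.sample(range(len(copy)), 12):  # 10% point substitutions
        copy[pos] = rng.choice("ACGT".replace(copy[pos], ""))
    copy = "".join(copy)
    unrelated = "".join(rng.choice("ACGT") for _ in range(120))
    print("copy vs parent:", jaccard(copy, parent), shuffle_p_value(copy, parent))
    print("unrelated vs parent:", jaccard(unrelated, parent), shuffle_p_value(unrelated, parent))
```

A shuffle test of this kind only asks whether the shared sequence content exceeds what base composition alone would produce; as substitutions accumulate, the copy's score drifts toward the shuffled background, which is why old duplications become hard to detect.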
Studies seeking to evaluate the mechanisms of intron origin should show that the implicated genes were transcriptionally active in the germ line at the relevant times. There are at least two ways of tackling this issue. One is to reconstruct ancestral gene expression states from appropriate phylogenetic sampling. Another is to restrict the analyses to genes known to perform essential functions in the germ line (and hence realistically expected to have remained stably expressed). For ancient intron gain events, it may be all but impossible to ascertain that the target genes were expressed in the germ line, although ancient intron duplications would in any case be difficult to establish, because their sequence similarity would have largely decayed.
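The first approach can be sketched with Fitch's small-parsimony algorithm applied to a binary character (germ line expression detected, 1, or not detected, 0). This is only one option, and likelihood-based reconstructions are generally preferable when branch lengths are known. Only the bottom-up pass is shown, which yields the candidate state set at the root and the minimum number of changes; the tree topology, taxon names, and expression calls are hypothetical and do not represent real data.

```python
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[FrozenSet[int], int]:
    """Bottom-up Fitch pass: candidate state set at the root of `tree`
    and the minimum number of state changes required."""
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: keep it, no extra change.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical germ line expression calls (1 = detected, 0 = not detected).
    calls = {"sp_A": 0, "sp_B": 1, "sp_C": 1, "sp_D": 0}
    tree = (("sp_A", "sp_B"), ("sp_C", "sp_D"))
    root_states, changes = fitch(tree, calls)
    print(sorted(root_states), changes)
```

In this made-up example the root state set comes out as {0, 1}, i.e. ambiguous, after two changes, which illustrates why the phylogenetic sampling must be dense enough to resolve the ancestral expression state at the time of intron gain.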

Spliceosomal introns and ectopic expression in the evolution of multicellularity
The evolution of multicellularity represents a major transition in the history of life, and may have occurred independently several times (Kirk, 2005). Multicellular organisms develop from a single cell that replicates to give rise to a spatially structured individual with a number of differentiated cell types. The starting condition for the evolution of multicellularity is assumed to be a colony of identical cells derived from the clonal expansion of a single cell (Aravind and Subramanian, 1999; Maynard-Smith and Szathmáry, 1999; Kirk, 2005). Subsequently, certain changes would have partitioned the expression of ancestral regulatory effectors among distinct subsets of cells in the colony, thus establishing the spatial differentiation upon which functional differentiation among distinct cell types could evolve (Aravind and Subramanian, 1999; Maynard-Smith and Szathmáry, 1999). In unicellular organisms, differentiation consists of a succession of gene expression states in response to environmental conditions; ectopic expression can only manifest itself along the temporal dimension. Changes in the transcriptional regulatory profiles of different cell subsets of early cell aggregates could have represented the earliest evolutionary manifestation of ectopic expression on a spatial scale. Novel expression profiles could have become heritable traits through concomitant changes in DNA methylation patterns or chromatin marks. Insofar as ectopic expression reflects architectural constraints of the regulatory system (Rodríguez-Trelles et al., 2005), the origin of multicellularity might be viewed as a natural outcome of cell aggregation.
The unfolding of ectopic expression along the spatial axis would represent an explosion in the number of cellular environments to which gene products were exposed, increasingly so as cell types became more specialized. Environmental diversification would have further expanded the range of potential interactions, thus opening new avenues for the recruitment of genetic variation. Spliceosomal introns may have been primary players in this scenario, first by allowing the generation of novel combinations of exons through exon shuffling (Patthy, 1999; Cohen-Gihon et al., 2005), and second, because they were readily co-opted for alternative splicing (Boue et al., 2003; Ast, 2004). The origin of multicellularity might thus have left its own imprint on the subsequent proliferation of spliceosomal introns.