You Can’t Unscramble an Egg: Population Genetic Structure of Oncorhynchus mykiss in the California Central Valley Inferred from Combined Microsatellite and Single Nucleotide Polymorphism Data

Steelhead/rainbow trout (Oncorhynchus mykiss) are found in all of the major tributaries of the Sacramento and San Joaquin rivers, which flow through California’s Central Valley and enter the ocean through San Francisco Bay and the Golden Gate. This river system is heavily affected by water development, agriculture, and invasive species, and salmon and trout hatchery propagation has been occurring for over 100 years. We collected genotype data for 18 highly variable microsatellite loci and 95 single nucleotide polymorphisms (SNPs) from more than 1,900 fish from Central Valley drainages to analyze genetic diversity, population structure, differentiation between populations above and below dams, and the relationship of Central Valley O. mykiss populations to coastal California steelhead. In addition, we evaluate introgression by both hatchery rainbow trout strains, which have primarily native Central Valley ancestry, and imported coastal steelhead stocks. In contrast to patterns typical of coastal steelhead, Central Valley O. mykiss above and below dams within the same tributary were not found to be each other’s closest relatives, and we found no relationship between genetic and geographic distance among below-barrier populations. While introgression by hatchery rainbow trout strains does not appear to be widespread among above-barrier populations, steelhead in the American River and some neighboring tributaries have been introgressed by coastal steelhead. Together, these results demonstrate that the ancestral population genetic structure that existed among Central Valley tributaries has been significantly altered in contemporary populations. Future conservation, restoration, and mitigation efforts should take this into account when working to meet recovery planning goals.


INTRODUCTION
as steelhead (anadromous life history) or rainbow trout (resident life history). Tributary rivers from the west slope of the Sierra Nevada mountain range and east slopes of the coastal mountain ranges feed into the north-flowing San Joaquin and the south-flowing Sacramento rivers, which converge in the San Francisco Bay/Delta region before finally exiting to the Pacific Ocean. The watershed has been severely affected by the construction of many dams, which block movement by anadromous fish and effectively divide nearly every major tributary into separate above-barrier and below-barrier reaches. In addition, much of the flow in the system is diverted for agricultural and domestic uses through an extensive system of levees and aqueducts. Together, these effects have severely modified and reduced the habitat available to anadromous fishes (Lindley et al. 2006).
Populations of steelhead in California are divided into six Distinct Population Segments (DPSs) for management purposes: five on the coast and one in the Central Valley (Busby et al. 1996). Importantly, these DPSs specifically include only anadromous life-history fish that spawn below impassable barriers to migration; O. mykiss isolated above natural or artificial barriers to fish passage are excluded from the DPS and, consequently, from protection under the U.S. Endangered Species Act (ESA; Federal Register 2006). The California Central Valley Steelhead DPS also includes fish produced by two of the four artificial propagation programs in the Central Valley (the Feather River Fish and Coleman National Fish hatcheries) but not those spawned at the Nimbus or Mokelumne River hatcheries. This DPS was listed as "Threatened" under the ESA in 1998 and this status was reaffirmed in 2006 (Federal Register 2006). Hatchery rainbow trout have been heavily stocked in the reservoirs above nearly all of the Central Valley dams for more than 100 years (Busack and Gall 1980; California HSRG 2012). These captive hatchery trout broodstock strains were domesticated from diverse geographic and phylogenetic sources, but many originated from fish collected from streams that drain into the Central Valley (Needham and Behnke 1962; Leitritz 1970). Similarly, steelhead and other anadromous salmonids have been propagated at several hatcheries in the Central Valley since the late 1800s, and four Central Valley hatcheries (Coleman, Feather, Nimbus, and Mokelumne) currently release approximately 1.5 million yearlings annually (Brown 2005; California HSRG 2012). For both steelhead and hatchery trout strains, it has been common practice to move eggs among hatcheries within the Central Valley and to import eggs from outside sources (Leitritz 1970; California HSRG 2012).
Nimbus Hatchery on the American River has been a substantial producer of steelhead in the Central Valley since 1955 (Leitritz 1970) and, for many years, imported eggs from coastal steelhead sources, primarily the Eel and Mad rivers (California HSRG 2012). However, the extent to which such interbasin transfers have influenced population structure of O. mykiss in the Central Valley has not been carefully evaluated.
Numerous genetic analyses of salmonid population structure in California have relied on microsatellite markers, because such multi-locus data can identify population genetic structure at both larger scales (Aguilar and Garza 2006; Clemento et al. 2009; Garza et al. 2014) and at relatively fine ones (Deiner et al. 2007; Pearse et al. 2007, 2009; Kinziger et al. 2013), including within the Central Valley (Banks et al. 2000; Nielsen et al. 2005). Recently, another class of genetic markers, single nucleotide polymorphisms (SNPs), has been used increasingly in population genetics and has proven useful in assessments of population structure (Morin et al. 2004), introgressive hybridization (Stephens et al. 2009; Finger et al. 2011), and pedigree reconstruction (Abadía-Cardoso et al. 2013). Though microsatellites and SNPs each have advantages and disadvantages in terms of cost, genotyping errors, polymorphism, etc., when a large number of both types of loci is available, this combination provides the most statistical power for understanding population genetic relationships (Narum et al. 2008).
Here we attempt to "unscramble" the population genetic structure of Central Valley O. mykiss using a combination of more than 100 microsatellite and SNP loci on a comprehensive set of Central Valley trout and steelhead populations. We compare these data with genotypes from a representative set of hatchery trout strains and coastal California steelhead populations (Aguilar and Garza 2006; Clemento et al. 2009; Garza et al. 2014).

Sampling
Samples were taken from populations of O. mykiss at one or more locations in 15 tributary sub-basins of the Sacramento and San Joaquin rivers that drain the Central Valley (Figure 1; Table 1), including locations both above and below barriers to anadromy in most tributaries. Most fish were captured using either electrofishing or hook-and-line capture techniques. Small pieces of caudal fin tissue were then excised and preserved through desiccation on blotter paper. Fish sampled in multiple years in the same location were combined for analysis, after verifying that they were taken from the same underlying population. These groups of fish are all referred to as populations for convenience and without additional assumptions about the biological details underlying this designation.

Genetic Data Collection
Nucleic acid extraction and microsatellite and SNP genotyping followed Arciniega et al. (2016). Genotypic data from 18 microsatellite loci were collected for all samples. This set of loci has been used in numerous previous studies of O. mykiss in California (Aguilar and Garza 2006; Deiner et al. 2007; Pearse et al. 2007, 2009, 2011a; Garza et al. 2014). All samples were also genotyped with the panel of 96 SNP loci used by Abadía-Cardoso et al. (2013). The 96 SNPs include 95 loci from Aguilar and Garza (2008), Campbell et al. (2009), and Abadía-Cardoso et al. (2011), as well as an assay that includes a Y-chromosome marker developed by Brunelli et al. (2008) that identifies gender. All 96 loci were genotyped using 5ʹ nuclease TaqMan assays (Applied Biosystems) on 96.96 Dynamic Genotyping Arrays in the EP1 Genotyping System (Fluidigm Corporation). Two negative controls were included in each array and genotypes were called using Fluidigm SNP Genotyping Analysis Software v3.1.1.

Data Analysis
We combined the microsatellite and SNP data collected from the Central Valley O. mykiss populations with previously collected data from coastal California steelhead populations and hatchery trout strains commonly stocked in California. In analyzing these data, we first removed from most analyses three SNP loci that have been shown to be influenced by selection on life-history patterns in O. mykiss. Two of these loci in particular, SH121006-131 and SH114448-87, are in strong linkage disequilibrium (LD) with a genomic region on chromosome Omy5 that was recently found to be associated with resident and anadromous life-history in coastal California steelhead populations. These two loci were analyzed separately to evaluate patterns of LD between them in Central Valley populations using the R package genetics (Warnes and Leisch 2005). Finally, we removed three microsatellite loci (OtsG401, Omy27, and Ots1b) and two SNP loci (SH127645-308 and SH128996-481) for which at least one of the population samples was not genotyped. Together, these removals left a total of 105 loci (15 microsatellites and 90 SNP loci), and we conducted all further population genetic analyses on this combined dataset. The gender identification locus was also excluded from the population genetic analyses.
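The locus counts in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts and locus names stated in the text; the variable names are illustrative, and the third selection-associated SNP is represented only as a count because it is not named here.

```python
# Locus bookkeeping for the combined dataset (counts taken from the text).
microsatellites_genotyped = 18
snp_assays_genotyped = 96  # the panel includes one sex-identification assay

# Loci dropped because at least one population sample was not genotyped.
dropped_microsatellites = {"OtsG401", "Omy27", "Ots1b"}
dropped_snps_incomplete = {"SH127645-308", "SH128996-481"}

# Three SNPs associated with life-history selection were set aside;
# only two of them (SH121006-131, SH114448-87) are named in the text.
n_snps_under_selection = 3
n_sex_id_assays = 1

microsatellites_retained = microsatellites_genotyped - len(dropped_microsatellites)
snps_retained = (
    snp_assays_genotyped
    - n_sex_id_assays
    - n_snps_under_selection
    - len(dropped_snps_incomplete)
)
total_loci = microsatellites_retained + snps_retained

print(microsatellites_retained, snps_retained, total_loci)  # prints: 15 90 105
```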
We calculated expected heterozygosity (Nei 1987), observed heterozygosity, and number of alleles for each sample population, and estimated allelic richness (Ar) with the rarefaction method in the program HP-Rare (Kalinowski 2005) based on a sample of 25 gene copies. We quantified pairwise differentiation between all populations with Fst, using Weir and Cockerham's (1984) estimator, and assessed significance by the permutation algorithm in the genetix software package (Belkhir et al. 2004) with 100 replicates. We used a Mantel test implemented in the program ISOLDE of the GenePop software package (Raymond and Rousset 1995) to evaluate the correlation between genetic and geographic distance for the naturally spawning populations below barriers, using river distances separating the confluences of each major tributary along the mainstem of the Sacramento-San Joaquin River system.
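As a minimal sketch of two of the diversity statistics named above, the following Python functions compute Nei's unbiased expected heterozygosity and rarefied allelic richness for a single locus. This is not the HP-Rare implementation; the function names and the flat-list input format are assumptions made for illustration, and the rarefaction formula is the standard expected number of distinct alleles in a random subsample of g gene copies.

```python
from collections import Counter
from math import comb


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity at one locus.

    `alleles` is a flat list of gene copies (two per diploid fish).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # Small-sample correction n / (n - 1), with n counted in gene copies.
    return n / (n - 1) * (1.0 - sum_sq)


def rarefied_allelic_richness(alleles, g=25):
    """Expected number of distinct alleles in a subsample of g gene copies."""
    n = len(alleles)
    if g > n:
        raise ValueError("subsample size exceeds the number of gene copies")
    counts = Counter(alleles)
    # For each allele: 1 minus the probability that none of its copies is drawn.
    return sum(1.0 - comb(n - c, g) / comb(n, g) for c in counts.values())
```

Rarefying every population to the same g (25 gene copies in this study) makes richness comparable across samples of different sizes; multi-locus values would be averaged over loci.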
We used two individual-based assignment methods both to evaluate recent gene flow among populations and to identify hatchery rainbow trout individuals among the naturally spawning populations. The first analysis, implemented in the model-based clustering program structure (version 2.2; Pritchard et al. 2000), was used to fractionally assign the genome of individual fish to a hypothesized number of genetic clusters, K, in the dataset and to identify population associations. This analysis did not use information about a priori population designations, so it assigns the ancestry of each individual fish without regard to its sampling origin. We evaluated the data using a range of values of K = 2-14 to qualitatively document consistent patterns of population association. The second assignment analysis, implemented in the program gsi_sim (Anderson et al. 2008), uses the population genotype data as references to assign each individual fish to its most likely population of origin based on the method of Rannala and Mountain (1997). This approach evaluates the likelihood of assignment of each individual to every population, providing an evaluation of the composition of each population sample.
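The reference-based assignment described above can be illustrated with a simplified sketch in the spirit of Rannala and Mountain (1997). It is not the gsi_sim code: the function names, the dictionary-based input format, the equal prior weight on every reference population, and the omission of leave-one-out correction for self-assignment are all assumptions made for illustration. Each reference population's allele counts are combined with a Dirichlet prior of 1/k per allele, where k is the number of alleles observed at the locus.

```python
from collections import Counter
from math import exp, log


def genotype_log_likelihood(genotype, ref_counts, n_alleles):
    """Log probability of a diploid genotype given reference allele counts."""
    prior = 1.0 / n_alleles
    n = sum(ref_counts.values())
    a, b = genotype
    xa = ref_counts.get(a, 0) + prior
    xb = ref_counts.get(b, 0) + prior
    # Dirichlet parameters sum to n + 1, giving this normalising constant.
    denom = (n + 1.0) * (n + 2.0)
    if a == b:
        return log(xa * (xa + 1.0) / denom)
    return log(2.0 * xa * xb / denom)


def assign_individual(fish, references):
    """Return the best-scoring population and posterior assignment probabilities.

    fish: {locus: (allele, allele)}
    references: {population: {locus: Counter of allele copies}}
    """
    n_alleles = {}
    for locus, genotype in fish.items():
        seen = set(genotype)
        for loci in references.values():
            seen.update(loci[locus])
        n_alleles[locus] = len(seen)
    scores = {
        pop: sum(
            genotype_log_likelihood(g, loci[locus], n_alleles[locus])
            for locus, g in fish.items()
        )
        for pop, loci in references.items()
    }
    # Convert log-likelihoods to posteriors, assuming equal population priors.
    top = max(scores.values())
    weights = {pop: exp(s - top) for pop, s in scores.items()}
    norm = sum(weights.values())
    posteriors = {pop: w / norm for pop, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors
```

Summing log-likelihoods across loci treats loci as independent, which is one reason linked loci such as the two Omy5 SNPs are better excluded from this kind of analysis.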
We constructed phylogeographic trees based on matrices of Cavalli-Sforza and Edwards' (1967) chord distance using the software package PHYLIP (v. 3.69c; Felsenstein 2005). This genetic distance was chosen because of its accuracy and ability to reliably recover the correct topology for phylogeographic trees (Takezaki and Nei 1996; Felsenstein 2003). We used the neighbor-joining algorithm (Saitou and Nei 1987) to determine tree topology, and derived a consensus tree from 1,000 bootstrap samples of the distance matrix with the CONSENSE program of PHYLIP. Finally, we conducted a correspondence analysis (CA) on the full dataset to qualitatively evaluate population relationships in the absence of a constrained tree structure. This analysis was conducted using the R-based software package adegenet 1.3-4 (Jombart 2008; Jombart and Ahmed 2011).
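For readers unfamiliar with the chord distance, the sketch below computes it from allele frequencies in the multi-locus form given by Takezaki and Nei (1996), averaged over loci. The function name and input format are illustrative assumptions, and PHYLIP may scale the multi-locus distance differently, so values from this sketch should not be expected to match PHYLIP output exactly.

```python
from math import pi, sqrt


def chord_distance(freqs_x, freqs_y):
    """Cavalli-Sforza and Edwards chord distance between two populations.

    freqs_x, freqs_y: lists with one {allele: frequency} dict per locus,
    in the same locus order for both populations.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("populations must share the same non-empty locus list")
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        shared = set(fx) & set(fy)
        cos_theta = sum(sqrt(fx[a] * fy[a]) for a in shared)
        cos_theta = min(1.0, cos_theta)  # guard against rounding above 1
        total += sqrt(2.0 * (1.0 - cos_theta))
    return 2.0 / (pi * len(freqs_x)) * total
```

The distance is zero for identical allele frequencies and reaches its maximum when two populations share no alleles at any locus. A pairwise matrix of these values is the input to neighbor-joining.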

Individual-Based Analysis
The final dataset contained genotypes of 2,430 individuals from 51 sample groups, including 1,667 fish from Central Valley populations. Model-based assignments from the program structure over the range of K-values employed clearly identified hatchery rainbow trout sampled among the naturally spawned fish (Figure 2). This analysis was used to identify 14 hatchery-origin rainbow trout in the Upper Merced population sample, six in the Upper Stanislaus, and 11 sampled at Nimbus Hatchery. The large number of hatchery trout identified in the Upper Merced River (14 of 35, 40%) were all sampled on the same day, separately from the rest of the fish in that population sample, and likely represent a distinct group of planted hatchery trout. Hatchery rainbow trout identified with structure were removed from the dataset in all subsequent analyses, with the exception of fish in the Lower Merced River sample, which had a strong and uniform hatchery influence, so no individuals could be singled out for removal.
Individual assignment tests provided high accuracy of self-assignment to Central Valley O. mykiss populations. The overall accuracy of assignment to population of origin was 84.7% (Table 1). Assignment accuracy for individual populations ranged from 100% for the McCloud R.-Butcherknife Ck., Thomes Creek, and the Feather River-above-Lake-Almanor samples to 33% for the Feather River Hatchery stock, in which many fish assigned to the Mokelumne Hatchery, and vice versa. Similarly, a substantial number of individuals cross-assigned between the American River and Nimbus Fish Hatchery samples, reflecting the strong similarities between these groups of fish.

Population Genetic Diversity
Allelic richness within populations was strongly correlated for microsatellite and SNP loci (r² = 0.453, p < 0.001;

Population Structure
We examined pairwise values of Fst, the standardized variance in allele frequencies between populations, for patterns of population structure. All pairwise Fst values were significantly greater than zero based on permutation tests, with the highest values found between above-barrier populations (0.34, McCloud R.-Butcherknife Ck. and Yuba River-Upper) and the lowest values involving below-barrier hatchery populations (0.005, Feather River Hatchery and Mokelumne Hatchery; 0.01, Nimbus Hatchery and American River) and below-barrier natural populations (0.015, Battle and Deer creeks). Notably, the lower Merced River sample was very similar to Eagle Lake trout, based on Fst (0.012) and other analyses (see below). Mean pairwise Fst values were significantly greater among above-barrier (0.15) than below-barrier populations (0.07; t-test, p < 0.001), and for SNP loci (0.13) than for microsatellites (0.10; t-test, p < 0.001). Despite the potential for both historical and current gene flow, there was no significant isolation by distance among the 12 natural below-barrier samples (r² = 0.029, p > 0.05).
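The isolation-by-distance test reported above is a Mantel test, which asks whether the observed correlation between a genetic and a geographic distance matrix is larger than expected when population labels are shuffled. The sketch below is a generic one-sided permutation version using the Pearson correlation; it is not the ISOLDE implementation, and the function names, the default number of permutations, and the choice of correlation statistic are assumptions made for illustration.

```python
import random
from math import sqrt


def _upper_triangle(matrix, order):
    """Flatten the above-diagonal entries of `matrix` after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test on two square distance matrices."""
    n = len(genetic)
    identity = list(range(n))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if _pearson(_upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Squaring the returned r gives a value comparable to the r² reported in the text. Whole rows and columns are permuted together because the pairwise distances within a matrix are not independent observations.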
Phylogeographic trees were created for Central Valley populations only (Figure 3) and also with coastal California steelhead included (Figure 4). We also constructed trees using the microsatellite and SNP data separately, and with the hatchery rainbow trout strains included and excluded (data not shown).
Regardless of which populations were included, there were only minor differences in the relationships inferred in the different trees, and all the major, statistically significant, relationships were consistent with the trees shown in Figures 3 and 4. In general, the phylogeographic trees did not cluster populations by basin of origin, with little or no statistical support for most internal branching relationships. We found strong bootstrap support primarily for nodes joining pairs of population samples above the same barrier dam. For example, the relationships between the two upper American River populations (American-NF and American-MF) and between the two upper Mokelumne River populations (Mokelumne-NF and Mokelumne-SF) were both strongly supported in all trees (Figures 3 and 4). There was also a well-supported association between the Upper Yuba (Pauley Creek), Upper Feather River (both samples), Eagle Lake hatchery strain, and Lower Merced River samples, which consistently clustered, even when Eagle Lake was excluded from the analysis. Among the below-barrier populations, the American River-Lower and Nimbus Hatchery samples were closely associated with strong bootstrap support in all trees, as were the Mokelumne River, Mokelumne River Hatchery, and Feather River Hatchery samples (Figures 3 and 4).
The phylogeographic analysis that included coastal California steelhead populations revealed that, in general, Central Valley O. mykiss populations, both above and below dams, are more closely related to each other than to coastal populations outside of the Central Valley. Similarly, all of the hatchery strains cluster with the Central Valley populations in those analyses, as expected, given that most strains of hatchery rainbow trout used in California were domesticated from Sacramento River tributary populations (Busack and Gall 1980). The reduced LD between the two Omy5 loci in the hatchery trout strains is also consistent with their Sacramento River basin origins (Table 1).

DISCUSSION
In contrast with the patterns typically found in natural populations, genetic analysis of Central Valley O. mykiss populations with more than 100 markers found a general lack of geographically associated population structure. This likely reflects more than a century of habitat modification and stocking/hatchery practices that together have altered the historical genetic relationships among O. mykiss populations in at least three ways. First, unlike the close relationships typically found between coastal O. mykiss populations above and below barriers within the same watershed (Clemento et al. 2009; Pearse et al. 2009), Central Valley populations separated by dams are usually not each other's closest relatives. Second, the relationships among below-barrier Central Valley populations do not fit a pattern of isolation-by-distance, as has been found among O. mykiss and other salmonid populations both within and among watersheds (Primmer et al. 2006; Palstra et al. 2007; Pearse et al. 2007; Pearse et al. 2011b; Garza et al. 2014), as well as in a recent study of Central Valley giant gartersnakes (Thamnophis gigas) inhabiting the same geographic area (Wood et al. 2015). Finally, some below-barrier Central Valley O. mykiss populations, particularly in the lower American River, are clearly derived primarily from populations from the northern California steelhead DPS, presumably through past importation of eggs from the Eel and Mad rivers. Like scrambling an egg, these genetic effects are largely irreversible, and future management must take them into account while recognizing that the historical relationships cannot be completely restored. However, such genetic effects are also not static, making efforts to use science-based recovery planning essential for the restoration of the adaptive potential of O. mykiss populations in the Central Valley (Meek et al. 2014).
Our results are largely concordant with previous genetic studies of Central Valley O. mykiss (e.g., Nielsen et al. 2005). However, the increased power of the combined microsatellite and SNP data used in the present study, as well as the inclusion of multiple stocks of hatchery rainbow trout and population samples above barriers to anadromy, offer increased resolution, especially given the complementary characteristics of these two types of marker (Narum et al. 2008). Nonetheless, unlike the well-supported relationships and strong isolation by distance found among coastal populations, there was only weak statistical support for most phylogenetic relationships among Central Valley O. mykiss populations. Thus, the lack of strong population structure found in this study likely represents an accurate depiction of the current population genetic relationships among Central Valley O. mykiss populations, while also showing that the overall genetic distinction between coastal and Central Valley DPS O. mykiss remains. Moreover, the majority of the genetic diversity found among the Central Valley steelhead/rainbow trout populations studied here was found at the level of the individual sample sites, all of which were significantly differentiated, contributing to high rates of self-assignment for most populations (Table 1). Accurate population self-assignments are useful because they indicate that the underlying genetic data can be used as a reference baseline for genetic stock identification techniques to determine basin and tributary of origin for individual fish in management or forensic applications (e.g., Seeb et al. 2007).
As noted above, one salient result of the present study is that populations above and below barrier dams in the same basins are not closely related in most of the major tributaries. Instead, many of the above-barrier populations appear to be more genetically similar to each other than to any of the below-barrier populations, a pattern also observed by Nielsen et al. (2005). However, that study did not evaluate relationships between Central Valley trout and hatchery rainbow trout, leaving uncertainty about the phylogenetic origin of the above-barrier populations (Lindley et al. 2006). In the present study, most above-barrier populations are clearly genetically distinct from the hatchery trout strains, supporting the hypothesis that hatchery rainbow trout stocked in the reservoirs and elsewhere above dams in the region have not replaced the native O. mykiss populations that residualized following dam construction. Thus, our results suggest that native O. mykiss dominate the existing populations represented above the dams, as has been documented in coastal California basins (Clemento et al. 2009). However, it should be noted that detecting the influence of hatchery strains is complicated by the close relationship of most hatchery trout strains to their Central Valley origins, and substantial past introgression by hatchery trout into some or all above-barrier populations cannot be completely ruled out.
VOLUME 13, ISSUE 4, ARTICLE 3
In several sub-basins, we sampled and analyzed multiple above-barrier populations and the results were not all consistent. For example, in the Kings River, samples from Deer Cove and Mill Flat creeks both showed some similarity to hatchery trout strains, and the two populations were closely associated in some, but not all, analyses. On the other hand, pairs of samples from different tributaries above Folsom and Pardee dams, in the American and Mokelumne rivers, respectively, were closely related in all analyses. Pairwise Fst values were very low, 0.03 and 0.04 between the middle and north forks of the American River and north and south forks of the Mokelumne River, respectively, and both pairs also cluster with high confidence in all phylogeographic analyses, indicating a common genetic ancestry and/or recent gene flow between them.
Artificial propagation of O. mykiss began in the Central Valley with the establishment of the Baird Station on the McCloud River in 1872. Since then, millions of juvenile fish have been released annually in Central Valley rivers, streams, lakes and reservoirs (Leitritz 1970). This massive propagation and stocking effort, much of it sparsely documented, significantly complicates efforts to disentangle historical population structure. Based on individual genotypic assignments, few hatchery trout were found amongst the population samples, with almost all identified hatchery trout sampled in three locations (Upper Stanislaus River, Upper Merced River, and Nimbus Hatchery). However, two populations showed significant associations with one or more hatchery trout strains. The population from Deer Cove Creek on the Kings River clustered with hatchery strains in some analyses, suggesting likely hatchery trout ancestry, even though no hatchery trout were identified individually. More strikingly, the sample from the Lower Merced River associated strongly with the Eagle Lake hatchery trout strain in both phylogenetic and correspondence analyses, as well as containing a significant number of individuals that assigned to the Eagle Lake strain. Thus it appears that the fish sampled in the Lower Merced River are almost exclusively descended from this hatchery trout strain.
Introgression of hatchery rainbow trout into natural steelhead / rainbow trout populations and hatchery production is potentially detrimental, because of their reduced genetic variation, history of hatchery selection, and potential for a genetic predisposition against anadromy. Here, among the 31 sampled adults that entered Nimbus Fish Hatchery in 2005-2006, nine were identified as hatchery rainbow trout (Garza and Pearse 2008). These individuals were generally smaller than the steelhead, but there was significant overlap in the size distributions, suggesting that such fish might be mistaken for small steelhead and incorporated into the broodstock.

CONCLUSION
Our genetic results indicate small population sizes and reduced genetic diversity in above-barrier populations relative to below-barrier populations, consistent with the decreased connectivity and lost influx of new genes through migration after dam construction, factors that can contribute to population extirpation (Srikwan and Woodruff 2000). Facilitating fish migration across barriers is one way to mitigate such effects, and might also counteract adaptation of above-barrier populations in response to the strong selection against anadromy in these populations. However, re-establishing connectivity of above-barrier trout populations with steelhead populations below dams should be carefully monitored because the consequences of such integration are not known, and could range from beneficial increases in genetic diversity and effective size, to negative changes in life history of the below-barrier populations, decreased fitness of hybrids, and adverse ecological interactions such as competition or direct predation.