A genetic analogue of ‘mark–recapture’ methods for estimating population size: an approach based on molecular parentage assessments

Molecular polymorphisms have been used in a variety of ways to estimate both effective and local census population sizes in nature. A related approach for estimating the current size of a breeding population, explored here for the first time, is the use of genetic ‘marks’ reconstructed for otherwise unknown parents in paternity or maternity analyses of progeny arrays. This method provides interesting similarities and contrasts to traditional mark–recapture methods based on physical tags. To illustrate, this genetic method is applied to a population of painted turtles on the Mississippi River to estimate the number of successfully breeding males. Non‐genetic mark–recapture approaches were also applied to animals trapped at this location. Results demonstrate that such genetic data on parentage can be helpful not only in estimating contemporary population sizes, but also in providing additional information, not present in customary mark–recapture data, about possible extended movements of breeding individuals and the size of the pool of mates which they encounter.


Introduction
Molecular genetic markers have found wide application in population ecology and behaviour, ranging from assessments of genetic parentage and mating systems to explorations of deep phylogeographic and demographic history (Hoelzel & Dover 1991; Avise 1994, 2000; Carvalho 1998). With regard to estimating population sizes, at least two genetic approaches have been used. The first assesses the long-term, evolutionary effective size (N e) of a population by applying neutrality theory to observed standing levels of genetic variation (e.g. Soulé 1976; Nei 1983; Avise et al. 1988). The second employs observed temporal variances in allele frequencies across several generations to estimate a shorter term N e for a population (Waples 1989; Hedgecock et al. 1992; Scribner et al. 1997; Fiumera et al. 2000). Here we employ genetic markers in yet another, quite different, way to obtain a point-estimate of the contemporary or census number of breeding individuals in a population. Our approach, based on molecular paternity (or maternity) analysis of progeny arrays, is in many ways a genetic analogue of traditional mark-recapture methods (Seber 1982; Pollock et al. 1990) for estimating local population size, N, in any wildlife species. It is distinct, however, in that it samples only those individuals that have successfully reproduced, and thus estimates the effective number of breeders in a population (N b) rather than the census number, N.
Other 'genetic tagging' approaches (Palsbøll 1999; Mills et al. 2000), similar to photo-identification methods (Mace et al. 1994), have recently been used to estimate census size in a population (Palsbøll et al. 1997; Taberlet et al. 1997, 1999; Kohn et al. 1999). Individual-specific DNA fingerprints are obtained noninvasively from, for example, faeces, shed hair or sloughed skin (without necessarily contacting the animal itself). Each fingerprint is treated as a 'mark', and a 'recapture' is recorded whenever an identical genotype is found in two separate DNA samples from the wild. Population size can then be estimated using mark-recapture algorithms appropriate to the sampling design, based on the probability that a population of a given size and structure would yield the observed recapture rate (Palsbøll et al. 1997; Kohn et al. 1999; Mills et al. 2000).
Here we explore a significant modification of this general approach in which the molecular genetic 'tags' stem from parentage analysis (rather than from individual DNA samples). The genetic data consist of unique multilocus male genotypes, each reconstructed from the deduced paternal contribution to a female's clutch. Each genetic 'mark' is the first instance in which a male's genotype was deduced (i.e. 'captured') in a progeny array (clutch), and each 'recapture' represents any later finding of that same genotype in a clutch laid by another female. Appropriate mark-recapture analyses of such data can then be used to estimate the effective number of breeding males, N bm, in the local mating pool (as first suggested in a different biological context by Jones & Avise 1997).
This approach is illustrated using extensive molecular paternity data from a natural population of painted turtles (Chrysemys picta). The genetic-based estimates of population size are then compared with those from traditional mark-recapture analyses of male turtles physically trapped in the same area. As elaborated later, these two sources of information probably reference different population pools (female mating partners for the genetic data; local males for the trapping data), so any differences between these estimates may themselves be illuminating about the biology of the species.

Materials and methods

Physical samples
For the physical mark-recapture study, male painted turtles (Chrysemys picta) were captured using baited mesh traps in and near South Potter's Marsh, adjacent to a nesting area along the Mississippi River near Thomson, Illinois. Trapping was performed during May and June of 1999 and 2000. Males were individually marked prior to release by drilling unique combinations of small holes into their marginal scutes, using the same numbering pattern as employed previously to tag females in the population (Janzen 1994).

Genetic samples
Blood samples were also taken from the trapped males. These were preserved in a blood lysis buffer (Seutin et al. 1991) and stored in liquid nitrogen for later genetic analysis. As part of a long-term study on temperature-dependent sex determination in painted turtles (Janzen 1994), adult females and their clutches were sampled during 1995-98 from a nesting population at South Potter's Marsh. Following egg-laying, aliquots of blood were drawn from these females and stored in blood lysis buffer in liquid nitrogen. Liver tissue, preserved in 70% ethanol, was obtained for hatchlings from a total of 155 clutches (mean = 5.8 hatchlings/clutch).
For our study, only one clutch per female per year was considered because female painted turtles typically store and utilize sperm between clutches within a year, and, hence, their progeny would not represent independent samples of sire genotypes over this time frame (Pearse et al. 2001). Where appropriate, we did include genetic data from multiple clutches laid by the same female in different years. However, female painted turtles are also capable of storing sperm across years (Pearse et al. 2001). Thus, in cases in which evidence of sperm storage was found, only the first-year clutch from that female was treated as a 'capture', and subsequent clutches from that same female were disregarded.
DNA was isolated from blood either by proteinase K digestion followed by standard phenol/chloroform extraction, or by boiling ≈75 µL of blood/buffer for 15 min with 100 µL of extraction buffer (100 mM NaCl, 0.5% sarkosyl) and 100 µL of 20% Chelex. After boiling, each sample was centrifuged for 2 min and 0.7 µL was used for the polymerase chain reaction (PCR). From liver tissue, DNA was isolated using the above Chelex protocol, and 1-2 µL of the resulting extract was used for PCR.

Molecular markers
Three microsatellite loci developed for C. picta (Pearse et al. 2001) were amplified either singly or in multiplexed PCR reactions, and the products were separated and analysed using ABI 377 genotypers and associated software. As a measure of our ability to identify individual turtles by the uniqueness of their multilocus genotypes, probability-of-identity (PI) values were calculated following Paetkau & Strobeck (1994) and Paetkau et al. (1995). A PI value is the overall probability that two individuals drawn at random from a given population share identical genotypes at all assayed loci. As a further measure of our power to uniquely identify individuals, we also calculated PI sib, which describes the probability that full siblings share an identical multilocus genotype (Waits et al. 2001).
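The two identity probabilities can be written out directly. The sketch below assumes the commonly used single-locus expressions, PI = Σ p_i^4 + Σ_{i<j} (2 p_i p_j)^2 for unrelated individuals and PI sib = 0.25 + 0.5 Σ p_i^2 + 0.5 (Σ p_i^2)^2 - 0.25 Σ p_i^4 for full sibs, and multiplies single-locus values across loci on the assumption that the loci are independent. The allele frequencies are hypothetical placeholders, not the frequencies observed in this population, and the function names are illustrative only.

```python
from itertools import combinations


def probability_of_identity(freqs):
    """Single-locus PI for two unrelated individuals."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def probability_of_identity_sibs(freqs):
    """Single-locus PI for a pair of full siblings."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def combine_loci(values):
    """Multiply single-locus probabilities (assumes independent loci)."""
    product = 1.0
    for value in values:
        product *= value
    return product


# Hypothetical data: three loci, each with 20 equally frequent alleles.
hypothetical_loci = [[1 / 20] * 20 for _ in range(3)]

pi_combined = combine_loci(probability_of_identity(f) for f in hypothetical_loci)
pisib_combined = combine_loci(
    probability_of_identity_sibs(f) for f in hypothetical_loci
)
print(f"PI = {pi_combined:.3g}, PI sib = {pisib_combined:.3g}")
```

Because PI sib cannot fall below 0.25 per locus, the combined sib value shrinks much more slowly with added loci than the combined PI does, which is why it serves as the conservative bound.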

Paternity analysis
Initially, all mothers and samples of offspring from their respective clutches were genotyped at two loci, Cp2 and Cp3, in a single multiplexed reaction. Paternal alleles were deduced by subtracting the known dam's genotype from all hatchling genotypes in a given clutch. Based on these results, clutches were classified as single paternity (if only one or two paternal alleles per locus were identified), multiple paternity (more than two paternal alleles identified at both loci) or unclear (if the two loci produced inconsistent results or the data were otherwise ambiguous). In these latter cases, a third locus, Cp10, was genotyped to clarify paternity within the clutch. Similarly, if a genotypic match was found at Cp2 and Cp3 for the deduced fathers of two clutches, Cp10 was also then amplified, and the full three-locus genotype of the father was reconstructed to confirm (or refute) the match.
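The subtraction step and the three-way classification can be sketched as follows. This is a simplified illustration rather than the procedure actually implemented: genotypes are hypothetical allele-size pairs, offspring that share the dam's heterozygous genotype are simply counted as ambiguous, and null alleles and genotyping error are ignored.

```python
def paternal_alleles(dam, offspring):
    """Deduce paternal alleles at one locus by subtracting the dam's genotype.

    Returns (set of unambiguous paternal alleles, number of ambiguous offspring).
    """
    dam_alleles = set(dam)
    definite = set()
    ambiguous = 0
    for a, b in offspring:
        a_maternal, b_maternal = a in dam_alleles, b in dam_alleles
        if not (a_maternal or b_maternal):
            raise ValueError(f"offspring {(a, b)} is incompatible with dam {dam}")
        if a == b:
            definite.add(a)        # homozygote: the sire supplied the same allele
        elif a_maternal and not b_maternal:
            definite.add(b)
        elif b_maternal and not a_maternal:
            definite.add(a)
        else:
            ambiguous += 1         # same heterozygous genotype as the dam
    return definite, ambiguous


def classify_clutch(paternal_allele_counts):
    """Classify a clutch from the number of paternal alleles seen at each locus."""
    if all(n <= 2 for n in paternal_allele_counts):
        return "single"
    if all(n > 2 for n in paternal_allele_counts):
        return "multiple"
    return "unclear"               # loci disagree: type a further locus


# Hypothetical clutch at one locus.
dam = (200, 206)
hatchlings = [(200, 214), (206, 228), (200, 228), (206, 214), (200, 206)]
alleles, n_ambiguous = paternal_alleles(dam, hatchlings)
print(sorted(alleles), n_ambiguous)
```

In this example two paternal alleles are recovered and one hatchling is uninformative, which is consistent with a single heterozygous sire at this locus.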
Paternity results (father genotypes) from samples analysed for this study were combined with those from a previous study of the same population (Pearse et al. 2001) to provide the maximum available number of independently reconstructed paternal genotypes.

Statistical estimation of N
Male population size at South Potter's Marsh was estimated using three nonexclusive subsets of the data, and an appropriate statistical model or models in each case, as follows.
'Data set 1' involved male turtles physically marked in 1999 and recaptured in 2000 (two sampling periods). 'Data set 2' involved 25 trapping days during May and June 2000, divided (for computational reasons) into five, 5-day trapping periods. 'Data set 3' encompassed the parentage-analysis approach, wherein for four sampling periods (1995-98) the 'marks' and 'recaptures' were of males whose genotypes were deduced as the sires of assayed clutches (see also Table 1).
Estimates of population size based on data set 1 were calculated by hand using the Petersen method (following Pollock et al. 1990), as this is the appropriate model for these data. The computer program CAPTURE (which assumes a closed population) was employed in the analysis of both other data sets (White et al. 1978; White & Burnham 1999).
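For a two-sample design such as data set 1, the Petersen calculation reduces to a single ratio. The sketch below shows the basic estimator together with Chapman's bias-adjusted form, which is often preferred when recaptures are few. The counts are hypothetical and are not the trapping totals from this study.

```python
def lincoln_petersen(n1, n2, m2):
    """Petersen estimate: n1 marked in sample 1, n2 caught in sample 2, m2 recaptured."""
    if m2 == 0:
        raise ValueError("no recaptures: the Petersen estimate is undefined")
    return n1 * n2 / m2


def chapman(n1, n2, m2):
    """Chapman's modification, which remains defined when m2 is zero."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


# Hypothetical counts for illustration only.
marked_first_year = 50
caught_second_year = 60
recaptured = 10

n_petersen = lincoln_petersen(marked_first_year, caught_second_year, recaptured)
n_chapman = chapman(marked_first_year, caught_second_year, recaptured)
print(n_petersen, round(n_chapman, 1))
```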
In CAPTURE, the models M h-jackknife and M h-Chao were employed because both permit heterogeneity in trapping probabilities and are fairly robust to low recapture rates (Chao 1988; Mills et al. 2000), conditions that probably apply to this study. Results from the null model in CAPTURE, M o, are also provided. For the genetic parentage data (data set 3), the program MARK was also utilized to estimate population size under the Jolly-Seber model. This model assumes an open population, which is appropriate given that these data were collected over a 4-year period.
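To show why heterogeneity models respond strongly to low recapture rates, the sketch below computes two commonly cited closed-population estimators from capture frequencies: a Chao-type lower bound, S + f1^2 / (2 f2), and the first-order jackknife, S + f1 (t - 1) / t, where S is the number of distinct individuals, f1 and f2 are the numbers caught exactly once and twice, and t is the number of sessions. These are textbook forms given for illustration; the exact expressions and model-selection steps applied inside CAPTURE may differ, and the capture counts are hypothetical.

```python
from collections import Counter


def capture_frequencies(times_caught):
    """Map k -> number of individuals caught exactly k times."""
    return Counter(times_caught)


def chao_mh(times_caught):
    """Chao-type estimate; uses a bias-corrected form when no animal was caught twice."""
    freq = capture_frequencies(times_caught)
    s, f1, f2 = len(times_caught), freq[1], freq[2]
    if f2 == 0:
        return s + f1 * (f1 - 1) / 2
    return s + f1 ** 2 / (2 * f2)


def jackknife_first_order(times_caught, sessions):
    """First-order jackknife estimate for `sessions` trapping occasions."""
    f1 = capture_frequencies(times_caught)[1]
    return len(times_caught) + f1 * (sessions - 1) / sessions


# Hypothetical data: 30 animals caught once, 5 caught twice, 1 caught three times.
hypothetical_counts = [1] * 30 + [2] * 5 + [3]
n_chao = chao_mh(hypothetical_counts)
n_jack = jackknife_first_order(hypothetical_counts, sessions=4)
print(n_chao, n_jack)
```

With so few animals seen more than once, the two estimators diverge widely, which illustrates how the choice of model can dominate the estimate when recaptures are sparse.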

Results

Molecular markers
The molecular markers used in this study are highly variable (20-27 alleles per locus, mean expected heterozygosity = 0.87), and, thus, suggest a high degree of confidence that any genetic match between two deduced multilocus genotypes truly earmarks one unique male as the sire of both clutches. All three loci segregated normally and independently (i.e. did not show deviations from Hardy-Weinberg equilibrium; Pearse et al. 2001). The mean probability of identity (PI) for the loci considered individually was 0.028, and PI was 0.000017 for the three loci combined. However, these calculations assume the absence of close genetic relatives within the sample, an assumption that if violated would increase the overall likelihood of occasional genetic matches among separate sires (Waits et al. 2001). The probability of identity for full siblings (PI sib; Evett & Weir 1998) provides a conservative confidence limit for multilocus genetic matches, and a value of PI sib < 0.05 has been used as a cutoff for wildlife identification purposes (Woods et al. 1999). For this study, the combined PI sib value for the three loci used was 0.033. Thus, both the PI and PI sib values provide strong support for the use of these microsatellite loci for individual identification. As a further validation of the power of these microsatellite markers, it is important to note that among more than 230 adult males and females analysed (26 335 pairwise comparisons), in only one case were two individuals empirically observed to share the same multilocus genotype. Two males (one trapped in 1999 and the other in 2000) share a genotype whose anticipated frequency was 3.5 × 10⁻⁵ (a value higher than the mean because the alleles present in these males were relatively common in the population).
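As a rough check on these figures, the number of pairwise comparisons and the number of chance matches expected among unrelated individuals can be computed from the values quoted above. The calculation assumes exactly 230 individuals (the count that yields 26 335 pairs) and applies the combined PI to every pair, which ignores relatedness and so gives a lower bound on the expected number of matches.

```python
from math import comb

individuals = 230            # the text reports "more than 230"; 230 gives 26 335 pairs
combined_pi = 0.000017       # three-locus PI reported above

pairs = comb(individuals, 2)
expected_chance_matches = pairs * combined_pi

print(pairs, round(expected_chance_matches, 2))
```

An expectation of roughly 0.45 chance matches among unrelated animals means that observing a single matching pair is not out of line with the reported PI, even before relatedness is considered.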
The observation of a genetic match between two individuals is not particularly surprising, given the results of Waits et al. (2001) and the observation by Scribner et al. (1993) that recruitment in painted turtles tends to occur as groups of sibs. Nevertheless, this single observation of a genetic match does raise an issue as to whether some of the deduced matches for sire genotypes of two clutches might represent different males that happen to share an identical three-locus genotype. If such spurious matches are in fact present in the data, then most of them should arise when two individuals share high-frequency alleles, such that their matching genotypes would be relatively common. Figure 1 demonstrates that this is not the case. Instead, the observed matches show no tendency to involve high-frequency genotypes. Thus, there is no support for the hypothesis that the genetic-sire matches in this study fail to represent 'recaptures' of the same individual.
A problem sometimes encountered in microsatellite studies is nonamplification of some alleles (Pemberton et al. 1995). The presence of such 'null' alleles makes heterozygous individuals appear to be homozygous, and can create confusion in assigning paternity. Although all three loci appeared to be in Hardy-Weinberg equilibrium (Pearse et al. 2001), at least one instance of a null allele may be present in our data set at locus Cp10: the parentage of one clutch can best be explained by assigning a null allele to the mother and some of her apparently homozygous offspring.

Paternity analysis
A total of 133 deduced paternal genotypes was included in the final data set, plus the genotypes from 40 males trapped directly in 1999, and 107 trapped in 2000. In the paternity-based analyses of population size, we treated each female as a 'trap', and deduced the paternal genotype (the 'trapped' male) from the offspring genotypes. In some cases, multiple paternity of a clutch (i.e. multiple males 'caught in the same female trap') interfered with our ability to reconstruct the paternal genotype(s) given our limited clutch sizes, and one or both of the males had to be excluded from the analysis. This is equivalent to an individual escaping from a conventional physical trap, or not being trapped at all, and should not bias our results in any way. In addition, three clutches were excluded because the putative dam's genotype was not consistent with all of the hatchlings in the clutch.
Multilocus genotypes for the nine deduced paternal genetic matches in the study (each reconstructed from the clutches of more than one female) are presented in Table 2, along with the expected Hardy-Weinberg frequency of each such genotype in the population. In two such cases (males A and E, Table 2), the males appear to have sired clutches from three different females. In addition, male E, as well as two other males (H and I), each sired two clutches laid by different females within the same year (Table 2). Because each year is treated in our analysis as a single trapping session, these three within-year 'recaptures' could not be included in the capture-recapture analysis. In three of the sire genotypes in Table 2 (D, G and I), the genotype at one locus could not be precisely determined, and we settled in favour of the most likely allele in each case. Note that if this inference was incorrect for any of these genotypes, the corresponding number of genetic 'recaptures' would be reduced and the resulting estimated breeding population size, N bm, would be larger and have wider associated confidence limits.
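The conversion of these sire records into capture histories can be sketched as below. The years are those listed for males A to I in Table 2, with two clutches in the same year (suffixes a and b in the table) entered as a repeated year. Treating each year as one session collapses repeated years into a single capture; the history-string format is an illustrative choice and not necessarily the input format used for CAPTURE or MARK.

```python
SESSIONS = (1995, 1996, 1997, 1998)

# Years in which each matched sire genotype was detected (from Table 2).
sire_years = {
    "A": [1995, 1996, 1997],
    "B": [1997, 1998],
    "C": [1997, 1998],
    "D": [1995, 1998],
    "E": [1997, 1997, 1998],
    "F": [1995, 1997],
    "G": [1997, 1998],
    "H": [1998, 1998],
    "I": [1998, 1998],
}


def capture_history(years_detected):
    """One character per session: '1' if the sire was detected that year, else '0'."""
    detected = set(years_detected)
    return "".join("1" if year in detected else "0" for year in SESSIONS)


histories = {male: capture_history(years) for male, years in sire_years.items()}

# Within-year repeats carry no information once a year is a single session.
within_year_dropped = sum(len(y) - len(set(y)) for y in sire_years.values())

# Between-year recaptures: detections after the first, counted per male.
between_year_recaptures = sum(h.count("1") - 1 for h in histories.values())

print(histories)
print(within_year_dropped, between_year_recaptures)
```

Males H and I end up with a single detection each, so in this encoding they contribute marks but no recaptures.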

Estimates of population size
Estimated values of local male population size, N m, and numbers of male breeders, N bm, as well as 95% confidence intervals for all data sets and models, are given in Table 1. The 'physical' estimates of local population size ranged from 309 to 481. Various estimates of the number of male breeders, N bm, based on our paternity-analysis approach ranged from 309 to 1193. Estimates of N from all data sets were associated with wide confidence intervals due to the low recapture rates.

Fig. 1 (caption, partial) Open bars: genotypes of captured males that matched a paternal genotype reconstructed from a clutch. There is no evident tendency for the 'match' genotypes to be in high-frequency genotypic classes (as might otherwise be expected if such matches instead reflected the spurious sharing of genotypes by captured males and the true sires).

Discussion
Our primary goal here has been to explore a novel application, the estimation of local population size, for molecular data from conventional genetic parentage studies. The approach is analogous to traditional mark-recapture analyses, except that in this case it is the realized breeding population of males that is in effect sampled and resampled by females (rather than by physical traps), and the marks are genetic tags provided by nature (rather than applied directly by the researcher). These differences, as well as other distinctions between 'physical', 'direct genetic' and 'genetic parentage-based' approaches to mark-recapture analysis, are summarized in Table 3.
The genetic parentage-based approach should be especially well suited to any polygynous (or polyandrous) species in which groups of full sibs and one parent are easily captured, yet the other adult gender is difficult or impossible to sample directly. Such cases abound in the biological world. Obvious examples include marine turtles (where only females come ashore to nest), fish species with extended uniparental care of offspring (Breder & Rosen 1966), and many species of mammals and birds in which females alone (or, rarely, males) may remain physically associated with their young.
In the initial genetic study of this sort, Jones & Avise (1997) used reconstructed parental genotypes to estimate the population size of adult females in a local population of the dusky pipefish, Syngnathus floridae. In these and other syngnathid species, pregnant males carry the developing embryos. The authors deduced the mothers' genotypes for 41 clutches of embryos borne by known fathers. Genotypes of these 'male-captured dams' were then compared with those of 43 adult females trapped by hand from the same location. Eleven physically trapped females possessed genotypes that matched reconstructed maternal genotypes from the assayed broods. Interpreting these as 'recaptures' in Petersen analysis (Seber 1982), Jones & Avise (1997) estimated that about N = 138 adult pipefish females were present in the population.
The analysis by Jones & Avise (1997) represents just one of several possible ways to use genetic parentage data in conjunction with mark-recapture statistics. As we have shown here, it is also possible to use the parentage data themselves as both marks and recaptures, provided that the correct time-series of samples is obtained. The resulting data may be appropriate for multiple sample capture-recapture estimators, such as the Schnabel or Jolly-Seber methods for closed or open populations, respectively (Seber 1982). This approach has the advantage that there is never a need to trap individuals of one sex (in this case, males), because both the capture and recapture data on sires (in this case) are reconstructed from the other parent and the known progeny arrays. The only limitation is that enough young (generally at least five) must be available from each clutch to allow reliable reconstruction of the genotypes of the nonsampled parents.
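For the closed-population case, a Schnabel-type estimate can be obtained from per-session totals alone. The sketch below uses the basic form, N = Σ(C t × M t) / Σ R t, where C t is the number of sires detected in session t, R t the number of those already marked, and M t the number marked before session t. The session totals are hypothetical and do not correspond to the clutch data in this study.

```python
def schnabel(caught, recaptured):
    """Schnabel estimate from per-session catches and recaptures.

    caught[t]     -- individuals detected in session t
    recaptured[t] -- how many of those were already marked
    """
    if len(caught) != len(recaptured):
        raise ValueError("caught and recaptured must have the same length")
    marked_so_far = 0
    numerator = 0
    total_recaptures = 0
    for c, r in zip(caught, recaptured):
        numerator += c * marked_so_far
        total_recaptures += r
        marked_so_far += c - r      # newly marked individuals join the marked pool
    if total_recaptures == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return numerator / total_recaptures


# Hypothetical four-session data set.
hypothetical_caught = [20, 25, 30, 25]
hypothetical_recaptured = [0, 2, 3, 5]
n_schnabel = schnabel(hypothetical_caught, hypothetical_recaptured)
print(n_schnabel)
```

Because the estimate is a ratio with the total number of recaptures in the denominator, a change of one or two recaptures shifts it substantially when recaptures are rare.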
For several reasons, the absolute values of N (Table 1) in this study of painted turtles should not be interpreted too literally. First, traps were placed in only a small portion of South Potter's Marsh, so the physical mark-recapture data probably underestimate the actual number of males in the area. Second, some empirical studies of populations of known size have suggested that mark-recapture methods tend to yield minimum estimates of N (Koper & Brooks 1997). Third, the wide range of N estimates produced, even by different models within the same computer program (i.e. CAPTURE), suggests that subtleties of the data analysis may greatly influence the value of N obtained. Finally, in a potential bias that could operate in the reverse direction, statistical models which assume that a population is closed (see footnotes in Table 1) may produce, if this assumption is violated, higher estimates of N than would otherwise have been the case (Lindeman 1990). Nonetheless, at face value the size of this painted turtle population as estimated from physical trap data and from the pure genetic mark-recapture approach agree in general magnitude (Table 1).

Table 2 Deduced sire genotypes detected in the clutches of more than one female, the years of those clutches, and the expected frequency of each genotype. Column headings and most of row A were lost from the available text; the locus order shown (Cp2, Cp3, Cp10) is presumed.

Male   Cp2        Cp3        Cp10       Years                 Expected frequency
A      …          …          … 213      1995, 1996, 1997      2.78 × 10⁻⁵
B      206/216    136/196    212/238    1997, 1998            3.48 × 10⁻⁷
C      206/206    141/145    208/214    1997, 1998            1.53 × 10⁻⁵
D      202/220    141/157    214/230*   1995, 1998            5.63 × 10⁻⁶
E      206/222    141/157    214/230    1997a, 1997b, 1998    1.94 × 10⁻⁵
F      200/204    145/157    214/214    1995, 1997            6.07 × 10⁻⁸
G      186/204    145†/176   189/223    1997, 1998            2.19 × 10⁻⁹
H      214/228    138/178    228/244    1998a, 1998b          9.51 × 10⁻⁹
I      208/214‡   157/190    212/214    1998a, 1998b          3.45 × 10⁻⁸

*The father of the 1995 clutch has either allele 207 or 230. Allele 230 is 3.5 times more frequent.
†The father of the 1997 clutch has either allele 145 or 172. Allele 145 is 9 times more frequent.
‡The father of clutch 1998a has either allele 186 or 214. Allele 214 is 4.7 times more frequent.
As mentioned above, however, the populations referenced by the two methods could, in principle, be quite different, depending on movement patterns and mating behaviours of males and females (Table 3). Physical traps for the conventional mark-recapture analysis were placed only in South Potter's Marsh, and, hence, were probably accessible only to local resident males (thus, the resulting estimates are of the census number of local males, N m). The genetic samples, by contrast, are representative of the (potentially different) collection of male turtles from which females selected mates (N bm).
Overall, higher estimates of population size (and wider 95% confidence limits) generally emerged from the genetic approach (Table 1). This is somewhat surprising given that the physical samples included some subadult males, whereas the genetic samples could only include reproductively mature individuals. One possible interpretation is that in acquiring mates, female painted turtles may have travelled beyond the local population or otherwise mixed with resident males of neighbouring breeding areas. It should also be noted that the genetic data estimate the effective number of breeding males, and this value will be influenced by factors such as the variance in male reproductive success (which when high would produce an estimate of N bm that is low relative to the true N m), or the migration of mate-searching females to neighbouring populations (which would produce an estimate of N bm that is higher than the local N m). Thus, although this approach has not yet been explored in detail, it has the potential to provide biological information about natural populations beyond the estimation of population size per se.
The genetic and physical data for the painted turtles could also be used in a mixed analysis to provide an additional capture-recapture estimate of population size (as done by Jones & Avise 1997 for pipefish). In our case, however, combining data sources would appear to be less appropriate because of the multiyear time span involved and the distinct possibility that the two methods sample different pools of males.
Genetic data, particularly from parentage studies, have found many applications in behavioural ecology, such as in examining mating patterns in hybrid zones, illuminating the effects of pollinator behaviour on reproduction in plants or elucidating individual variation in reproductive success (Schnabel 1998). We have demonstrated an additional application of parentage data to a fundamental recurring challenge in wildlife biology: estimation of effective breeding population size. Although beyond the scope of this study, this parentage approach could be extended to estimate other demographic parameters as well, such as annual survivorship or population growth rate.