Predicting death in female Drosophila.

We have previously described a phenomenon called the death spiral, which is characterized by a rapid decline in female fecundity 6–15 days prior to death in Drosophila. Carrying out destructive physiological analyses of females in the death spiral requires a method that reliably classifies individual females by predicting their age at death. Using cohorts of Drosophila, we describe how to use the observed mortality prior to some target day, together with a female's fecundity over the 3 days prior to the target day, to determine whether the female is in the death spiral. The method works at all ages. Although it does not yield perfect classification, with sufficient sample sizes any physiological trait whose means differ between the groups can be detected.


Introduction
There are three important stages of life from the perspective of evolutionary biology (reviewed in Rose et al., 2006; Shahrestani et al., 2009). The first is the developmental period, prior to reproduction. During this stage, natural selection works with maximum efficiency to weed out genetic variants that reduce survival before the onset of reproduction, since any individual that fails to survive this period will have zero fitness. This does not guarantee survival, even under optimal conditions, because of mutations, segregational genetic load, and developmental accidents. But it does mean that this stage of life is the primary beneficiary of natural selection for enhanced survival.
The aging phase is the period following the onset of reproduction. The well-developed theory of selection in age-structured populations (Charlesworth, 1994) shows that selection becomes progressively weaker with advancing age. As a result, evolutionary biology predicts progressively decreasing age-specific fitness components, even under ideal conditions, a pattern that is in turn readily "tunable" by changing the timing of the onset and fall in the force of natural selection.
The third and final stage of life, according to evolutionary theory, has been called "late life" (e.g. Rauser et al., 2006; Rose et al., 2006). At these advanced ages, age-specific selection is either weak or absent, such that it does not favor the enhancement of age-specific survival or fecundity characters. However, this lack of natural selection is uniform at these later ages, with no late-life age being selected more or less than any other. Thus evolutionary theory predicts an approximate plateau in late-life fitness components (Charlesworth, 2001; Mueller and Rose, 1996; Rauser et al., 2006), a pattern that has now been widely observed (Carey et al., 1992; Curtsinger et al., 1992; Rauser et al., 2006; Rose et al., 2002; Vaupel et al., 1998).
In a large-scale study of age-specific patterns of female fecundity in Drosophila, we discovered a fourth life-cycle phenomenon, which we call the "death spiral" (Rauser et al., 2005). For a period of 6-15 days prior to death, the fecundity of females that are about to die drops at a much faster rate than the fecundity of similarly aged females that are not about to die. This result was found by comparing the slopes of the line describing fecundity vs. age as a function of the prospect of death for individual females. This decline in fecundity shortly before death was in turn incorporated into models that accurately describe the age-specific fecundity of D. melanogaster. The death spiral is detectable across a wide range of adult ages; it may signal a very general decline in physiological health prior to death. The death spiral has also been independently documented in D. melanogaster by other laboratories (Rogina et al., 2007).
Phenomena similar to the death spiral have been observed in other organisms. Christensen et al. (2008) monitored the physical and cognitive abilities of 2262 Danish individuals, all born in 1905. Over the course of the study the individuals were between 92 and 100 years of age. They found that the physical and cognitive scores of individuals who died within two years of the measurements were significantly lower than the scores of similarly aged individuals who did not die. Similarly, male medflies will often be found on their backs prior to death, although if the fly can right itself it continues more or less normal behavior (Papadopoulos et al., 2002). This supine behavior also appears to be a reliable signal of impending death.
There are a host of interesting questions about the process of dying that could be addressed, if it were possible to reliably identify the females that have entered the death spiral prior to their actual death. It is reasonable to suppose that, if fecundity is undergoing a dramatic decline prior to death, then other aspects of physiology may also be changing dramatically. Since many physiological assays in Drosophila and other species are destructive, it will not always be possible to collect this physiological data immediately prior to the death of a test female. This inability will limit the study of the process of dying.
What is needed is a technique for determining which females are in a death spiral at any time. While the differences between large groups of death spiral and non-death spiral females may be pronounced, it is not clear whether or not individual females can be reliably classified as being in a death spiral or not. Here, we develop and test methods for identifying individual females that have entered the death spiral. We experimentally show that reliable classification is indeed possible, thus opening the prospect for effective functional, physiological, and behavioral studies of the process of dying in individual organisms.

Experimental population
This study used an outbred laboratory population of Drosophila melanogaster that had been selected for mid-life reproduction. The CO 1 population employed is one of the five replicate CO populations derived in 1989 from five corresponding O populations (Rose, 1984). These populations have long been cultured using females 28 days of age (Rose et al., 1992) and had been maintained at population sizes of at least 1000 individuals for at least 170 generations prior to the present study. Late-life mortality-rate plateaus and late-life fecundity plateaus have been studied in these CO populations (Rauser et al., 2005; Rose et al., 2002). A large cohort of flies from the CO 1 replicate population was used in each of the three assays that we analyze here.

Culture and assay methods
The flies used in the assays were raised for two generations as larvae in 5 mL of standard banana-molasses food at 25°C, constant light, and densities between 60-80 eggs per 8-dram vial. During this controlled-density rearing, flies were kept on a two-week generation time.
For each replicate assay, individual females were housed with two males in vials containing charcoal-colored medium and 5 mg of yeast. Fecundity was first measured at age 12 days from egg. Assays one ("CO 1-1") and two ("CO 1-2") started with 1,111 females and twice as many males, to ensure that all females were mated, while assay three ("CO 1-3") started with 606 females and twice as many males. The three replicate assays were temporally staggered to spread out the large amount of work required to measure daily fecundity for such a large number of females. Over all three cohorts, we collected lifetime daily fecundity data for 2,828 females, with 3,169,101 eggs counted in total.
During the assays, we transferred flies to fresh yeasted vials daily and counted the number of eggs laid for each female until she died. Male flies were recombined between vials as they died, to ensure a constant supply of mates for females.

Statistical theory of classification: mortality
Suppose we have a cohort of flies aged t days, which we will call the "target age." At the target age, we would like to separate females into two groups: those that are in the death spiral and those that are not. To be more specific, we consider a female to be in the death spiral if she is expected to die on day t + 1, t + 2, . . . , t + m, where the age-increment m is the maximum length of the death process. Based on our previous estimates of the duration of the death spiral in Drosophila, m could range from 5 to 14 days for a female who enters the death spiral at day t.
Since it is more likely that flies well into the death spiral will exhibit altered physiology compared to females that have just begun the death spiral, we have set m = 5 for the data analysis that follows.
We regard this as a conservative assumption. This assumption also allows that the female may be in the death spiral for several days prior to the target day, and her fecundity should reflect this. This means that some females in our experimental data that have started their death spiral would be mistakenly classified as non-death-spiral females, since they die at an age > t + 5. However, it is much less likely that a female that would die within 5 days of the target age would not be classified in the death spiral.
In the absence of any information about female fecundity, we could still use the survival of flies prior to the target day to estimate the chance of a fly dying over the next 5 days. We expect that experiments designed to measure the physiology of death spiral females would collect flies at ages well before a mortality plateau, in which case survival might be accurately predicted by the Gompertz equation (Gompertz, 1825; Mueller et al., 1995). Under this model the chance of dying in the 5 days following the target age (P) would be

\[ P = \frac{p_t - p_{t+5}}{p_t}, \tag{1} \]

where p_t is the chance of surviving to age t. The probability of surviving to age t is given by the Gompertz equation as

\[ p_t = \exp\left\{ \frac{A\,[1 - \exp(\alpha t)]}{\alpha} \right\}, \tag{2} \]

where A is the age-independent Gompertz parameter and α is the age-dependent parameter. We have used Eq. (1) to predict the fraction of the population in the death spiral for the three CO 1 populations (Fig. 1). At young ages (small values of P), when the fraction of the population in the death spiral is small, the predictions from the Gompertz equation are very accurate. At more advanced ages (large values of P), the predicted number of deaths at each age tends to be larger than the observed number. We know that the increase in mortality rate will slow down at advanced ages relative to the Gompertz predictions, and thus the observed over-estimates are expected at old age. We have also fit the logistic Gompertz (Mueller et al., 2003) to these data, and that model results in consistent underestimates (results not shown) of the fraction of females in the death spiral.
Estimating the Gompertz equation is easier than estimating the logistic Gompertz, and it requires fewer parameters. Therefore, we employ the Gompertz model in the remainder of this article.
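The mortality-only prediction described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard Gompertz survival function with hazard A·exp(αt), takes P as the conditional chance that a female alive at age t dies within the next m days, and uses hypothetical parameter values chosen only to make the example run. The function names are also hypothetical.

```python
import math

def gompertz_survival(t, A, alpha):
    """Probability of surviving to age t, assuming a Gompertz hazard A*exp(alpha*t)."""
    return math.exp(A * (1.0 - math.exp(alpha * t)) / alpha)

def prob_death_within(t, m, A, alpha):
    """Chance that a female alive at age t dies during the following m days."""
    return 1.0 - gompertz_survival(t + m, A, alpha) / gompertz_survival(t, A, alpha)

# Hypothetical parameter values, for illustration only (not estimates from the CO populations)
A, alpha = 0.0005, 0.15
for age in (20, 30, 40):
    print(age, round(prob_death_within(age, 5, A, alpha), 3))
```

In practice A and α would be estimated from the cohort's deaths observed before the target age, and the resulting P multiplied by the number of live females to give the expected number of death spiral females.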

Phenotypic measures of differentiation
In principle, we could use the fecundity of individual females to determine the slopes of their individual regressions of fecundity on age. Alternatively, we could simply use the mean fecundity for several days prior to the target day to develop a phenotype score for each individual female. In either case, we expect that the distribution of these phenotype scores will be different for females that are in the death spiral vs. those that are not in the death spiral. Suppose we re-scale the data by subtracting the mean and by dividing by the standard deviation. In this case, the phenotype scores of all females at age-t can be plotted as shown in Fig. 2. Based on our previous results, we would expect the mean fecundity and the slope of female fecundity of spiral females to be below the values of the non-spiral females. We let the difference in these means on the standardized scale be d.
In this section we describe results from an investigation of the CO 1-1 population. We did not use the other CO 1 populations, since our interest is to test these methods on data sets that had not been used to develop the methods. In other words, the observations from CO 1-2 and CO 1-3 can serve as a form of cross-validation to give an unbiased assessment of the utility of these techniques. We first investigated the slope of female fecundity with age. We found that this phenotype did not produce sufficient separation of the spiral and non-spiral female populations (data not shown). The sparse data for each female and the fact that regression requires that we estimate two parameters (both slope and intercept) probably both contributed to the poor performance.
We then focused on the mean fecundity over a range of 1-10 days prior to the target age. To determine the optimal number of days prior to the target age to use for estimating this mean, we estimated the standardized difference in means, d, for several time intervals prior to the target age, as well as for several different target ages. These standardized differences were calculated from females of the same age that had been divided into the two groups: spiral and non-spiral females. Thus, any differences cannot be attributed to age-related effects. The results (Fig. 3) show that, except for the oldest age studied (41 days), we get the maximum separation (d ≈ 0.8) when using about 2-3 days before the target age. We settled on 3 days as a reasonable guideline for future work.
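The standardized difference d can be illustrated with a short Python sketch. It makes several assumptions that are not stated in the text: the data layout (a females-by-age matrix of daily egg counts in which column j holds the count at age j), the function name, and the choice to average over ages t − window through t − 1. The toy data are simulated and carry no biological meaning.

```python
import numpy as np

def standardized_difference(eggs, age_at_death, target_age, window=3, m=5):
    """Difference in mean standardized fecundity: non-spiral minus spiral females.

    eggs[i, j] is the egg count of female i at age j (assumed layout).
    A female alive at target_age is 'spiral' if she dies by target_age + m.
    """
    eggs = np.asarray(eggs, dtype=float)
    age_at_death = np.asarray(age_at_death)
    alive = age_at_death > target_age
    # Mean fecundity over the `window` days before the target age
    score = eggs[alive, target_age - window:target_age].mean(axis=1)
    # Re-scale: subtract the mean and divide by the standard deviation
    z = (score - score.mean()) / score.std(ddof=1)
    spiral = age_at_death[alive] <= target_age + m
    return z[~spiral].mean() - z[spiral].mean()

# Toy data: egg output is halved during the 10 days before death (an assumption)
rng = np.random.default_rng(0)
n_females, n_days = 200, 60
deaths = rng.integers(25, 60, size=n_females)
toy_eggs = rng.poisson(40, size=(n_females, n_days)).astype(float)
for i, death_age in enumerate(deaths):
    toy_eggs[i, max(death_age - 10, 0):death_age] *= 0.5

for w in (1, 3, 5, 10):
    print(w, round(standardized_difference(toy_eggs, deaths, target_age=30, window=w), 2))
```

Looping over `window` mirrors the comparison of averaging intervals described above; with real data the interval giving the largest d would be chosen.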
We have taken data from the CO 1-1 population and computed the scaled fecundity of spiral and non-spiral females at four different ages (Fig. 4). These distributions confirm the previously noted differences in mean fecundity although there is also clearly overlap in these distributions.

Statistical theory of classification: mortality and fecundity
Information from a cohort's survival records allows us to predict with some accuracy how many females should be in the death spiral. However, with this information alone, the only option would be to randomly choose the appropriate number of females for each group, i.e. those in the death spiral and those not in the death spiral. Information on female fecundity allows more precise predictions about which females to put in each category. More formally, we assume that at the time of assignment we have a group of N females that have unique integer identifiers or id numbers G = (1, 2, . . ., N). Our goal is to separate this collection of N females into two non-overlapping groups, G_s = {g_s ∈ (1, 2, . . . , N)} and G_ns = {g_ns ∈ (1, 2, . . . , N)}, where G_s are the death spiral females and G_ns the non-spiral females.
Based on our observations of female fecundity the only reasonable expectation is that females in the death spiral will show reduced levels of egg production. Consequently, we next describe a method for classifying females that does not rely on knowledge of the probability distribution of female fecundity or even that the probability distribution is constant.
At any particular age, let the total number of live females be N; the predicted number of spiral females is N_s = PN, and the number of non-spiral females is N_ns = (1 − P)N. For each of the N females, suppose that we rank them based on the total number of eggs laid over the three days prior to the target day, such that w_1 is the greatest number of eggs and w_N the smallest. Corresponding to this ordered vector of three-day fecundities is a vector of female identities for each ordered fecundity value, Y = {y_i ∈ G}. The vector Y can be used to identify the female at each rank position, e.g. the female with the greatest fecundity has id number y_1 and the female with the smallest fecundity has id number y_N. A non-parametric method for assigning females to these groups would assign females with egg counts w_1, w_2, . . ., w_Nns to the non-spiral group and the remaining females to the spiral group. If we wanted to improve the success rate of this method we could eliminate females in the middle of the distribution. The group membership of females with intermediate values of fecundity is most difficult to classify, e.g. see Fig. 4.
This leads to a generalization of the previously described method. Let L = round[k(1 − P)N] and U = round[kPN], where round[x] is the integer obtained from rounding the real number x to the nearest integer. The parameter k is between 0 and 1. The original method corresponds to k = 1. When k is less than 1, a fraction of the females in the middle of the observed egg distribution are discarded, which will presumably increase the accuracy of classification. Then the non-spiral females are those with egg counts w_1, w_2, . . ., w_(L−1), while the spiral females have egg counts w_(N−U+1), . . ., w_N, or

\[ G_{ns} = (y_1, y_2, \ldots, y_{L-1}) \tag{3a} \]
\[ G_s = (y_{N-U+1}, y_{N-U+2}, \ldots, y_N) \tag{3b} \]

Evaluating the success of each method
An important application of these techniques will be the separation of females prior to death into two groups: spiral and non-spiral. These groups can then be subjected to various measurements to determine if mean differences exist between spiral and non-spiral females. Suppose the two groups do differ and that the mean of some trait of interest in the spiral females is Z_s and the mean for the non-spiral females is Z_ns. If our methods are unable to do any better than simply randomly choosing females, then we would expect to be unable to detect these mean differences even if they exist.
Suppose that, as a result of applying one of these methods, the fraction of females classified as spiral that were in fact classified correctly is f_s. Let f_ns be the corresponding fraction of correctly classified non-spiral females. Then the mean of the group of females that have been classified as spiral by these methods would be Z̃_s = f_s Z_s + (1 − f_s) Z_ns. Likewise, the mean for the females classified as non-spiral would be Z̃_ns = f_ns Z_ns + (1 − f_ns) Z_s. The mean difference between the non-spiral and spiral trait values is then

\[ \tilde{Z}_{ns} - \tilde{Z}_s = (f_{ns} + f_s - 1)(Z_{ns} - Z_s) = D\,(Z_{ns} - Z_s). \tag{3c} \]

We call D = f_ns + f_s − 1 the classification success. If D is 1, the classification has been perfect. If D is 0, then we have done no better than randomly guessing group membership.
To study the utility of these methods we examine the accuracy of these predictions across several variables. These include (i) three different data sets, CO 1-1, CO 1-2, and CO 1-3; (ii) four different values of k, 1.0, 0.75, 0.5, and 0.25; (iii) three different cohort sample sizes, 1000, 500, and 100; and (iv) six different adult ages, 20, 25, 30, 35, 40, and 45 days. To accomplish this we have used bootstrap resampling to calculate the relative success of these methods (Efron and Tibshirani, 1993). For each combination of the four variables, 1000 bootstrap samples were generated to estimate classification success. Let a data set be X = (x_1, x_2, . . . , x_N), where each element of the data set, x_i, is a vector-valued random variable consisting of an age at death and daily egg counts up to the day of death. From this data set we sample with replacement a bootstrap sample, X* = (x*_1, x*_2, . . . , x*_m), where m = 100, 500, or 1000. Using X* and Eqs. (3a-b), bootstrap classifications, G*_ns and G*_s, were determined. In addition, two further sets of females, G̃*_ns and G̃*_s, of equal size to G*_ns and G*_s, were generated by choosing members at random without replacement from X*.

Fig. 4. The scaled mean three-day fecundity for spiral and non-spiral females from the CO 1-1 population. The mean of the spiral female distribution is indicated by the left arrow and the mean of the non-spiral females by the right arrow. Panel label: Age 41 days.

G*_ns was used to compute the fraction of females classified as non-spiral that were correctly classified, f*_ns. In a similar manner we calculated f*_s and, for the randomly classified females, f̃*_ns and f̃*_s. Classification success was quantified with the statistic

\[ D^* = f^*_{ns} + f^*_s - (\tilde{f}^*_{ns} + \tilde{f}^*_s). \]

Since the expected value of (f̃*_ns + f̃*_s) is 1, E(D*) is similar to D in equation (3c). The main difference between D and D* is that, because of chance variation among random samples, it is possible for D* < 0.
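The rank-based classification and its bootstrap evaluation can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation. It assumes that D* is the sum of the two correct-classification fractions minus the same sum for randomly drawn groups, that the two random groups are disjoint, and that P is supplied by the user rather than re-estimated in each bootstrap sample. The non-spiral group stops at rank L − 1, following Eq. (3a) as printed. All function names and the simulated data are hypothetical.

```python
import numpy as np

def round_half_up(x):
    """Round to the nearest integer, with halves rounded up."""
    return int(np.floor(x + 0.5))

def classify(fecundity, P, k=1.0):
    """Return (non_spiral, spiral) index arrays from a ranking of fecundity."""
    fecundity = np.asarray(fecundity, dtype=float)
    N = len(fecundity)
    order = np.argsort(-fecundity, kind="stable")  # y_1..y_N, highest fecundity first
    L = round_half_up(k * (1.0 - P) * N)
    U = round_half_up(k * P * N)
    non_spiral = order[:max(L - 1, 0)]  # ranks 1 .. L-1, as in Eq. (3a)
    spiral = order[N - U:]              # ranks N-U+1 .. N, as in Eq. (3b)
    return non_spiral, spiral

def d_star(fecundity, is_spiral, P, k, rng):
    """Classification success relative to random assignment for one sample."""
    ns_idx, s_idx = classify(fecundity, P, k)
    if len(ns_idx) == 0 or len(s_idx) == 0:
        return np.nan
    f_ns = np.mean(~is_spiral[ns_idx])
    f_s = np.mean(is_spiral[s_idx])
    # Random groups of the same sizes, drawn without replacement (assumed disjoint)
    perm = rng.permutation(len(fecundity))
    rand_ns = perm[:len(ns_idx)]
    rand_s = perm[len(ns_idx):len(ns_idx) + len(s_idx)]
    return (f_ns + f_s) - (np.mean(~is_spiral[rand_ns]) + np.mean(is_spiral[rand_s]))

def bootstrap_d_star(fecundity, is_spiral, P, k=1.0, m=500, n_boot=1000, seed=0):
    """Distribution of D* over bootstrap samples of size m."""
    rng = np.random.default_rng(seed)
    fecundity = np.asarray(fecundity, dtype=float)
    is_spiral = np.asarray(is_spiral, dtype=bool)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(fecundity), size=m)  # sample with replacement
        out[b] = d_star(fecundity[idx], is_spiral[idx], P, k, rng)
    return out

# Simulated cohort: spiral females lay fewer eggs on average (toy assumption)
sim = np.random.default_rng(1)
truth = sim.random(1000) < 0.3
toy_fecundity = sim.poisson(np.where(truth, 100, 120))
for k in (1.0, 0.5):
    d = bootstrap_d_star(toy_fecundity, truth, P=0.3, k=k, m=500, n_boot=200)
    print(k, round(float(np.nanmean(d)), 3), round(float(np.nanstd(d)), 3))
```

Running the loop over several values of k and m gives the kind of comparison of mean and spread of D* that the Results section reports.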

Results
In general we have found that the classification success parameters, f*_ns and f*_s, depend on P. If P is very small then most females are non-spiral and there is little difference between f*_ns and f̃*_ns; that is, a random sample is likely to include a large fraction of non-spiral females. On the other hand, when P is small there are few spiral females, so most random samples will not include any, and we tend to observe a large difference between f*_s and f̃*_s. In Fig. 5 the average value of D* for all three data sets at N = 1000 is shown. We see that predictive success generally increases with smaller values of k, although at times this advantage is small (CO 1-2, Fig. 5) and is occasionally reversed (age 45 for CO 1-3, Fig. 5). Since the behavior of the three different data sets does not differ dramatically, we focus on CO 1-3 in the next few figures.
The pattern of age-specific change of the average D* does not change much with sample size (Fig. 6). However, the variance of D* decreases as the sample size increases (Fig. 7).
We can see all three trends in Fig. 8, which displays the distribution of D* for the CO 1-3 population at age 20. At both sample sizes the mean increases with decreasing values of k (Fig. 8). However, the variance is clearly greater at N = 100 than at N = 1000. Additionally, at N = 100 there is a small fraction of samples where D* < 0 (Fig. 8). In these samples guessing resulted in better success than our formal prediction method.

Discussion
In practice, the methods described here are best applied when P is about 0.5. This results in about equal numbers of females in both groups and thus the greatest statistical power to detect phenotypic differences. For the three populations CO 1-1, CO 1-2, and CO 1-3, the ages at which P = 0.5 are 35, 40, and 32 days, respectively. At these ages the classification success ranges from around 0.18 to 0.31 (Fig. 5). This means that any phenotype that truly differs between the spiral and non-spiral females could be detected using this classification scheme, although larger sample sizes are needed relative to a technique that results in perfect classification.
Suppose that it would take a sample size M from the spiral and M from the non-spiral females to detect a significant difference for some character, assuming perfect classification. If the classification success for this sample is D, then we would need a sample size of roughly D^(−2) M from each female type to detect a significant difference. As an example, if perfect classification would require only 25 females from each group to detect a significant difference for some trait, then if our classification success is only 0.2 we would need a sample of 625 females from each group to detect the same difference. If the classification success is 0.3 then the sample size needed is 278.
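The sample-size inflation in this paragraph is simple arithmetic, sketched here in Python. The function name is hypothetical, and rounding up to a whole female is an assumption about how the figures in the text were obtained.

```python
import math

def required_sample_size(M, D):
    """Per-group sample size needed at classification success D (0 < D <= 1),
    given that M per group suffices under perfect classification."""
    if not 0.0 < D <= 1.0:
        raise ValueError("classification success D must be in (0, 1]")
    # Small tolerance guards against floating-point noise before rounding up
    return math.ceil(M / D**2 - 1e-9)

print(required_sample_size(25, 0.2))  # 625
print(required_sample_size(25, 0.3))  # 278
```

Because the required sample grows as 1/D², modest gains in classification success translate into large savings in the number of females that must be assayed.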
Likewise, if we use a value of k less than 1 then the classification success should improve, and hence we can use a smaller total sample of females. Of course, by setting k < 1 we are reducing our total sample size to 2kM (assuming M in each group). Reducing k below one will only be beneficial, in the sense of increasing the statistical power, if the classification success improves by a factor of √(1/k). As an example, if the classification success is about 0.2 when using all the data, then it would pay to use only half the data (e.g. set k = 0.5) if the classification success improved by a factor of 1.41, or to about 0.28.
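One way to see where the break-even factor comes from is the short derivation below. It is a sketch that follows from the sample-size rule in the preceding paragraph; the symbols D_1 (success with k = 1), D_k (success with k < 1), and n (females collected per group before discarding) are introduced here for illustration and do not appear in the original text.

```latex
% With all data (k = 1), detection requires n >= M / D_1^2 per group.
% With k < 1 only kn females per group are retained, so detection requires
%   kn >= M / D_k^2,  i.e.  n >= M / (k D_k^2).
% Discarding the middle of the distribution helps only if it lowers the required n:
\[
  \frac{M}{k\,D_k^{2}} < \frac{M}{D_1^{2}}
  \quad\Longleftrightarrow\quad
  \frac{D_k}{D_1} > \sqrt{\frac{1}{k}} .
\]
% Example from the text: k = 0.5 gives sqrt(2) ~ 1.41, so D_1 = 0.2 must rise to about 0.28.
```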
This previous calculation would seem to suggest that the benefit of using k < 1 would be easiest to achieve when the classification success is low. However, from the results in Fig. 5 it appears that when the classification success is low it is less likely that using k < 1 will substantially improve the classification success. At this time we would suggest caution using less than the entire data set.
It may be possible to improve these techniques by collecting additional information. Other traits may also decline in the death spiral. It could turn out that some other trait will give more reliable predictions, or that using fecundity together with some other trait would increase the classification success. Only additional empirical work will help determine if any of these possibilities exist. The dramatic decline in Drosophila female fecundity prior to death has only recently been described (Rauser et al., 2005). But its confirmation by independent laboratories (e.g. Rogina et al., 2007) suggests that it is a robust phenomenon. However, some aspects of the death spiral may not be robust. For instance, Rogina et al. observed that in addition to the decline in fecundity, all females in their study laid no eggs during the last days before death. Our study showed a decline in fecundity, but many of our females continued to lay eggs up to the day before they died (Rauser et al., 2005). Since both the Rogina and Rauser studies were very large, these different observations are likely to be due to differences in culture techniques or genetic backgrounds of the flies used, rather than merely statistical sampling error.
It seems unlikely that female fecundity could undergo such a dramatic decline and other important physiological processes would be unchanged. Even in Drosophila, there are some behavioral traits that can be measured without killing individual flies. Thus, it is possible to measure activity or mating propensity and then wait until the measured flies die in order to determine their status in the death spiral. However, other characters, like lipid levels or RNA expression, require that females be sacrificed prior to death. For this latter type of assay, the techniques described in this paper would be useful.
An important concern in human health is the period of disability that sometimes occurs prior to death (Crimmins, 2004; Manton and Gu, 2001; Verbrugge and Jette, 1994). Disability in humans reduces quality of life and often requires additional expense from the health care system (Manton and Gu, 2001). In addition, disability may predispose individuals to other disorders or initiate a progressive decline proceeding inexorably to death (Verbrugge and Jette, 1994). Gaining a greater understanding of factors that initiate disability, or ameliorate its effects, thus could have great practical importance.
The death spiral in Drosophila may be viewed as a model of disability prior to death. With additional research, we could develop a more complete understanding of the ensemble of physiological traits involved, the timing of these events prior to death, and whether these events can be affected by environmental interventions, such as diet, mating status, or specific types of natural selection. With greater understanding of the physiological and behavioral characters that are affected by the death spiral in females, we might also be able to devise ways to study this phenomenon in males.