Choosing Alzheimer’s disease prevention clinical trial populations

To assist investigators in making design choices, we modeled Alzheimer’s disease (AD) prevention clinical trials. We used longitudinal Clinical Dementia Rating Scale Sum of the Boxes data, retention rates, and the proportions of trial-eligible cognitively normal participants age 65 and older in the National Alzheimer’s Coordinating Center Uniform Data Set to model trial sample sizes, the numbers needed to enroll to account for dropout, and the numbers needed to screen to successfully complete enrollment. We examined how enrichment strategies affected each component of the model. Relative to trials enrolling participants age 65 and older, trials enriching for older age (minimum 70 or 75) required smaller sample sizes, numbers needed to enroll, and numbers needed to screen. Enriching for subjective memory complaints reduced sample sizes and numbers needed to enroll more than age enrichment did, but increased the number needed to screen. We conclude that AD prevention trials can enroll elderly participants with minimal impact on trial retention and that enriching for older individuals with memory complaints may afford efficient trial designs.


INTRODUCTION
Clinical trials continue to target earlier stages of Alzheimer's disease (AD) because of concern that later intervention may not effectively slow progression due to established pathological burden (Sperling et al., 2011a). The earliest test of a potential intervention is through primary prevention trials that enroll volunteers with no clinical or biological sign of disease. Previous AD primary prevention trials encountered challenges related to slow enrollment, high screen failure rates, loss to follow-up, and fewer than expected cases of dementia (DeKosky et al., 2008, Sano et al., 2008, Meinert et al., 2009), despite strategies to enrich for age (DeKosky et al., 2008), family history of disease (ADAPT Research Group et al., 2007, Sano et al., 2008), or memory complaints. Trial designs that incorporate single continuous outcomes of global cognitive and functional performance, rather than time-to-event designs, may alleviate some of these challenges (Aisen et al., 2011, Richard et al., 2012) and have been endorsed by regulatory agencies for trials of those at greatest risk for AD dementia (Kozauer and Katz, 2013, Center for Drug Evaluation and Research, 2013).
The Clinical Dementia Rating Scale Sum of the Boxes Score (CDR-SB) (Morris, 1993) measures within-patient clinical change assumed to represent brain disease, rather than normal aging (Morris et al., 1991), and has been proposed as a potential single primary outcome measure for use in predementia AD trials (Aisen et al., 2011, Kozauer and Katz, 2013). We used data from healthy control participants in the National Alzheimer's Coordinating Center Uniform Data Set (NACC UDS) to model AD trials that enroll cognitively normal participants and use the CDR-SB as a single outcome. We examined how enrichment strategies would affect the rates of trial retention and screen failure. We hypothesized that using higher minimum ages of enrollment and other enrichment strategies would reduce required sample sizes but would also increase the rates of screen failure and dropout.

Participants
The NACC UDS is a repository for longitudinal data collected from approximately 30 currently or previously NIA-funded AD Centers nationwide (www.alz.washington.edu; Morris et al., 2006, Beekly et al., 2007). The UDS was initiated in 2005. These analyses examined data collected on or before December 1, 2012.

Study inclusion criteria
We examined the proportion of NACC UDS participants enrolled as cognitively normal healthy controls at baseline who met AD prevention clinical trial eligibility criteria, and the criteria that most often resulted in ineligibility. To examine eligibility, we developed a set of inclusion criteria, adapted from previous AD prevention trials (DeKosky et al., 2008). Participants must have been enrolled as healthy control subjects, be age 65 to 90, score above 26 on the Mini-Mental State Examination (MMSE; Folstein et al., 1975), and have a global CDR score of 0. To permit accurate examination of long-term follow-up rates, only data from participants who had a baseline visit prior to June 1, 2008 (and thus were eligible for at least three annual follow-up visits) were included. Exclusion criteria were recent or active cardiovascular disease (e.g. heart attack, atrial fibrillation); presence of a pacemaker (since most AD trials include magnetic resonance imaging); medical conditions that might cause or contribute to cognitive impairment, including vitamin B12 deficiency, thyroid disease, alcohol or other substance abuse, Parkinson's disease, seizures, or traumatic brain injury; history of stroke; Hachinski ischemia scale score greater than 4; and geriatric depression scale (GDS) score greater than 6. For the medical conditions, patients were not excluded if the condition was characterized as remote or inactive. For vitamin B12 deficiency and thyroid disease, this characterization was assumed to separate patients with a current active condition from those with a previous diagnosis that had been adequately treated. The use of the following concomitant medications was exclusionary: lithium, anti-Parkinsonian medications, MAO-B inhibitors, tricyclic antidepressants and other anticholinergic drugs (including diphenhydramine), stimulants (i.e. modafinil and methylphenidate), narcotic analgesics, first generation antipsychotics, atypical antipsychotics, anticonvulsants, and approved AD therapies.
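The numeric cutoffs in these criteria can be read as a simple screening rule. The following is a minimal sketch in Python, not the code used for these analyses; the `Participant` class and all of its field names are hypothetical and do not correspond to actual NACC UDS variable names, and the medical and medication exclusions are collapsed into two assumed boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names for illustration; not NACC UDS variables.
    age: float
    mmse: int
    global_cdr: float
    gds: int
    hachinski: int
    healthy_control: bool = True
    # Assumed summary flags for the listed medical and medication exclusions.
    has_exclusionary_condition: bool = False
    on_excluded_medication: bool = False


def is_trial_eligible(p: Participant) -> bool:
    """Apply the cutoffs stated in the text to one participant."""
    return (
        p.healthy_control
        and 65 <= p.age <= 90          # age 65 to 90
        and p.mmse > 26                # MMSE score above 26
        and p.global_cdr == 0          # global CDR of 0
        and p.gds <= 6                 # GDS greater than 6 is exclusionary
        and p.hachinski <= 4           # Hachinski greater than 4 is exclusionary
        and not p.has_exclusionary_condition
        and not p.on_excluded_medication
    )


eligible = is_trial_eligible(Participant(age=72, mmse=29, global_cdr=0, gds=2, hachinski=1))
low_mmse = is_trial_eligible(Participant(age=72, mmse=26, global_cdr=0, gds=2, hachinski=1))
```

In this sketch the first participant passes every check, while the second fails only because an MMSE of 26 is not above 26.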

Enrichment strategies
Age: We assessed the impact of limiting trial populations to those at least age 70 or 75.
Apolipoprotein E (ApoE) ε4 carrier status: ApoE genotype is a well-described genetic risk factor for AD (Corder et al., 1993). ApoE genotyping was performed locally at NACC ADCs or at the National Cell Repository for AD, primarily using blood samples. Subjects were divided into those who did and those who did not carry at least one ε4 allele.
Education levels: Demographic information collected as part of the NACC UDS includes the highest level of education completed by each participant. Education may serve as a surrogate for cognitive reserve, and cognitive reserve may protect against cognitive decline (Stern, 2009). We enriched trial models by excluding those assumed to have the greatest cognitive reserve, that is, those with more than 16 years of education.
Subjective cognitive complaint: As part of the NACC UDS, the clinician is asked to record whether the participant reports a decline in memory. The accompanying instructions in the UDS Coding Guidebook state that decline refers to cognitive changes in the subject's usual or customary memory function and that changes in behavior, motor, or other nonmemory symptoms should not be considered. We used this single item to categorize participants as having a subjective cognitive complaint.

CDR-SB
The CDR is an interview-based assessment tool. The researcher separately interviews an informant and the participant and assesses the participant's change relative to their premorbid (in this case earlier life) performance on six domains: memory; orientation; judgment and problem solving; community affairs; home and hobbies; and personal care. Each domain is scored as 0 (no dementia), 0.5 (questionable), 1.0 (mild), 2.0 (moderate), or 3.0 (severe dementia). Two overall scores can be derived: a global score, using a standardized algorithm, and a cumulative score, obtained by summing the boxes. The CDR-SB is a well-described, validated, and reliable measure of change through the course of AD (Morris, 1993, Williams et al., 2009) and has been proposed as a suitable single outcome measure for AD trials in both dementia and predementia AD populations (Aisen et al., 2011, Coley et al., 2011, Cedarbaum et al., 2013, Kozauer and Katz, 2013).
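As an illustration of the cumulative score, the sketch below sums the six box scores in Python. The function name and the dictionary keys are assumptions made for this example; the allowed score values are those listed in the text, and the global CDR algorithm is not reproduced here.

```python
# Six CDR domains as described in the text (key names are illustrative).
DOMAINS = (
    "memory",
    "orientation",
    "judgment_problem_solving",
    "community_affairs",
    "home_hobbies",
    "personal_care",
)
ALLOWED_SCORES = {0, 0.5, 1, 2, 3}


def cdr_sum_of_boxes(box_scores: dict) -> float:
    """Return the CDR-SB: the sum of the six domain (box) scores, 0 to 18."""
    missing = [d for d in DOMAINS if d not in box_scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if box_scores[domain] not in ALLOWED_SCORES:
            raise ValueError(f"invalid score for {domain}: {box_scores[domain]}")
    return float(sum(box_scores[d] for d in DOMAINS))


# A participant rated questionable (0.5) on memory only.
example = dict.fromkeys(DOMAINS, 0)
example["memory"] = 0.5
example_sb = cdr_sum_of_boxes(example)
```

Because each box contributes directly to the total, a change of 0.5 in a single domain moves the CDR-SB by 0.5, which is why small mean changes are informative in cognitively normal groups.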

Data analyses
We examined the mean decline in the CDR-SB at 36 months. Sample size estimates under an assumption of normality and known variance were calculated from an equation used frequently in the literature (Fox et al., 2000, Schott et al., 2010, Grill et al., 2013a):

$$n = \frac{2\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{(\Delta\mu)^{2}}$$

Here, $z_{1-\beta} = 0.842$ to provide 80% power; $z_{1-\alpha/2} = 1.96$ to test at the 5% level; $\Delta\mu$ is the mean change in CDR-SB score relative to baseline, multiplied by the drug effect (0.25) to reflect the estimated mean difference between placebo group change scores and drug group change scores; and $\sigma$ is the SD of the change scores in the groups (assuming the SD is the same in treatment and placebo groups). We report sample sizes per trial arm.
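The calculation can be sketched in a few lines of Python. This sketch assumes the standard two-arm normal-approximation formula, n = 2σ²(z₁₋α/₂ + z₁₋β)² / (Δμ)², which matches the symbol definitions given here; the function and parameter names are illustrative and are not taken from the original analysis code.

```python
def sample_size_per_arm(mean_change, sd_change, drug_effect=0.25,
                        z_alpha=1.96, z_beta=0.842):
    """Per-arm sample size under normality with known variance.

    Assumed formula: n = 2 * sd^2 * (z_alpha + z_beta)^2 / delta^2,
    where delta = mean_change * drug_effect.
    """
    delta = mean_change * drug_effect  # treated-vs-placebo difference in change
    return 2 * sd_change ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


# Rounded values for the age 65 or older model (mean change 0.21, SD 0.86).
n_per_arm = sample_size_per_arm(0.21, 0.86)
```

With these rounded inputs the sketch gives a little over 4,200 participants per arm. The published estimate of 4,137 per arm was presumably computed from unrounded values, so small differences of this kind are expected.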
We calculated the retention rate after 36 months in NACC for each modeled population. Those who discontinued study participation, were lost to follow-up, or died during the 3-year interval were considered to have dropped out of the study. Using the specific retention rate and the calculated sample size for each population, we calculated the number needed to enroll for a trial to maintain statistical power at completion. Finally, we examined the proportion of NACC participants who met eligibility criteria for each specific trial model. Using the rates of inclusion and the number needed to enroll, we calculated the number needed to screen.
To assist in the comparison of sample size estimates, we calculated the 95% confidence intervals (CI) for the sample sizes, numbers needed to enroll, and numbers needed to screen. These confidence intervals were estimated by bootstrap resampling, using 10,000 iterations for each scenario. Formal statistical comparisons of model outputs were not performed.
Descriptive statistics (mean, standard deviation, and percentages) were calculated for eligible trial populations. The frequency of each reason for trial ineligibility was also calculated. Groups were compared by chi-square (χ²) test and Kruskal-Wallis (KW) test, as appropriate. Age comparisons were performed on the mutually exclusive age epochs (i.e. 65-69; 70-74; ≥75). All analyses were performed using SAS 9.3 (Cary, NC) and R v2.14 (http://www.R-project.org, Accessed March 1, 2012).
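A minimal Python sketch of these two steps follows. It assumes that the required sample size is inflated by dividing by the retention rate, and that the result is then divided by the proportion eligible. The text does not state the exact formulas, so both divisions and the function names are assumptions of this example.

```python
def number_needed_to_enroll(sample_size, retention_rate):
    """Enrollment needed so that `sample_size` participants complete the trial."""
    return sample_size / retention_rate


def number_needed_to_screen(needed_to_enroll, proportion_eligible):
    """Volunteers to screen so that `needed_to_enroll` pass eligibility."""
    return needed_to_enroll / proportion_eligible


# Rounded values from the age 65 or older model: 4,137 per arm,
# 36% dropout (64% retention), and 41% eligible.
nne = number_needed_to_enroll(4137, 1 - 0.36)
nns = number_needed_to_screen(nne, 0.41)
```

With these rounded rates the sketch gives values in the neighborhood of the published per-arm figures but not identical to them, presumably because the published values used unrounded retention and eligibility rates.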
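One way to obtain such intervals is a percentile bootstrap, sketched below in Python using only the standard library. The text does not specify which bootstrap interval was used, so the percentile method, the function names, and the synthetic change scores are all assumptions for illustration; no NACC data are used. The sketch defaults to 1,000 resamples for speed, and `n_boot=10000` would match the number of iterations described above.

```python
import random

Z_ALPHA, Z_BETA = 1.96, 0.842  # 5% two-sided test, 80% power


def sample_size_from_changes(changes, drug_effect=0.25):
    """Per-arm sample size from a list of individual change scores."""
    n = len(changes)
    mean = sum(changes) / n
    var = sum((c - mean) ** 2 for c in changes) / (n - 1)
    delta = mean * drug_effect
    return 2 * var * (Z_ALPHA + Z_BETA) ** 2 / delta ** 2


def bootstrap_ci(changes, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the sample size (assumed method)."""
    rng = random.Random(seed)
    estimates = sorted(
        # Resample participants with replacement, then recompute the estimate.
        sample_size_from_changes(rng.choices(changes, k=len(changes)))
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_boot)]
    upper = estimates[int((1 - tail) * n_boot) - 1]
    return lower, upper


# Synthetic change scores standing in for real data (hypothetical values).
data_rng = random.Random(42)
synthetic_changes = [data_rng.gauss(0.21, 0.86) for _ in range(500)]
point_estimate = sample_size_from_changes(synthetic_changes)
ci_lower, ci_upper = bootstrap_ci(synthetic_changes)
```

Because the sample size depends on the inverse square of the mean change, the bootstrap distribution is right-skewed, and the resulting interval is typically asymmetric around the point estimate.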

Human subjects protection
Each participant provided written informed consent, approved by the local Institutional Review Boards at each participating AD Center.

Eligible participants
Data from 4,549 cognitively normal NACC participants were included in these analyses. Among subjects age 65 or older, 1,879 (41%) were deemed trial eligible. Among older participants, the proportion eligible was significantly lower; 39% of participants age 70 or older and 36% of those age 75 or older were eligible (p<0.001; Table 1). Older eligible participants were more often male, less often had a family history of AD, and were less frequently carriers of the ε4 allele of the ApoE genotype (Table 1). Older eligible subjects had worse scores on the MMSE but not the CDR-SB.
The reasons for trial ineligibility differed among the age groups (Table 2). Older patients were more often excluded for MMSE; the use of an FDA-approved anti-dementia medication or another excluded medication; a history of cardiovascular disease and stroke; scores on the Hachinski ischemia scale and GDS; and for a global CDR score greater than 0.

Dropout rate
Among trial-eligible cognitively normal NACC participants, 36% dropped out before the 3-year time point. We did not observe an increase in the rate of dropout with age (Table 3). Dropout was lower among APOE ε4 carriers and those with a subjective memory complaint.

Trial modeling
The mean change on the CDR-SB over three years for cognitively normal trial-eligible NACC participants age 65 or older was 0.21 ± 0.86 (Table 4). To adequately power a 3-year trial to demonstrate a 25% drug effect in these participants using the CDR-SB, 4,137 participants/arm would be required (Table 5). Adjusting for trial retention, 6,424 participants/arm would need to be enrolled to maintain adequate power for a completed trial. Because only 41% of NACC cognitively normal participants met eligibility criteria (Table 6), the number needed to screen for such a trial would be 15,555 participants/arm, or just over 31,000 total volunteers.

Enrichment strategies
With increasing minimum age, decline measured by the CDR-SB over three years increased (Table 4), thereby reducing the number of participants needed to adequately power a trial (Table 5). Relative to the standard trial model enrolling participants age 65 or older, trials with a minimum age requirement of 70 and 75 years reduced the necessary trial sample sizes by 8% and 18%, respectively. Contrary to our hypothesis, we did not observe an increase in the dropout rate in the older populations (Table 3). Thus, the number needed to enroll was similarly reduced among the older age groups. Although the proportion of older cognitively normal participants who met trial criteria was reduced relative to younger groups, this difference was not great enough to counteract the other aspects of the trial model and the number needed to screen was also decreased in the older groups (Table 5).
Among the other enrichment strategies, limiting to those who had less than a college education did not improve the trial models (Table 5). Although carriers of the APOE ε4 genotype did not demonstrate greater decline over three years, reduced variance in this population (Table 4) resulted in a 39% smaller required sample size for this trial model, relative to a trial enrolling all eligible participants. Since the dropout rate was reduced in APOE ε4 carriers, the number needed to enroll (n=3,642/arm) was similarly reduced. In contrast, only 20% of cognitively normal NACC participants age 65 or older were APOE ε4 carriers. Thus, the number needed to screen for this trial model was 32,231/arm, an increase of 107%. Participants with a subjective memory complaint demonstrated a greater decline than the overall population on the CDR-SB over three years (Table 4), and the trial sample size for the model enriched for this population was reduced by 68% (Table 5). Because dropout in this group was reduced relative to the overall population, the number needed to enroll for this model was reduced by 74%. Few NACC participants, however, presented with a memory complaint. Consequently, the number needed to screen was 58% larger than that of the overall population.
In every model in which age was combined with another enrichment strategy, the sample size needed and the number needed to enroll were reduced, relative to the standard model accepting eligible participants age 65 or older (Table 5). These reductions ranged from a 7% reduction in the sample size for the model with a minimum age of 70 years and exclusion of those with advanced education, to a 76% reduction in the number needed to enroll in a model of a trial that enrolls only participants 75 years or older with a subjective memory complaint. As expected, the numbers needed to screen were increased for all models, ranging from a 72% increase in the number needed to screen for models of trials enrolling only APOE ε4 carriers who are at least 70 years old, to a 38% increase for trials that enroll only participants 75 years or older who have less than 16 years of education.
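The relative changes reported in this section compare each enriched model with the standard model. As a worked check, the Python sketch below recomputes the change in the number needed to screen for the APOE ε4 model from the per-arm figures given in the text; the helper name is illustrative.

```python
def percent_change(enriched, standard):
    """Percent change of an enriched model relative to the standard model."""
    return 100 * (enriched - standard) / standard


# Numbers needed to screen per arm: 32,231 for APOE e4 carriers
# versus 15,555 for the standard age 65 or older model.
apoe_screen_change = percent_change(32231, 15555)
```

Here `apoe_screen_change` is about 107, matching the 107% increase reported for the APOE ε4 model. A negative result from the same helper indicates a reduction relative to the standard model.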

Discussion
Studies such as this one can assist investigators in designing future predementia AD trials. This may include primary prevention trials enrolling volunteers with no biological or clinical sign of AD and secondary prevention trials of those at increased clinical (e.g. mild cognitive impairment) or biological risk for AD (see below). Previous primary (DeKosky et al., 2008, Sano et al., 2008, Meinert et al., 2009) and secondary (Feldman et al., 2007) AD prevention trials have faced challenges including fewer than expected cases of AD. One strategy to overcome this challenge is to move toward continuous cognitive (Salmon et al., 2006) or global outcomes (Aisen 2010). We modeled predementia trials that enroll cognitively normal participants and use the CDR-SB as a single continuous outcome measure.
Our study has several strengths. We used the observed rates of longitudinal change on CDR-SB, the three-year retention rates, and the rates of eligibility of the specific populations modeled to calculate the number of participants that would be necessary to adequately power trials, the number needed to enroll (factoring in observed dropout), and the number needed to screen (factoring in screen failures) for each design. To our knowledge, this is the first study to model AD trials of any type using both longitudinal change scores on primary outcome measures and retention rates of those same populations.
We found that the majority of cognitively normal research participants in the NACC UDS failed to meet eligibility criteria, suggesting that timely completion of trial enrollment in AD prevention trials may face significant challenges. Reasons for exclusion included factors that can be assessed via telephone screening (such as the presence of excluded medications), as well as factors that require in-clinic assessments (such as laboratory tests or clinical outcome measures). Of note, in each age cohort, more than 20% of participants were excluded for a global CDR score greater than 0, suggesting the presence of at least some clinical abnormality. Our modeled screen failure rates may be underestimations, as an increasing proportion of the age groups examined may present microvascular changes (Kantarci et al., 2013) that may preclude participation depending on the particular trial design and intervention tested (Sperling et al., 2011b). The NACC UDS does not collect information on white matter hyperintensities or microhemorrhages in cognitively normal participants.
As expected, we found that various enrichment strategies can improve trial sample size requirements. Contrary to our hypothesis, enriching trials for elderly (at least 70 or 75 years of age) participants was not associated with increased dropout rates and, therefore, models of such trials afforded improved requirements for not only sample size equations based on mean decline, but also models that account for participant retention. Results from at least one prevention trial support this finding (Meinert and Breitner, 2008). Furthermore, even though the older group had a lower rate of eligibility compared to younger populations, a model that accounted for the number needed to screen was still reduced in the older groups.
Among the remaining enrichment strategies, enrolling those with subjective memory impairment stood out as optimal. Regardless of age enrichment, subjective complaints reduced the required sample sizes by nearly 70%, and because this group had improved retention, the models of the number needed to enroll were reduced by roughly 75% in each age-defined model. Since dementia rates were not appreciably higher in an AD prevention trial that enriched for individuals with cognitive complaints, relative to trials using alternate strategies (DeKosky et al., 2008, Breitner et al., 2011), this may suggest that the use of a continuous outcome in our models was key to improved trial logistics. Indeed, our results, in line with some (Wang et al., 2004, Dufouil et al., 2005) but not all (Flicker et al., 1993, Smith et al., 1996) previous studies, suggest that individuals with cognitive complaints may be at greatest risk for immediate cognitive and functional decline. In comparison, enriching for APOE ε4 carriers reduced sample sizes by 39%-70%, depending on whether a combined strategy also enriching for age was utilized. The numbers needed to screen for trials enrolling participants with subjective complaints were increased by 54%-60%. These models may translate poorly to practice, as recruitment strategies for such a trial could simply target those with memory concerns. The true screen failure rate for such a trial would ultimately be based on the number of volunteers with a complaint who tested abnormally on neuropsychological and diagnostic criteria (e.g. met criteria for dementia or mild cognitive impairment). For example, in our sample, more than 50% of otherwise eligible participants with subjective memory complaints were excluded for a global CDR score >0 (data not shown).
This study has limitations. The NACC UDS is not a population study, so it is unclear how these results will translate into actual trials. Nonetheless, the cognitively normal participants in the UDS are enrolled at academic AD Centers, as was the case for many previous primary AD prevention studies. The models of the oldest age group were based on fewer data than those for the younger groups, especially when combining enrichment strategies (Table 5). This also may limit the external validity of these results. More sophisticated measures of subjective complaint exist and are being incorporated into ongoing multicenter studies such as the AD Neuroimaging Initiative (Saykin et al., 2006). Scales using multiple questions to assess memory and other cognitive domains may provide more sensitive assessments of change from premorbid function, and it is unknown how incorporation of such scales might affect these results. Finally, secondary prevention trials of those with and without cognitive impairment, incorporating biological markers of AD as inclusion criteria, are ongoing or imminent (Sperling et al., 2011a). The use of AD biomarkers presents potential logistical and ethical challenges associated with participant reluctance to undergo biomarker testing and learn their biomarker status (Grill et al., 2013b). Biomarkers also add significant cost. The UDS does not currently collect such biomarker data. Though it is increasingly agreed that biomarkers are predictive of risk for future impairment (Roe et al., 2013), the utility of using these markers as inclusion criteria in predementia trials is not yet clear (Schneider et al., 2010). Our results do not indicate how the applied strategies compare to biomarker strategies of trial enrichment.
In summary, most AD trial planning studies have used standard sample size calculations without considering the important factors of participant retention and screen failure rate. This AD prevention trial modeling study considered these variables and suggested that careful planning can afford investigators the opportunity to enrich trials for those at greatest risk for cognitive and functional decline, with no impact on trial retention and minimal impact on screen failure rates.

Table 4 Mean (SD) raw change in CDR-SB over three years.
Subjective cognitive complaint 6.9 6.3 6.1