Clinical and Biomarker Changes in Premanifest Huntington Disease Show Trial Feasibility: A Decade of the PREDICT-HD Study

There is growing consensus that intervention and treatment of Huntington disease (HD) should occur at the earliest stage possible. Various early-intervention methods for this fatal neurodegenerative disease have been identified, but preventive clinical trials for HD are limited by a lack of knowledge of the natural history of the disease and a dearth of appropriate outcome measures. The objectives of the current study are to document the natural history of premanifest HD progression in the largest cohort ever studied and to develop a battery of imaging and clinical markers of premanifest HD progression that can be used as outcome measures in preventive clinical trials. Neurobiological Predictors of Huntington’s Disease (PREDICT-HD) is a 32-site, international, observational study of premanifest HD, with annual examination of 1013 participants with premanifest HD and 301 gene-expansion negative controls between 2001 and 2012. Findings document 39 variables representing imaging, motor, cognitive, functional, and psychiatric domains, showing different rates of decline between premanifest HD and controls. Required sample sizes and models of premanifest HD are presented to inform the future design of clinical and preclinical research. Preventive clinical trials in premanifest HD with participants who have a medium or high probability of motor onset are calculated to be as resource-effective as those conducted in diagnosed HD and could interrupt disease 7–12 years earlier. Methods and measures for preventive clinical trials in premanifest HD more than a dozen years from motor onset are also feasible. These findings represent the most thorough documentation of a clinical battery for experimental therapeutics in stages of premanifest HD, the time period for which effective intervention may provide the most positive possible outcome for patients and their families affected by this devastating disease.

Frontiers in Aging Neuroscience www.frontiersin.org

INTRODUCTION
Since 2001, Neurobiological Predictors of Huntington's Disease (PREDICT-HD; NS040068) has examined early indicators of disease in over 1300 participants at risk for Huntington disease (HD). Previous publications documented motor, cognitive, psychiatric, and imaging correlates of emerging disease (Paulsen et al., 2008; Duff et al., 2007; Biglan et al., 2009). The discovery that changes due to HD begin many years prior to the onset of diagnosable HD has fostered growing consensus that intervention at the earliest possible phase is desirable. Experimental pharmacologic interventions are currently being tested, and methods to silence the expanded gene are under development (Zhang et al., 2009; Ross and Tabrizi, 2011). Significant limitations to preventive clinical trials for HD include a lack of knowledge of the natural history of the disease and a dearth of outcome measures sensitive to disease changes in the earliest, or premanifest, stages of the disease.
The aims of the study are twofold. First, we aim to document the natural history of premanifest HD progression in the largest cohort ever studied. Second, we aim to develop a battery of imaging and clinical markers of premanifest disease progression. Findings are interpreted in terms of their practical value for preventive clinical trials. Outcome measures are scaled to facilitate comparisons, and hypothesized effect sizes are used to determine sample sizes for randomized clinical trials (RCTs). Finally, graphical analysis is used to illustrate the course of premanifest HD.

PARTICIPANTS
Data in this study were collected from September 2001 to August 2012 from N = 1314 PREDICT-HD participants (1013 with premanifest HD and 301 controls) at 32 sites worldwide. All participants had completed genetic testing for HD prior to (and independent of) study enrollment. Participants with >35 CAG repeats in the HTT gene were classified as cases, and those with <36 repeats served as gene-mutation-negative comparison participants (controls). Exclusion criteria included other central nervous system disease, injury, or developmental disorder, or evidence of an unstable medical or psychiatric illness. The research protocol was approved by each site's institutional review board and ethics committee, and all participants gave written informed consent and were treated in accordance with ethical standards.
The median time in the study was 6 years (range, 1-10 years). Over 75% of the sample had more than 3 years of data collected, 15% had 2 years, and <10% had 1 year. A subset of N = 204 gene-expanded participants received a motor diagnosis during the study (referred to as "converters"). Dropout was less than 5% per year. Variation in the amount of follow-up per participant resulted from two historical features of the study design: 1) the National Institutes of Health (NIH) grant that funded the study was renewed three times, and participants were recruited for the duration of each individual grant; and 2) grant reviewers increased the sample size at each renewal, so the total sample size increased as the possible length of follow-up for newly recruited participants decreased.

PREMANIFEST STAGING GROUPS: CAG-AGE PRODUCT
Premanifest stages were based on a formula that uses genetic and demographic information to estimate proximity to HD diagnosis. The CAG-Age Product (CAP) score at study entry, $\mathrm{CAP}_E = (\text{age at entry}) \times (\text{CAG} - 33.66)$ (Zhang et al., 2011), was derived from an accelerated failure time (AFT) model predicting motor diagnosis from age at entry, CAG length, and their interaction. $\mathrm{CAP}_E$ is similar to the "disease burden" score of Penney et al. (1997) and presumably indexes the cumulative toxicity of mutant huntingtin. $\mathrm{CAP}_E$ can also be used to estimate the 5-year probability of motor diagnosis. Group cutoffs were $\mathrm{CAP}_E < 290$ (Low), $290 \le \mathrm{CAP}_E \le 368$ (Medium), and $\mathrm{CAP}_E > 368$ (High), corresponding to estimated times to diagnosis of >12.78, 7.59-12.78, and <7.59 years, respectively. A dynamic (time-varying) CAP score, denoted $\mathrm{CAP}_D$, was also computed using current age rather than age at entry; $\mathrm{CAP}_D$ can be interpreted as a type of CAG-adjusted age metric. Table 1 shows descriptive statistics for demographic variables by premanifest groups defined at study entry (i.e., based on $\mathrm{CAP}_E$).
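The staging rule above can be expressed as a short function. This is a hedged sketch that assumes integer CAG lengths and applies the cutoffs exactly as stated; the function names and the example participant are hypothetical, and the AFT-based 5-year diagnosis probabilities (Zhang et al., 2011) are not reproduced.

```python
# Sketch of the CAP score and the premanifest staging cutoffs described above.
# Function names and example inputs are hypothetical; only the constant 33.66
# and the cutoffs 290 and 368 come from the text.


def cap_score(age: float, cag: int) -> float:
    """CAG-Age Product: age x (CAG - 33.66).

    Pass age at study entry for CAP_E, or current age for the dynamic CAP_D.
    """
    return age * (cag - 33.66)


def cap_group(cap: float) -> str:
    """Map a CAP_E score to the Low / Medium / High premanifest group."""
    if cap < 290:
        return "Low"     # estimated > 12.78 years to diagnosis
    if cap <= 368:
        return "Medium"  # estimated 7.59-12.78 years to diagnosis
    return "High"        # estimated < 7.59 years to diagnosis


if __name__ == "__main__":
    entry_age, cag = 40.0, 42                  # hypothetical participant
    cap_e = cap_score(entry_age, cag)          # fixed at study entry, used for staging
    cap_d = cap_score(entry_age + 3.0, cag)    # dynamic score 3 years after entry
    print(f"CAP_E = {cap_e:.1f} -> {cap_group(cap_e)}")
    print(f"CAP_D after 3 years = {cap_d:.1f}")
```

Because $\mathrm{CAP}_D$ uses current age, it grows over follow-up, whereas group membership in the analyses is fixed by $\mathrm{CAP}_E$.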

STATISTICAL ANALYSIS
The main analysis focused on change over time in each premanifest group, controlling for covariates (age, education, gender, depressed mood severity, and brain scanner field strength); interest was in the comparison of premanifest and control groups. Using linear mixed effects regression (LMER) (Verbeke and Molenberghs, 2000), each of the 39 variables of interest was analyzed separately. Detailed descriptions of the variables are provided in the Supplementary Material. To control for site-to-site variability, a three-level model was used, with repeated measures nested within participants, nested within sites. A preliminary analysis (not presented) indicated that linear curves were adequate for modeling change over time within each $\mathrm{CAP}_E$ group. Random intercepts and slopes were specified for participants as well as for sites. The time metric for the analysis was duration in the study (years), with 0 = study entry. Two models were estimated for each outcome variable: a null model containing duration and covariates only, and a full model that added $\mathrm{CAP}_E$ group differences in intercept and slope. Maximum likelihood (ML) estimation was used, which yields valid estimates under the widely applicable assumption that the missing-data mechanism is ignorable (Little and Rubin, 2002). The null and full models were compared using the likelihood ratio test (LRT), which tests whether the additional parameters of the full model improve fit over the nested null model. Because the degrees of freedom were the same for every model comparison, the LRT statistic can be treated as an effect size measure, and the outcome variables were rank-ordered by it.
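The model comparison above, together with the scaling and slope Z-tests described in the next paragraph, can be sketched in plain Python. This is a minimal sketch, not the study's analysis code: the three-level LMER fits are assumed to have been done elsewhere, the log-likelihoods and estimates are hypothetical, and df = 6 (three group intercept differences plus three group slope differences relative to controls) is an assumption about the null-versus-full comparison.

```python
import math
from statistics import NormalDist, mean, stdev


def standardize(values):
    """Scale an outcome by its grand mean and (sample) SD before model fitting."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def lrt_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic for nested ML fits: 2 * (ll_full - ll_null)."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def z_test(estimate, std_error, reference=0.0):
    """Two-sided Z-test of (estimate - reference) / SE.

    For a slope difference, pass the estimated difference and its SE;
    for the Control group, pass the control slope (tested against zero).
    """
    z = (estimate - reference) / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical log-likelihoods from null and full LMER fits of one outcome.
    stat = lrt_statistic(loglik_null=-1520.4, loglik_full=-1498.1)
    print(f"LRT = {stat:.1f}, p = {chi2_sf_even_df(stat, df=6):.2e}")  # df=6 assumed
    # Hypothetical standardized slope difference (SD/year) and its standard error.
    z, p = z_test(estimate=-0.12, std_error=0.02)
    print(f"Z = {z:.2f}, p = {p:.2e}")
```

With equal df across outcomes, ranking outcomes by the LRT statistic is equivalent to ranking them by p-value, which is why the text can use the statistic as an effect size.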
To facilitate comparisons among the outcome variables, the estimated slopes from the LMER analysis were expressed in standard deviation (SD) units; to produce these estimates, each outcome was scaled using its grand mean and SD prior to the LMER analysis. Z-tests of slope differences were computed as the estimated difference divided by its standard error; for the Control group, the Z-test compared the slope against zero. Three ancillary analyses were conducted. The first examined possible effects of conversion. The second estimated the required sample size for a hypothetical RCT. The third was a graphical analysis of trends over all progression periods for eight key variables, chosen conceptually to represent the phenotypic characteristics of HD (motor, cognitive, psychiatric) as well as biological (imaging) and functional outcomes. Details of all analyses are presented in the Supplementary Material.

RESULTS
Table 2 shows the LMER results. The variables with the largest effect sizes (LRT statistics) were imaging measures based on regional brain volumes (corrected for intracranial volume and controlled for change in field strength). The two top-ranked measures were putamen and caudate volumes. The Control group slopes showed a significant decrease over time (consistent with normal aging), but the volumes of all gene-expansion groups declined steadily over time, and the slope for each premanifest group was statistically different from the Control group slope (all ps < 0.001). Other imaging variables showing significant change relative to controls included the accumbens, cerebrospinal fluid (CSF), lobar gray matter, hippocampus, and lobar white matter (the last only for the High group). The putamen, caudate, CSF, and lobar gray measures showed significant longitudinal change in all three premanifest groups.
A graphical depiction of change in brain volume by group is shown in Figure S1 in the Supplementary Material. Total motor score (TMS) from the Unified Huntington's Disease Rating Scale (UHDRS) ranked fourth in effect size for change over time. The Control group slope was not statistically different from zero, and the Low group slope was not statistically different from the Control group slope. The Medium and High group slopes were both significantly larger than the Control group slope (all ps < 0.001). The next strongest effects were for bradykinesia (rank five) and chorea (rank six).
Every cognitive measure examined showed significant decline. The Symbol Digit Modalities Test (SDMT) had the seventh-strongest effect of all measures. The Control group slope was positive and significantly greater than zero, indicating a practice effect. The Low group slope was negative and worse than the Control group slope; the Medium slope showed greater decline than the Low slope, and the High slope showed greater decline still. All cognitive measures showed significant change in the High group, and 9 of 10 showed significant change in the Medium group. Four cognitive measures (SDMT, Stroop-word, Smell-ID, and the Trail Making Test) showed significant change over time in the Low group.
Regarding the functional variables, every measure examined showed significant change over time compared with controls, except for the participant-rated World Health Organization Disability Assessment Schedule (WHODAS). The Total Functional Capacity (TFC) scale (rank 20) showed the largest effect in the High group. The Control and Low group slopes were not statistically different from zero; the Medium group slope showed significant decline, and the High group slope even more so. Although the Everyday Cognition Rating Scale (ECog) companion total had a weaker overall effect size (rank 26), the Low group slope was statistically different from the Control group slope, as were the slopes of the higher progression groups. Four of the six functional measures showed robust change in the Medium group, and three showed change in the Low group. The functional measures most appropriate for different stages of premanifest HD therefore varied: whereas the TFC showed the greatest effect size in the High group, the companion-rated WHODAS and ECog showed greater change in the Medium and Low groups.
Eight of the nine psychiatric variables showed significant change over time relative to controls. The SCL-90 obsessive-compulsive scale had the strongest effect (rank 22); the increase in obsessive-compulsive signs over time grew with progression group (Control through High). The Frontal Systems Behavioral Scale (FRSBE) executive and apathy subscales, ranked 24 and 25, respectively, also showed robust effect sizes. Seven of the nine measures showed significant change in the Medium group, and one psychiatric measure (FRSBE disinhibition) showed significant longitudinal change in the Low group. Table 3 shows the six variables that had a statistically significant acceleration of slope for participants who converted. The fourth column (acceleration) gives the value added to a slope in Table 2 to represent the additional deterioration associated with conversion. The variables are sorted by proportionate increase, with dystonia showing the largest change under conversion (approximately 2.1 times faster decline). The acceleration factor is most applicable to the High group, because the largest proportion of conversions occurred in this group (see Table 1). Acceleration factors ranged from a mild added slope of 0.06 SD per year for TMS to a strong added slope of 0.20 SD per year for dystonia. Table 4 shows the estimated single-group sample size (N/2) as a function of dropout percentage, effect size, and estimated parameters for a hypothetical RCT of efficacy consistent with guidelines for Phase II trials (The Lancet Neurology, 2012). The effect size columns (20%, etc.) give the estimated required sample sizes; results are listed for variables that required N/2 < 3000 for a 20% effect and 20% dropout. CSF had the smallest single-group sample size (e.g., N/2 = 27 for a 70% effect with no dropout), followed by putamen, caudate, TMS, speeded tapping, and additional variables.
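The exact computation behind Table 4 is given in the Supplementary Material and is not reproduced here. As a rough illustration of how the required sample size per arm scales with effect size and dropout, the textbook two-sample normal-approximation formula for comparing mean slopes can be sketched as follows; the two-sided alpha of 0.05, 80% power, and the slope mean and SD inputs are assumptions, not values from Table 4.

```python
import math
from statistics import NormalDist


def n_per_arm(mean_slope, sd_slope, effect, dropout=0.0, alpha=0.05, power=0.80):
    """Per-arm N to detect a fractional reduction `effect` in the mean slope.

    Normal-approximation formula for two independent arms:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / (effect * |mean_slope|)**2
    then inflated by 1 / (1 - dropout) and rounded up.
    """
    z = NormalDist().inv_cdf
    delta = effect * abs(mean_slope)           # treatment-placebo slope difference
    n = 2.0 * (z(1 - alpha / 2) + z(power)) ** 2 * sd_slope ** 2 / delta ** 2
    return math.ceil(n / (1.0 - dropout))


if __name__ == "__main__":
    # Hypothetical slope mean and SD (in SD units per year) for one outcome.
    for effect in (0.2, 0.3, 0.7):
        for dropout in (0.0, 0.2):
            print(effect, dropout, n_per_arm(1.0, 1.0, effect, dropout))
```

Consistent with the normal-aging caveat for Table 4, one could pass the case-minus-control slope as `mean_slope` when a treatment is expected to act only on disease-related change; this too is an illustrative assumption rather than the study's method.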
An approximate 70% difference in slopes was obtained in a recent clinical efficacy trial for HD (Huntington Study Group, 2006). Because the sample sizes do not take into account change associated with normal aging, they may overestimate the effect sizes achievable for treatment-versus-placebo differences with any treatment that addresses only disease-related change. Figure 1 shows curves for individual participants (thin colored lines) and cubic spline curves (thick black lines) by premanifest stage. For some variables, change was slower in earlier premanifest stages but accelerated with proximity to the average motor onset value (vertical line), with greater acceleration thereafter. Figure 2 shows a model of disease progression for eight variables throughout the course of premanifest HD. The curves are based on a cubic spline fit after standardizing each variable relative to controls. The imaging, cognitive, and psychiatric variables showed a linear increase over all premanifest stages, whereas motor and functional variables tended to follow a non-linear trajectory with a sudden acceleration just prior to motor onset.
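Standardizing each variable relative to controls, as described for Figure 2, amounts to a control-referenced z-score. A minimal sketch follows; the sign flip that puts every variable on a common "higher is worse" scale is an assumption added for illustration, the input values are hypothetical, and the cubic spline fit itself is not reproduced.

```python
from statistics import mean, stdev


def standardize_to_controls(values, control_values, higher_is_worse=True):
    """Express values as z-scores relative to the control group's mean and SD.

    `higher_is_worse` (an illustrative assumption) flips the sign for measures
    where lower scores indicate worsening, such as a brain volume, so that
    all standardized curves rise as disease progresses.
    """
    mu, sd = mean(control_values), stdev(control_values)
    sign = 1.0 if higher_is_worse else -1.0
    return [sign * (v - mu) / sd for v in values]


if __name__ == "__main__":
    controls = [10.2, 9.8, 10.0, 10.4, 9.6]   # hypothetical control volumes
    cases = [9.0, 8.6, 8.1]                   # hypothetical case volumes over time
    print(standardize_to_controls(cases, controls, higher_is_worse=False))
```

Putting every variable in control SD units is what allows imaging, motor, cognitive, psychiatric, and functional curves to share one vertical axis.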

DISCUSSION
Findings show longitudinal change in 36 of 39 measures examined over a 10-year natural observation study of premanifest HD. Effect sizes suggest that a preventive RCT could be efficiently designed to detect treatment effects of around 30%, and that effects similar to those in the recent tetrabenazine efficacy trial (about 70%) might be detected with sample sizes of about 30 per arm, depending on the amount of dropout (Huntington Study Group, 2006). Significant measures include each of the clinical phenotypic characteristics of HD (motor, cognitive, psychiatric), as well as biologic and functional outcomes. No previous study has so thoroughly documented a clinical battery for experimental therapeutics in stages of premanifest HD. Current findings dovetail with those from smaller studies. Most importantly, the best measure for each of the primary domains typically assessed in HD depends on the disease stage of the premanifest cohort targeted for intervention. For example, the best cognitive variable for a clinical trial in persons who are <8 years from motor onset (High group) is tapping speed, because it had the largest significant absolute standardized slope (0.1476 SD per year). In contrast, the most robust measure for tracking cognitive change in persons who are more than a dozen years from onset (Low group) is smell identification (see Table 2).
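The selection rule used above (among measures with significant change in a given group, choose the one with the largest absolute standardized slope) can be written as a small helper. The measure names, slopes, p-values, and the 0.05 threshold in this sketch are hypothetical placeholders, not Table 2 values.

```python
def best_measure(slopes, p_values, alpha=0.05):
    """Pick the measure with the largest |standardized slope| among significant ones.

    slopes: mapping of measure name to standardized slope (SD/year) in one group.
    p_values: mapping of measure name to the p-value of that group's slope test.
    alpha: significance threshold (0.05 is an assumption).
    Returns None when no measure is significant.
    """
    significant = {name: s for name, s in slopes.items() if p_values[name] < alpha}
    if not significant:
        return None
    return max(significant, key=lambda name: abs(significant[name]))


if __name__ == "__main__":
    # Hypothetical placeholder values for one progression group, not Table 2 entries.
    slopes = {"measure_a": -0.09, "measure_b": -0.15, "measure_c": -0.20}
    p_values = {"measure_a": 0.001, "measure_b": 0.002, "measure_c": 0.30}
    print(best_measure(slopes, p_values))  # measure_c is excluded as non-significant
```

Because the rule is applied separately within each CAP group, the same battery can yield different "best" measures for Low, Medium, and High cohorts.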
It is important to note that the actual design of clinical trials involves many factors. Trial design will need to balance the feasibility of cost-effective, multi-site research against the expense of advanced technology methods, which may limit resources and result in fewer treatments being evaluated (Ross and Tabrizi, 2011). Ideally, clinical trials for HD will intervene at multiple, differing points along the cascade of changes known to occur from gene expansion to patient suffering. Accordingly, interference with disease processes could involve gene silencing, altering posttranslational modifications of huntingtin, ameliorating gene transcription abnormalities, or counteracting metabolic abnormalities to thwart the devastation that people with HD and their families endure. Different outcome measures may also have different kinds of utility for Phase I versus Phase II trials, or for symptomatic versus disease-modifying strategies. It is important to keep in mind that the measurements listed in Table 4 do not take into account changes related to normal aging. Changes in CSF volume, for example, are significant for premanifest participants but also demonstrate relatively large effect sizes for controls. Thus, a treatment that targets disease-related change, but not age-related change, may be better evaluated with measures that are more sensitive to longitudinal differences between premanifest cases and controls.
Much recent attention has been devoted to the importance of natural history studies in the design of clinical trials (footnote 1). In the HD literature, several authors have developed conceptual models of the natural history of the disease; however, none of these models is based on actual data (see text footnote 1). Findings from this study were used to develop a natural history model based on up to 10 years of natural observation spanning a substantial range of the premanifest period. The model can be used to develop progression-based care guidelines as well as to design clinical trials. Importantly, these data can provide external historical controls, facilitating single-treatment-group or dose-selection trials without an active randomized placebo group (Elm et al., 2005; Czaplinski et al., 2006). … with regulators. Such findings have immediate implications for the current design of RCTs. A primary strength of the PREDICT-HD study is that over 1300 gene-mutation-tested participants were prospectively followed up to, and for some through, the point of actual motor diagnosis.
Such data are well suited to documenting the phenotypic and biologic changes that occur in persons with the gene expansion over the decade before, and shortly after, the manifestation of disease. Findings suggest that the course of biologic progression in premanifest HD is linear for imaging, cognitive, and psychiatric data, but non-linear for motor and functional data (see Figure 2). Motor expression appears to accelerate as the disease manifests over approximately the 15 years before motor onset. It is important to note that increases in the number and severity of clinical outcomes reflect measurement aspects of disease manifestation and do not necessarily reflect a curvilinear, or increased, rate of disease progression. One possible explanation is that atrophy of each individual brain region (e.g., the putamen, as shown in Figures 1 and 2) proceeds relatively linearly, but as additional brain regions begin to degenerate and become dysfunctional, their combined effect accelerates the clinical expression of disease. Additional strengths include the large, worldwide collaboration among 32 sites, across brain scanners, cultures, and languages, involving multiple disciplines and specialties. The PREDICT-HD study may be most relevant to actual clinical trials, where multiple sites are likely to be needed to acquire sufficient sample sizes (The Lancet Neurology, 2012). Finally, the care with which the study was conducted provides quality control, quality assurance, standardized protocols, and statistical control for many common confounds (age, gender, education) and less common ones (depressed mood, field strength). All components of the study are shared worldwide so that findings can be replicated and used as more knowledge is acquired about HD and other neurodegenerative disorders, allowing PREDICT-HD data to continue to facilitate progress for decades to come.
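The proposed explanation (each region declines linearly, but regions are recruited at different times) can be illustrated with a toy calculation: a sum of linear declines with staggered onsets produces an accelerating total. This is a hedged illustration of the hypothesis only, not a model fitted to PREDICT-HD data; the onsets, rates, and the assumption that clinical expression is a simple sum of regional declines are all hypothetical.

```python
def regional_decline(t, onset, rate):
    """Linear decline in one region: zero before `onset`, then `rate` per year."""
    return rate * max(0.0, t - onset)


def combined_expression(t, regions):
    """Toy clinical expression: the sum of all regional declines at time t."""
    return sum(regional_decline(t, onset, rate) for onset, rate in regions)


# Hypothetical regions as (onset year, rate per year); each one is linear.
regions = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
curve = [combined_expression(t, regions) for t in range(16)]
# Year-to-year increments step up each time another region is recruited.
increments = [b - a for a, b in zip(curve, curve[1:])]
print(increments)
```

No single region accelerates here, yet the yearly increments rise from 1 to 2 to 3, which mirrors the distinction the paragraph draws between linear regional atrophy and accelerating clinical expression.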
Weaknesses also exist in the study. The follow-up interval was 1 year for clinical and cognitive assessments and 2 years for imaging, whereas clinical trials demand more frequent assessments. We recommend that the proposed measures be evaluated in a brief repeated-measures study over a period of 6 months to confirm that change can be documented with more frequent assessment. In addition, because of the length of the PREDICT-HD study, some protocol changes were unavoidable. A common but important variation in our study was the change in MRI scanners from 1.5 to 3 T. The basal ganglia structures were processed to accommodate this change, and field strength was statistically controlled in the analysis. Solutions to varying field strength and other methodological challenges of PREDICT-HD can be made available to researchers interested in natural history studies. The lobar white matter measures reported in this work incorporate data from a much more heterogeneous collection than reported in previous publications, including both 1.5 and 3 T scanners from a larger number of scanning sites. Although the lobar white matter effects are weaker than previously reported, we believe this is due to measurement variability inherent in heterogeneous data collection. Systematically addressing the variability due to heterogeneous data collection is a focus of ongoing work.