Identifying Changepoints in Biomarkers During the Preclinical Phase of Alzheimer’s Disease

Objective: Several models have been proposed for the evolution of Alzheimer’s disease (AD) biomarkers. The aim of this study was to identify changepoints in a range of biomarkers during the preclinical phase of AD. Methods: We examined nine measures based on cerebrospinal fluid (CSF), magnetic resonance imaging (MRI) and cognitive testing, obtained from 306 cognitively normal individuals, a subset of whom subsequently progressed to the symptomatic phase of AD. A changepoint model was used to determine which of the measures had a significant change in slope in relation to clinical symptom onset. Results: All nine measures had significant changepoints, all of which preceded symptom onset, however, the timing of these changepoints varied considerably. A single measure, CSF t-tau, had an early changepoint (34 years prior to symptom onset). A group of measures, including the remaining CSF measures (CSF Abeta and phosphorylated tau) and all cognitive tests had changepoints 10–15 years prior to symptom onset. A second group is formed by medial temporal lobe shape composite measures, with a 6-year time difference between the right and left side (respectively nine and 3 years prior to symptom onset). Conclusion: These findings highlight the long period of time prior to symptom onset during which AD pathology is accumulating in the brain. There are several significant findings, including the early changes in cognition and the laterality of the MRI findings. Additional work is needed to clarify their significance.


INTRODUCTION
Accumulating evidence indicates that the underlying neuropathological mechanisms associated with Alzheimer's disease (AD) begin a decade or more before the emergence of mild cognitive impairment (MCI) (Sperling et al., 2011). This has led to an increasing interest in understanding the order and magnitude of biomarker changes during this 'preclinical' phase of AD.
A hypothetical model has been proposed describing the order in which biomarkers change across the spectrum of AD (Jack et al., 2013). It has, however, been challenging to effectively test this model since most longitudinal studies that have enrolled cognitively normal individuals and collected relevant measures have limited follow-up. Additionally, studies with limited followup tend to lack a sufficient number of clinical outcomes (i.e., number of cases who progress to MCI) and therefore have limited power for statistical analyses designed to determine the timing of biomarker changes during preclinical AD.
Such analyses are feasible using data from the BIOCARD study, in which participants were cognitively normal when first enrolled, a wide range of informative measures were collected at baseline, and some participants have now been followed for over 20 years. The availability of these measures when the subjects were cognitively normal, and the unusually long duration of follow-up, allows the examination of the timing of biomarker changes during preclinical AD.
The primary goal of the analyses described here was to identify changepoints in measures based on cerebrospinal fluid (CSF), magnetic resonance imaging (MRI), and cognitive testing, obtained from a cohort of cognitively normal individuals, a subset of whom subsequently progressed to the symptomatic phase of AD. This study was approved by the Johns Hopkins Medicine Institutional Review Board.

Study Design
The BIOCARD study, the study from which these data were drawn, was initiated at the National Institutes of Health (NIH) in 1995. While at the NIH, subjects were administered a neuropsychological battery and clinical assessments annually. MRI scans, CSF, and blood specimens were obtained approximately every 2 years. The study was stopped in 2005 for administrative reasons and re-established at Johns Hopkins University (JHU) in 2009, at which point the annual clinical and neuropsychological assessments were reinitiated. Bi-annual collection of CSF and MRI scans was re-established in 2015, and the acquisition of positron emission tomography (PET) scans using Pittsburgh Compound B (PiB) was begun. Tau PET imaging was initiated in 2017 (see Figure 1 for a schematic representation of the study design). This paper is based on CSF and MRI data collected during the 1995-2005 period and neuropsychological tests during the 1995-2013 period.
Qualified researchers may obtain access to all de-identified clinical and imaging data used for this study.

Selection of Participants
Recruitment was conducted by the staff of the Geriatric Psychiatry branch of the intramural program of the National Institute of Mental Health. At baseline, all participants completed a comprehensive evaluation at the NIH, consisting of a physical, neurological and psychiatric examination, an electrocardiogram, standard laboratory studies, and neuropsychological testing. Individuals were excluded from participation if they were cognitively impaired or had significant medical problems such as severe cerebrovascular disease, epilepsy or alcohol or drug abuse.
A total of 349 individuals were initially enrolled in the study, after providing written informed consent. By design, approximately 75% of the participants had a first degree relative with dementia of the Alzheimer type. The analyses presented here are based on data from 290 subjects who were cognitively normal at baseline and had complete observations on the baseline variables of interest. Subjects were excluded from analyses for the following reasons: (1) subjects had not yet re-enrolled in the study or had withdrawn (n = 29); (2) Subjects were below 40 years old at the beginning of study (n = 20). Not all biomarkers were available for every subject and the actual number of subjects actually used for each run of the model was smaller: 256 for CSF, 270 for MRI and 281 for cognitive tests, for which we also excluded subjects who only had one battery of tests, to allow for a more reliable practice effect correction (see Statistical Analysis).
Of the 290 subjects included in these analyses, 209 subjects remained cognitively normal at their last visit and 81 subjects were diagnosed with MCI or dementia due to AD by the time of their last visit. The demographic characteristics of the subjects in the analysis are shown in Table 1, which are similar to the characteristics of the cohort as a whole. Most of the subjects who became symptomatic over time still meet criteria for MCI and all but a very small number (n = 3) have a clinical diagnosis consistent with AD. Follow-up of the cohort is continuing and the goal is to get autopsies on as many participants as possible. The accuracy of the clinical-pathological diagnoses has been 92% to date.

Consensus Diagnostic Procedures
Clinical and cognitive assessments were completed annually at the NIH initially and subsequently at JHU, as noted above. A consensus diagnosis for each study visit was established by the staff of the BIOCARD Clinical Core at JHU (prospectively for subjects evaluated starting in 2009 and retrospectively for subjects evaluated at the NIH). This research team included: neurologists, neuropsychologists, research nurses and research assistants. During each study visit, each subject had received a comprehensive cognitive assessment and a Clinical Dementia Rating (CDR), as well as a comprehensive medical evaluation (including a medical, neurologic and psychiatric assessment). For the cases with evidence of clinical or cognitive dysfunction, a clinical summary was prepared that included information about demographics, family history of dementia, work history, past history of medical, psychiatric and neurologic disease, current medication use and results from the neurologic and psychiatric evaluation at the visit. The reports of clinical symptoms from the CDR interview with the subject and collateral source (e.g., spouse, child, friend) were summarized, and the results of the neuropsychological testing were reviewed.
The diagnostic process for each case was handled in a similar manner. Two sources of information were used to determine if the subject met clinical criteria for the syndromes of MCI or dementia: (1) the CDR interview conducted with the subject and the collateral source was used to determine if there was evidence that the subject was demonstrating changes in cognition  in daily life, (2) cognitive tests scores (and their comparison to established norms) were used to determine if there was evidence of significant decline in cognitive performance over time. If a subject was deemed to be impaired, the decision about the likely etiology of the syndrome was based on the medical, neurologic, and psychiatric information collected at each visit, as well as medical records obtained from the subject, where necessary. More than one etiology could be endorsed for each subject (e.g., AD and vascular disease). One of four possible diagnostic categories was selected at each visit for each subject: (1) Normal, (2) Mild Cognitive Impairment, (3) Impaired Not MCI or (4) Dementia. The decision about the estimated age of onset of clinical symptoms was determined separately, and was based on responses from the subject and collateral source during the CDR interview regarding approximately when the relevant clinical symptoms began to develop. These diagnostic procedures are comparable to those implemented by the Alzheimer's Disease Centers program supported by the National Institute on Aging.
The estimated age of onset of clinical symptoms was based primarily on a semi-structured interview with the subject and the collateral source. The staff conducting the consensus diagnoses were blinded to the CSF and imaging measures.
Within the context of this study, the diagnosis of Impaired Not MCI typically reflected contrasting information from the CDR interview and the cognitive test scores (i.e., the subject or collateral source expressed concerns about cognitive changes in daily life but the cognitive testing did not show changes, or vice versa, the test scores provided evidence for declines in cognition but neither the subject nor the collateral source reported changes in daily life).

Selection Criteria for Variables Included in the Analyses
The changepoint analyses presented here include variables from the three primary domains evaluated in the BIOCARD study, obtained when subjects were first enrolled. These domains include: (1) cognitive test scores, (2) CSF values, and (3) MRI measures. In order to be as parsimonious as possible, we based the selection of which specific variables should be included in the analyses on findings from prior publications  that examined each of these measures in relation to time to onset of clinical symptoms. A total of 9 measures were included, as described below.

Cognitive Assessments
The annual, comprehensive neuropsychological battery covered all major cognitive domains, including memory, executive function, language, visuospatial ability, attention, speed of processing and psychomotor speed (see Albert et al., 2014 for the complete battery). We selected four cognitive measures to include in the changepoint analyses, as these four measures were significant in the multivariate Cox models examining the association between baseline performance and time to onset of clinical symptoms: (1) Digit Symbol Substitution Test from the Wechsler Adult Intelligence Scale -Revised; (2) Logical Memory -delayed recall from the Wechsler Memory Scale -Revised; (3) Verbal Paired Associates -Immediate recall from the Wechsler Memory Scale -Revised; and (4) Boston Naming Test.

CSF Assessments
Cerebrospinal fluid specimens were collected over time at the NIH (1995-2005) but were later analyzed at a single point in time by investigators at JHU. The CSF specimens collected from the participants were analyzed using the xMAP-based AlzBio3 kit [Innogenetics] run on the Bioplex 200 system. CSF specimens were analyzed in triplicate on the same plate. The AlzBio3 kit contains monoclonal antibodies specific for Aβ1-42 (4D7A3), t-tau (AT120), and p-tau181p (AT270), each chemically bonded to unique sets of color-coded beads, and analyte-specific detector antibodies (HT7, 3D6). Calibration curves were produced for each biomarker using aqueous buffered solutions that contained the combination of the three biomarkers at concentrations ranging from 54 to 1,799 pg/ml for synthetic Aβ1-42 peptide, 25-1,555 pg/ml for recombinant tau, and 15-258 pg/ml for a tau synthetic peptide phosphorylated at the threonine 181 position (i.e., the p-tau181p standard). Each subject had all samples (run in triplicate) analyzed on the same plate. The intra-assay coefficients of variation (CV) for plates used in this study were: 7.7% ± 5.3 (Aβ 1−42 ); 7.1% ± 4.9 (t-tau); 6.3% ± 4.8 (p-tau 181 ). Interassay (plate-to-plate) CVs for a single CSF standard run on all plates used in this study were: 8.9% ± 6.5 (Aβ 1−42 ); 4.7% ± 3.3 (t-tau), and 4.3% ± 3.18 (p-tau 181 ). Compared with studies using the same kits and platforms, our absolute results are at the median levels for Aβ 1−42 , t-tau, and p-tau 181 . The CVs, plate-to-plate variability, and the dynamic range of our assays are well within published norms (Mattsson et al., 2009;Shaw et al., 2009).

MRI Assessments
The MRI scans acquired from the participants were obtained using a standard multi-modal protocol with a GE 1.5T scanner. The coronal scans employed an SPGR (Spoiled Gradient Echo) sequence (TR = 24, TE = 2, FOV = 256 × 256, thickness/ gap = 2.0/0.0 mm, flip angle = 20, 124 slices). The scans were processed with a semi-automated method, using region-of-interest large deformation diffeomorphic metric mapping (ROI-LDDMM) techniques (Miller et al., 2013). More precisely, the MRI volumetric regions of interest (ROI) included the entorhinal cortex, hippocampus, and amygdala. For each of the three ROI, landmarks were placed manually in each MRI scan to mark the boundaries of the ROI, following previously published protocols [see Csernansky et al. (1998) and Miller et al. (2013) for the hippocampus, Munn et al. (2007) for the amygdala, and Miller et al. (2013) for the entorhinal cortex]. Next, a group template for the entorhinal cortex, hippocampus, and amygdala was created, based on the set of baseline MRI scans. The same set of landmarks was placed into this group template as in the individual subject scans. ROI-LDDMM procedures were then used to map the group template to the individual subject scans, using both landmark matching (Csernansky et al., 2000) and volume matching (Beg et al., 2005). The resulting segmented binary images for the entorhinal cortex, hippocampus and amygdala were used to calculate the volume of each structure, by hemisphere, by summing the number of voxels within the volume.
A medial temporal lobe composite was used in the present analyses, based on an average of the entorhinal cortex, hippocampus and amygdala. [Prior analyses showed that this composite is more strongly associated with CSF alterations that are an early marker of AD than the individual MRI measures taken separately (Gross et al., 2017).] The measurements from the right and left hemisphere were examined separately.
The volumetric measurements of the entorhinal cortex, hippocampus and amygdala were normalized for head size by including total intracranial volume (ICV) as a covariate (Sanfilipo et al., 2004). ICV was calculated using coronal SPGR scans in Freesurfer 5.1.0 (Segonne et al., 2004).

Statistical Analysis
Overview The overall goal of the changepoint analyses was to determine if each of the measures selected for analysis had a significant changepoint in relation to time to onset of clinical symptoms and, if so, the timing of these changepoints with respect to one another. The model used in these analyses has previously been applied to MRI data in this cohort in order to establish the order in which changes occur in the volume, thickness and shape of medial temporal lobe regions during preclinical AD (Younes et al., 2014). A more advanced version of the model (Tang et al., 2017) is applied here to the full range of biomarkers available in the study.
The changepoint is represented in the model as a significant change in slope (see Figure 2). The model uses all of the available data (both from subjects who remained normal as well for those who progressed to MCI) in order to estimate the changepoint. The main features of the model are as follows: 1. Time is measured relative to the clinical onset time of the disease (even though age is included as a covariate). This means that if a subject has been diagnosed with MCI 10 years after another, the time scale for the latter is shifted 10 years to the right compared with the former. 2. Clinical onset times for normal subjects, which are not observed, are treated as missing data, therefore assuming "right censoring." The model therefore assumes that every subject will ultimately get the disease if they were to live indefinitely. A prior model of disease onset is used in conjunction. 3. The model assumes linearity in the measures as a function of age, with the change of slope at the changepoint. It is slightly different from the sigmoidal model of Jack et al. (2013), in which biomarkers smoothly transition from a low-abnormality plateau to a high-abnormality plateau in an S-shaped curve. Since the data in the study pertain to individuals who were cognitively normal at baseline and remained normal or who progressed from normal to MCI, the model assumes that the subjects are either in the low-abnormality range or in a transition phase, thus not requiring adaptation to an S-shaped function which might allow for two changepoints. 4. All models included age and gender as covariates and a constant random effect. Education was included as an additional covariate for the cognitive measures; intracranial volume and left-handedness was included as an additional covariate for the MRI measures. 5. We also corrected for the impact of a practice effect on cognitive tests (Rabbitt et al., 2004;Zehnder et al., 2007;Vivot et al., 2016) by introducing a covariate that depends on the number of tests taken in the past, defined by z practice = 1 − 2 −k , when a test is taken for the kth time. 6. For cognitive tests, we also limited our analyses to subjects that had at least two measurements over the course of the study. (This restriction was not applied to other biomarkers.) 7. CSF t-tau and p-tau were transformed to logarithmic scale in the analyses. For robustness, a constraint ensuring nonnegative slopes in the regression model was applied. 8. The model can project the changepoint forward in time as well as backward, sometimes allowing for a changepoint that precedes the initiation of data collection. Although the goal is to identify a changepoint preceding the onset of symptoms, this two-phase model allows for the changepoint and clinical onset of symptoms to coincide. Figure 3, the model organizes the estimated time of symptom onset for the biomarker values along a broken line. The model fits the data so that the subjects with less abnormal values (e.g., higher test scores) tend to be on the left side of the curve and therefore to have a longer estimated time to clinical symptom onset.

Mathematical Description
Let n denote the number of subjects in the study. For subject k, we assume p k observations of a scalar biomarker, denoted y k, 1 , ..., y k, p k , at ages t k, 1 , ..., t k,p k . Let T 1 ,...,T n denote the subjects' ages at the end of the study. Typically: T k > t k,p k (age at last biomarker measurement). Let U k denote the age at MCI onset, which is observed only if U k ≤ T k .
Finally, let z k, 1 , ..., z k,p k denote additional covariates, such as gender, education level, intracranial volume, etc. (Each z k may be a vector.) Let η k denote a constant random effect associated with each subject and ε k,1 , ...ε k, p k a random noise associated with each observation. They are modeled as Gaussian variables with respective variances τ 2 and σ 2 . A prior distribution is used for U k , and modeled as a Gaussian with mean m 1 = 93 years and standard deviation σ 1 = 14.5 years. (This distribution was learned from an independent dataset.) Details on the estimation procedure leading to these values can be found in Tang et al. (2017). Importantly, this distribution represents a clinical onset time applicable to the whole population (including people who will not get AD during their lifetime). Onset times for the diseased population (i.e., conditional to onset prior to death) would be significantly smaller.
The changepoint model is is the changepoint, the largest of years before onset or 20 years. This is a two-phase regression model. The biomarker first follows a linear trajectory (phase I) for t k, j < S k and then switches (with a continuous transition) to the model (phase II) The null hypothesis model assumes the phase I model over all times, or equivalently that c = 0.
The changepoint parameter is estimated using posterior means defined as follows. For each fixed , the model parameters are estimated by maximum likelihood, and the value of the log-likelihood ( ) is computed. The estimator for is then defined by where the sum is over a finite number of between 0 and 100, and 0 is the log-likelihood for the null hypothesis of no changepoint.

Validation
P-values and confidence intervals are estimated using bootstrap techniques. The bootstrap method estimates standard errors based on random resampling of the data with replacement; it can be a more reliable method of calculating standard errors and statistical significance than parametric methods (Efron, 1979). For p-values, a general model is fitted, residuals are estimated, then resampled to reconstruct a model satisfying the null hypothesis (hence with c = 0). For confidence intervals, the approach is similar, but the full estimated model is used for reconstruction. We used 1,000 bootstrap samples for each estimation. A median absolute deviation was calculated for each changepoint in order to provide a robust estimate of the standard deviation, estimated as the median of the absolute value of the difference from the median of the sample as a whole.
Additionally, a "precedence graph" was developed using the variables for which significant changepoints were calculated The red lines are the observed data for the subjects who remained cognitively normal. The green lines represent individuals who progressed to cognitive impairment. Dark red stars (and dark green stars, respectively) are the model predictions for the same subjects for whom observed data are presented. The blue vertical line marks the estimated changepoint. The black vertical line marks the estimated onset of clinical symptoms. The age of onset for the subjects who remained cognitively normal was imputed via Bayesian prediction. Note that the x-axis values for cognitively normal subjects are based on an estimated clinical onset time (since the "true one" is right-censored), using the posterior mean of its distribution given the observed data. This explains the gap that can be observed in some graphs between actual and censored observations, since the latter lacks the statistical variability around the estimated posterior mean.
(n = 8). Each of the measures were compared with one another using a bootstrap technique to determine the fraction of bootstrap samples for which the changepoint estimates for one measure were found to be earlier than the other. More precisely, precedence between two modalities A and B, with estimated changepoints A and B , is assessed by computing the probability P( A ≤ B ), this probability being itself estimated using bootstrap resampling (i.e., for the sampling distribution). Because this probability requires to sample from the joint distribution of A and B , bootstrap samples are generated consistently across modalities, and the corresponding normalized frequency are computed over 1,000 replicas. This means that if, in order to reconstitute a bootstrap sample for visit time t in modality A, one has used the original residual computed at visit time t, then the corresponding original residual from the same visit time t in modality B will be used, whenever possible, to reconstitute a bootstrap sample at time t for B. (When these two modalities have not been measured together at the considered visit times, the bootstrap samples are created independently.) Groups of variables were computed using hierarchical clustering, based on precedence probability vectors, in order to provide a more concise representation of the changepoints with respect to one another. Arrows were drawn between measures when the confidence for one changepoint was earlier than the other at least 75% of the time.

Null Hypothesis Model
The model with c = 0, which corresponds to our null hypothesis of no changepoint related to disease onset takes the form y k, j = a + b 1 t k, j + b 2 U k + β T z k, j + η k + ε k, j .
Importantly, it includes a disease effect, through the use of the onset time as a partially observed covariate. While our primary focus here is on changepoint, there is certainly an interest in testing the significance of the hypothesisb 2 = 0, with respect to the "double-null" model y k, j = a + b 1 t k, j + β T z k, j + η k + ε k, j , which, this time, includes no disease related effect. A significant value of b 2 , say, with b 2 > 0, implies a lower value of the biomarker for earlier cognitive onsets.

Accounting for a Normal Changepoint
The changepoint in the proposed model is specified in terms of "time before disease onset" and does not include the possibility of such a change being due to normal aging. One of the difficulties in trying to account for both effects (let us call them disease vs. normal changepoint) is that if one of them is strong enough and not corrected for, it may induce significance when testing for the other effect even if that one is not present. On the other hand, correcting for an effect that is not present may reduce the power for detecting the other effect, even if the latter is present. For clarity and to simplify the exposition, we have focused our model and results on a single changepoint measured against disease onset. To be complete, however, we also explored a model in which a correction for a normal changepoint is included (which will therefore be more conservative for the detection of a change associated with disease). This model includes one additional covariate taking the form max(t kj -δ, 0) where δ is a subject-independent age measuring the normal changepoint. To simplify the estimation process, this time δ is computed first (using maximum likelihood for a model without disease changepoint, which in this case only includes random effects as hidden variables), and then plugged into the general model.

RESULTS
Like most statistical results, significant tests reflect a possible association between two factors and any further interpretation (including, in particular, conclusions about cause and effect) can only be expressed as plausible hypotheses, consistent with the results, with other evidence and maybe prior beliefs. For our model, significant results provide a credible indication that a change of regime in the biomarker occurs some number of years before clinical onset. One of the possible interpretations is indeed that the changepoint marks an effect of the disease, which happens before its onset can be detected. Another, however, is that the change is non-pathological, but that its timing is correlated with the disease onset. Statistics alone cannot determine which one is more likely to reflect reality.
The results of the changepoint analyses for each of the nine variables examined are summarized in Table 2. As can be seen, all of the variables had a significant changepoint. The changepoints varied widely across the years preceding symptom onset. Figures 3, 4 provide graphical representations of the model predictions.
The earliest changepoint is for CSF t-tau, that is estimated at approximately 34 years. Significantly later in time, we estimate changepoints for two cognitive markers: The Logical Memory Delayed Recall (15.4 years prior to symptom onset), and the Digit Symbol Substitution Test (14.6 years prior to symptom onset). They are followed a couple of years later by the other two cognitive measurements: Boston Naming Test (13.2 years prior to clinical symptom onset) and Paired Associates Immediate Recall (11.3 years prior to symptom onset), with CSF p-tau in between (13.0 years prior to clinical symptom onset) and CSF abeta a little later (9.6 years prior to symptom onset). Imaging markers come next, with a 6-year difference between the changepoints estimated on the right (8.8 years prior to clinical symptom onset) and on the left (2.8 years prior to clinical symptom onset) medial temporal lobe volumes. This arrangement is summarized in Figure 5 showing the precedence graph between these variables, in which arrows are placed only when the changepoint order could be estimated with enough reliability, as measured via bootstrap resampling.
All markers except CSF t-tau were significant for rejecting the double-null hypothesis of no effect of the cognitive onset time on the marker, with p-values given by 0.047 (left MTL), 0.004 (right MTL), 0.016 (CSF abeta), 0.007 (CSF p-tau) and less than 0.001 for all cognitive markers. We found b 2 > 0 for all markers (indicating a smaller value of the marker for earlier onsets), except for CSF p-tau, for which this value was negative.
Variables that were significant for normal changepoints (p-value less than 0.05) at fixed age in the considered biomarkers for CSF p-tau (age: 51.3), CSF t-tau (age: 64.2 years), Digit Symbol Substitution (age: 74.8), Logical Memory Delayed (age: 67) and right medial temporal lobe (age: 48.6 years). Introducing this normal changepoint in the model as an additional covariate had limited impact on the significance and value of the disease changepoint times.

DISCUSSION
The changepoint analyses presented here lead to several conclusions. First, the changepoint for CSF t-tau occurs several decades prior to the onset of clinical symptoms. Second, the changepoints seen in the rest of the variables appear to reflect a cascade of events in which multiple measures are changing a decade prior to the onset of clinical symptoms. Third, there is a significant difference in the vulnerability of the right vs. the left medial temporal lobe. Several of these findings diverge from the hypothesized ordering of biomarkers in the model proposed by Jack et al. (2013), as well as the hypothetical stages proposed in the NIA/AA Working Group Report, both of which propose that cognitive change follows significant accumulation of amyloid and tau (Sperling et al., 2011). Our results describe a more complex ordering, in which some cognitive effects were found to predate changepoints in CSF abeta. It is of course possible that changes in amyloid occur at a time too early to be detectable in our model, or with a different slope associated with the disease, which is not addressed here.
To gain further insights into these findings, we looked in greater detail at the two cognitive tests with very early timepoints. Figure 3A (which shows the model regression on t-tau after removal of covariate and random effects) indicates that, after a first phase during which t-tau is flat, the protein appears to accumulate starting about 34 years before onset. This is a large gap, but, as already remarked, these results do not inform us on the pathological nature of this increase, but rather on the fact that an event/changepoint seems to happen for CSF t-tau accumulation with a timing that can be associated with clinical impairment several decades later. In other terms, while this changepoint appears to be associated with the onset of disease, it does not necessarily correspond to an early effect. These findings also highlight the differential relationship between CSF t-tau and p-tau during the evolution of AD, although p-tau and t-tau tend to be highly correlated. This difference is emphasized in the recent AD biomarker "framework" (Jack et al., 2018), which argued that p-tau is more closely related to the pathophysiology of AD, with CSF p-tau levels correlating with neurofibrillary tangle pathology in AD patients. By comparison, elevations in t-tau are also seen in other diseases and are reflective of more general levels of neurodegeneration.
The difference in the changepoint for the right and left medial temporal has been presaged by prior reports that have examined the individual regions within the MTL separately. For example, we previously reported that both the right entorhinal cortex and amygdala, when measured at baseline, were significantly related to time to onset of symptoms, whereas measures on the left were not (Soldan et al., 2015). Further studies are needed to determine why this differential vulnerability may occur.
It is important to acknowledge the limitations of these analyses. First, the wide confidence intervals for the CSF assays, particularly for CSF t-tau and p-tau, limit the ability to narrow down the changepoint for this important biomarker. The variability of CSF assays has been an acknowledged challenge in the field for some time, reflected by international efforts to develop improved methods (Mattsson et al., 2011). Newer assays are currently under development (Chang et al., 2017) raising the possibility that measures with less variability will be soon available that will permit more accurate changepoint estimates, with narrower confidence intervals. Second, while the sample size used here is sufficient to generate findings FIGURE 5 | A precedence graph representing the order of the changepoints among the variables with significant changepoints. An arrow between groups of variables indicates that, more than 75% of the time, the changepoint for the variable represented as the 'source' was found to be earlier than the changepoint for the variable represented as the 'target,' using bootstrap samples. The groupings of the variables were computed using hierarchical clustering within each modality, based on the precedence probability vectors. Arrows that can be inferred by transitivity are not shown for clarity.
with substantial statistical significance, the width of most of the 75% confidence intervals is between 5 and 10 years, some of them being even greater. Under the assumption that an increase in sample size may reduce the confidence intervals, we have established a consortium of five sites around the world that are collecting comparable data (Gross et al., 2017). We plan to apply this changepoint model to data gathered from across the sites, which will greatly increase the sample size. Third, the model itself incorporates assumptions that may limit its applicability. For example, a two-phase linear model assumes some continuity of the biomarkers before and after changepoint, since it only accounts for a change of slope. A very abrupt change, for example, would be imperfectly approximated by the model and may result in a loss of power in the likelihood ratio test. Sublinear or hyperlinear evolutions before or after changepoints may have a similar effect. There could also be more than one changepoint, which is not handled by the analysis thus far. Additionally, the estimates of the changepoint are generally more stable when the likelihood ratio test p-value is small, and for small changepoints. Lastly, the changepoint analysis presented here is for univariate biomarkers, and therefore it has been applied separately to each of the variables. While this approach can, in theory, be extended to the multivariate case, such an extension presents statistical challenges, which are currently under investigation.
As these findings emphasize, identifying biomarker changepoints during the preclinical phase of AD remains challenging. Extrapolating the implications of changepoints to predictive models that might identify individuals likely to progress to AD in later life is yet another step beyond the estimation of changepoints. The efforts underway to develop improved treatments for AD offer the hope that when accurate prediction on an individual basis is possible, effective therapeutic interventions will be available.

DATA AVAILABILITY
Publicly available datasets were analyzed in this study. This data can be found here: http://www.biocard-se.org/.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of Johns Hopkins Medicine Institutional Review Boards with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Johns Hopkins Medicine Institutional Review Boards.

AUTHOR CONTRIBUTIONS
LY conducted the analyses and drafted the manuscript. AM played a major role in acquiring the data. MA interpreted the data and revised the manuscript. AS revised the manuscript for intellectual content. CP revised the manuscript for intellectual content. MM interpreted the data and revised the manuscript.

FUNDING
This study was supported in part by grants from the National Institutes of Health (U19-AG03365 and P50-AG005146).