Spatiotemporal Propagation of the Cortical Atrophy: Population and Individual Patterns

Repeated failures in clinical trials for Alzheimer’s disease (AD) have raised a strong interest for the prodromal phase of the disease. A better understanding of the brain alterations during this early phase is crucial to diagnose patients sooner, to estimate an accurate disease stage, and to give a reliable prognosis. According to recent evidence, structural alterations in the brain are likely to be sensitive markers of the disease progression. Neuronal loss translates in specific spatiotemporal patterns of cortical atrophy, starting in the enthorinal cortex and spreading over other cortical regions according to specific propagation pathways. We developed a digital model of the cortical atrophy in the left hemisphere from prodromal to diseased phases, which is built on the temporal alignment and combination of several short-term observation data to reconstruct the long-term history of the disease. The model not only provides a description of the spatiotemporal patterns of cortical atrophy at the group level but also shows the variability of these patterns at the individual level in terms of difference in propagation pathways, speed of propagation, and age at propagation onset. Longitudinal MRI datasets of patients with mild cognitive impairments who converted to AD are used to reconstruct the cortical atrophy propagation across all disease stages. Each observation is considered as a signal spatially distributed on a network, such as the cortical mesh, each cortex location being associated to a node. We consider how the temporal profile of the signal varies across the network nodes. We introduce a statistical mixed-effect model to describe the evolution of the cortex alterations. To ensure a spatiotemporal smooth propagation of the alterations, we introduce a constrain on the propagation signal in the model such that neighboring nodes have similar profiles of the signal changes. Our generative model enables the reconstruction of personalized patterns of the neurodegenerative spread, providing a way to estimate disease progression stages and predict the age at which the disease will be diagnosed. The model shows that, for instance, APOE carriers have a significantly higher pace of cortical atrophy but not earlier atrophy onset.


INTRODUCTION
Neuroimaging studies have shown an alteration of the brain structure during the course of Alzheimer's disease (AD) (1,2). These lesions appear during the prodromal phase of the disease (3-5) whose observation have been limited due to the absence of clinical symptoms and diagnosis. The importance of the structural changes before the clinical symptoms led to hypothetical models (6), which have been later refined thanks to the gathering of multiple scientific evidences. These modifications took the form of a structural change of the brain in particular an important neuronal loss and an atrophy of the brain cortex (7,8). The study of the temporal evolution of the cerebral cortex reveals an atrophy of the gray matter (9). This cortical atrophy presumably relates the traces of the progression of the lesions over the brain surface. A fine-scale modeling of the atrophy propagation is likely to give a wider understanding of the disease evolution, as the structural markers seems reliable to assess the conversion to the AD stage, potentially carrying subtle indicators of the disease progression in early phases.
The spatiotemporal propagation of these alterations encloses two entangled components. On the one hand, the spatial characterization of the lesions over the brain surface at each time, and, on the other hand, a temporal dynamic of these alterations that may differ from one region to another. Characterizing the proper dynamics of these lesions relies on the possibility to reconstruct the whole time-line of AD, at both a spatial and temporal level, out of short-term observations that are not temporally aligned. Another challenging aspect consists in the variability inherent to the individual patterns of atrophy that requires to consistently compare the subject-specific spreads of alterations. Accounting for the interindividual variability in term of lesion propagation should allow to reconstruct individual patterns of propagation, paving the way to possible personalized model of atrophy, that potentially informs on subject-specific age of conversion or disease stage.
Recently, large datasets have opened the opportunity to investigate data-driven models that have refined and validated these hypotheses to some extend, in particular event-based models (10)(11)(12) that considers the propagation as a series of events, allowing to define a sequence of disease stages. They characterize the overall variability of the events ordering at a population level. However, these models are not well suited to relate for the temporal delays of the alterations at a population level, neither to determine individual cortical atrophy. Multimodal observations, including positron emission tomography (PET) scans, magnetic resonance imaging (MRI) and biomarkers, have been gathered within longitudinal databases, i.e., repeated observations of patients during significant periods of time. The underlying intention is to provide multiple individual snapshots of the disease-patients examined during short-term periods-in order to reconstruct the long-term history of the pathology (13,14) at a group and individual level. Moreover, it offers the possibility to describe and interpret the observed data contrary to quantiles or percentiles that require arbitrary reference distributions. A challenging aspect of AD patient comparison is the fact that, even though AD is related to age, the latter is not a good proxy of the disease stage (15)(16)(17) leaving us without any easy way to align all the individual on the same time-line. In Ref. (18), the authors introduced mixed-effect model that consider each individual trajectory as a variation of a mean scenario of evolution, with a time-warp function that is able to realign the subjects on the same time-line (19). It allows to characterize a spatial and temporal variability of propagation in the sense that it defines a group-average trajectory of propagation with the possibility to reconstruct individual observations thanks to personalized parameters. Nevertheless (18), constrain the model to parallel profiles of progression which does not hold when looking at signals that have various dynamics. Moreover, the model does not take into account the spatial correlations between the data whereas (17), which focus on spatiotemporal patterns of progression for images, exhibited that this led, in the case of a non-linear mixed-effects model, to poor estimations of the subject-specific parameters and individual trajectories.
To account for the spatial structure of the signal, networks have been introduced (20, 21), representing the brain areas as the graph nodes. In this paper, the networks correspond to a graph representation of a signal spatially distributed, namely, the cortical thickness mapped on a mesh representation of the cortex. The node values are the cortical thickness values over time on the related brain area. Extracting and projecting patients cortical thickness on the common mesh allows to compare their atrophy on the same atlas to exhibit similar patterns. As we expect the signal propagation to be spatially smooth with a similar temporal profile of change for neighbor nodes, we consider that a subset of the graph nodes act as control nodes. They define an evaluation function such that the signal at each node is an interpolation of the signal at the control nodes, enabling to smooth the high frequencies (22). The proximity between nodes is defined by the distance matrix which informs on the distance between any pair. Moreover, the number of nodes of this vertex-based graph can be tuned based on the desired application, potentially the same as the resolution of the input data, e.g., a voxel for MRI or PET data.
The aim of this paper is to introduce a model of the cortical atrophy propagation during the long-term course of AD thanks to a graph representation of the neuroimaging data. This model is able to personalize the reconstruction of the propagation to individual longitudinal measurements, allowing to describe the stages of the disease, potentially in the future. The model is described as a general framework for any longitudinal data spatially distributed on a common graph and it is instantiated to exhibit the propagation of the cortical atrophy on the left hemisphere of the brain, across nearly 2000 regions, thanks to longitudinal observations of 154 mild cognitive impaired (MCI) patients that were later diagnosed with AD. While exhibiting an average pattern of propagation, this mixed-effects model allows to reconstruct individual observations through time.

Sketch of the Method
Prior to detail our method, we would like to sketch the key ideas and notations of our work to ease and guide the reading. First, we consider I patients; each patient i is observed J i times, his jth visit being at age t ij , and each observation led to an MRI scan as shown on the left hand side of Figure 1. Segmentation of the cortical thickness, out of the neuroimaging observations, are mapped onto a mesh, as presented on the middle part of Figure 1. The last step corresponds to a subsampling process that led to a graph G of K nodes, characterized by a distance matrix D. At each node, the individual observations define a time-series describing the evolution of the signal through time.
In a second time, we assume that, at each node k of the graph G, there exists a function t → γ k (t) that describes a characteristic evolution of the signal at this node, as shown on Figure 2. The timeseries of individual i at node k derives from a continuous function η ik (t), which is assumed to be a spatial and temporal variations of the representative trajectory γ k (t), illustrated on Figure 3. The temporal variation corresponds to the time realignment of individual i on the common time-line. It adjusts the individual dynamics to a mean pace of evolution, thanks to personalized parameters τ i and α i . τ i stands for the individual time-shift to the mean disease onset, allowing an early (τ i < 0) or delayed (τ i < 0) age at diagnosis. The parameter α i integrates the patient-specific possibility to have a faster (α i > 1) or slower (α I < 1) pace of atrophy compared to the mean scenario of changes. On the other side, the spatial variation corresponds to the adjustment from the mean cortical thickness to individual data. It accounts, for instance, for the difference in size or in spatial thickness distribution at the same disease stage.
We consider that the characteristic signal γ k (t) at node k belongs to a family of curve, here the straight line curves, parametrized by the cortical thickness p k and the rate of atrophy v k . To account for the spatial structure of the signal and the large number of nodes, a subset of nodes, referred to as control nodes, is selected to control the interpolation of the cortical and atrophy values over all the nodes. The distribution of the control nodes depends on the size of the kernel bandwidth such that the kernels densities map almost uniformly the feature space. The model introduces population parameters, that allow to define a characteristic spatiotemporal trajectory of the atrophy, and individual parameters, that not only enable to reconstruct individual trajectories but also permit the statistical study of the distribution of spatiotemporal atrophy patterns. These parameters are estimated thanks to the Monte-Carlo Markov-Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM) algorithm, which handles non-linear mixed-effects models, with theoretical guarantees and consistent results in practice.

Subjects and Data Preprocessing
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www. adni-info.org.
The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset contains longitudinal MRI data for patients that are, at each visit, either cognitively normal (CN), mild cognitive impaired (MCI) patients, or, AD subjects. We selected all the subjects that presented a monotonous decline from MCI to AD, called the MCI converters, removing those that may convert from AD back to MCI or CN. Although AD patients get through an MCI phase, we could not keep CN to MCI patients as they might just as well convert to another dementia. Also, the patients that underwent from CN to MCI and then to AD are not numerous enough to give robust estimation of early stages (CN to MCI). Thus, we kept only the MCI to AD visits of such patients. Altogether, the paper focuses on 154 MCI patients that represents 787 visits, each individual being examined 5 times on average, from 2 to 7 times.
Each visit led to a T1-weighted MRI acquisition, as shown on the left side of Figure 1. The longitudinal pipeline of FreeSurfer (23) was used to extract the cortical thickness of the left hemisphere of the brain which was then projected on a common atlas, namely, FSAverage (24), which is a three-dimensional mesh composed of 163,842 nodes for each hemisphere represented on the central part of Figure 1. This common fixed-graph allows to compare the cortical thickness between visits or patients, node to node. FIGURE 2 | Mesh of the cortical surface where each node embeds a time-series of observations (red points). At node k, the function γ k (t), which can be parametrized by a velocity and two different sets (p 1 , t 1 ) or (p 2 , t 2 ), estimates the cortical thickness over time. The data acquisition and interindividual alignment led to a considerable noise, especially in terms of variability in the measures for close nodes. To smooth this noise and to reduce the computational time, we subsampled the initial graph into a new graph of 1,827 nodes. To do so, we selected 1,827 nodes uniformly distributed over the whole FSAverage graph; the other nodes were then associated to one of the 1,827 nodes thanks to a geodesic distance d on the graph (i.e., the length of the shortest path on the surface mesh between the nodes) using the Fast Marching Algorithm on the mesh (25). Therefore, it constitutes collection of nodes referred to as patches. The value of each node of the subsampled graph is the average value over the corresponding patch, each being constituted of approximately 89 initial nodes of the FSAverage graph. The resolution of this vertex-based approach is lower than the initial one, shown on the right hand side of Figure 1, but still holds the brain topology while smoothing part of the acquisition noise. In our case, each observation can be considered as a vector of size 1,827 where the kth coordinate is related to the kth node of the common fixed-graph G. The latter is also described by the distance matrix D between the 1,827 nodes. It was obtained using the geodesic distance d between the 1,827 nodes on the initial graph FSAverage, whose edges are weighted by a physical length. Finally, for all i, j ∈ {1, . . . , 1,827}, we set D ij = d(x i , x j ) where x i and x j are two nodes of the graph.
In the following, we will present a data-driven model which allow to track the propagation of any signal spatially distributed, supposedly the cortical thickness. We consider a longitudinal dataset y = (y i,j ) 1 ≤ i ≤ I, 1 ≤ j ≤ Ji of I individuals, each patient i being observed J i times during the study at ages (t ij ) 1 ≤ j ≤ Ji . We suppose that there exist a common fixed-graph G defined by a set V = (x 1 , . . . , x K ) of K nodes and a distance matrix D which accounts for the distance between the nodes. Any node x k ∈ R 3 corresponds to a coordinate of a point in space. Each observation y ij = = (y ij1 , . . . , y ijK ) ∈ R k corresponds to the measured signal spatially distributed over the N nodes of G, represented by a point in the multivariate space R k , schematically represented on Figure 3A for K = 3, as if there were only 3 vertices in the mesh. Therefore, it defines a network whose nodes are valued with the signal of interest. It follows that the collection (y ij ) 1 ≤ j ≤ Ji of the observations of a particular subject defines a network that embeds a time-series on each node of G, indexed by the patient age at each observation (t ij ) 1 ≤ j ≤ Ji .

From Short-Term Data to Long-Term History
We assume there that the repeated observations of a subject are sampled from a continuous function where ϵ ijk ∼ N(0, σ 2 ) corresponds to the model noise, whose variance is σ 2 . The function t → η ik (t) describes the evolution of the timeseries at node k for the individual i. Thus, the vector function t → η i (t) = (η i1 (t), . . . , η iN (t)) describes the continuous evolution on the graph for a particular individual, i.e., the spatiotemporal propagation of the signal over the whole brain. It corresponds to a spatiotemporal trajectory in the space of measurements. The trajectory t → η i (t) is therefore able to reconstruct the existing observations (y ij ) 1 ≤ j ≤ Ji , defined at the related time-points (t ij ) 1 ≤ j ≤ Ji , as shown on Figure 3B, but also generate an observation at any time t, potentially in the future.
The repeated data of each individual is a particular window in the long-term course of the disease that potentially overlaps with other patients. We aim to realign along a common time-line these short-term sequences by carefully analyzing the spatiotemporal patterns within each short-term snapshot. Nevertheless, to do so, we also need to account for the interindividual variability in cortical thickness measurements and trajectories of propagation across the network. The interindividual variability prevents us from considering any individual propagation as a good representation of the disease evolution.
Consequently, we assume that there exists a mean scenario of propagation, defined by a group-average spatiotemporal trajectory t → γ 0 (t), represented on Figure 3C, such that each individual trajectory t → η i (t) is a temporal and spatial variation of this mean scenario of changes, detailed in section 2.3.2. This typical scenario of change describes the mean pattern of spatiotemporal propagation of the signal and writes γ 0 (t) = (γ 1 (t), . . .,γ K (t)) where for all k ∈ {1,. . ., K}, t → γ k (t) characterize the typical temporal evolution of the cortical thickness on the brain region related to the node k. As represented in Figure 2, each node has a different temporal profile of atrophy, accounting for the variation of the cortical thickness over time.

Individual Estimation
Translating the generic framework introduced by Schiratti et al. (18) into this case requires to exhibit individual parameters that characterize the individual spatial and temporal variations to the mean, namely the space shifting and the time reparametrization.

Time Reparametrization
We introduce a time-warp function ψ i (t) that corresponds to a time reparametrization that adjust the individual dynamics on a common time-line, which here is the average spatiotemporal trajectory γ 0 . For any patient i with observations ( where t 0 is a common reference time of the reparametrization, α i encodes for the individual pace of propagation and t 0 + τ i describes subjectspecific time-shift to the mean disease onset. As such, if the acceleration factor α i is greater than 1, it corresponds to a faster pace of cortical atrophy whereas α i < 1 indicates a slower pace of atrophy. In the same way, the larger the value of the timeshift τ i is, the later the disease occurs. Therefore, it leads to write η i (t) = γ 0 (ψ i (t)) + ε ij . It adjusts the pace at which the trajectory is followed for the ith individual.

Space Shifting
In the space of measurements R K , we consider individual observations and the mean trajectory γ 0 (t) as shown in Figure 3A.
In order to account for the spatial variability of the individual trajectories, we assume that there exists, for any individual i, a vector w i ∈ R K called the space shift, that characterizes the spatial variations from γ 0 (t) to the observations as shown in Figure 3B. For any point on γ 0 (t), γ 0 (t) + w i is assumed to be on the individual trajectory. Therefore, it is possible to translate all the points (γ 0 (t)) t ϵ R to (γ 0 + w i ) t ϵ R as shown in Figure 3C. This collection defines the individual trajectory η i (t). This space shift must be orthogonal to the trajectory as it ensures the identifiability of the model. In fact, if the direction w i was not orthogonal to the trajectory, then the projection of w i on the geodesic γ 0 would interfere with the individual time realignment induced by the dynamic parameters (α i ,τ i ).
Using mathematical tools from the Riemannian geometry beyond the scope of this study (26) shows that the kth coordinate of the individual spatiotemporal trajectory writes η ik (t) = γ k ( w ik γ k (t 0 ) + ψ i (t)). As the space shift must be estimated in R K , w i is supposed to be a linear combination of few independent components, in the spirit of independent component analysis (ICA) (27). It leads to consider A a K × N s matrix of N s independent directions, and (s ij ) 1 ≤ i ≤ I, 1 ≤ j ≤ Ns parameters to estimate. s i = (s i1 , . . . , s iN s ) ∈ R N s correspond to parameters of individual i that characterize his spatial variations from the mean spatiotemporal trajectory. The orthogonality condition, mentionned in the previous paragraph, leads to consider a basis (B 1 , . . . , B (K−1)N s ) of matrices, whose columns are orthogonal to the direction of γ 0 (t), and parameters (β l ) 1 ≤ l ≤ (K-1)Ns such that A = ∑ (K−1)N s j=1 β j B j . This procedure allows to reduce the dimension of the parameters to estimate for each w i , from K to the chosen number of sources.
It leads to write:

Curve Parametrization
In this paper, we consider a straight line model such that γ k (t) = v k (t-t k ) + p k , v k accounting for the ratio of atrophy and p k for the thickness value at time t k . A linear decay in cortical atrophy is then represented by a straight line trajectory, parametrized by time, in the K-dimensional space as shown on Figure 3. Note that as shown on Figure 2, it is possible to parametrize the same curve with two distinct sets (p 1 , t 1 ) and (p 2 , t 2 ) preventing from having an identifiable model. We decided to fix the parameter t k among all the nodes such that for all k ∈ {1,. . .,K} t k = t 0 , the time reference used in section 2.3.2.1, without any loss of generality as t → γ k (t) is defined on R. Despite the linear form of each coordinate t → γ k (t), the resulting model is non-linear as it includes among others, multiplication of individual and population parameters. Finally, equation (2) becomes This model therefore defines a distribution of multivariate straight line trajectories that accounts for the distribution of the individual trajectories.

Spatial Smoothness
The model proposed in this paper deals with data that are spatially distributed on a graph G defined by a set of nodes V = (x 1 ,. . ., x K ), where each node embeds a spatial coordinate in R 3 . We expect a smoothly varying profile of atrophy across nodes. The proximity between edges is given by the distance matrix D.
In order to ensure small variations of the signal, we introduce a subset V c = (x d 1 , . . . , x d Nc ) ⊂ V whose vertices are called control nodes. Instead of estimating (p k ) 1 ≤ k ≤ K (resp. (v k ) 1 ≤ k ≤ K ) at all the nodes, we consider only the parameters at the control nodes for all x ∈ V such that, at the control nodes, the function is equal to the parameters: ∀k ∈ {1,. . .,Nc}, . At the other nodes, the function is an interpolation of the parameter value at the control nodes weighted by the distance to each of them. Therefore, the control vertices control the evaluation of the parameters among all the nodes.
We choose a Gaussian kernel K b as interpolation splines: where d is the geodesic distance on the mesh and b is the kernel bandwidth. This interpolation allows to remove the possible high frequencies, smoothing the signal spatially. Therefore, it leads to write: The parameters . Given these interpolations, equation (3) writes Even though the distance computed for the cortical thickness corresponds to a distance on the brain cortex, it is possible to compute a connectivity distance based on the connectome, or even an appropriate combination of some of these distances. The challenging part is to put into correspondence areas defined by the connectivity matrices and other networks such as the FSAverage Atlas.
The choice of the set V c of control nodes among the whole set of nodes V is mostly determined by the choice of the bandwidth b: their uniform distribution is such that there is an approximate distance b between them. In the case of the cortical thickness, we have chosen a bandwidth equal to 16 mm which is representative of the spatial variability of the signal.

Algorithm
Equation (5) describes a mixed-effects model, introducing population and individual parameter in this high-dimensional non-linear model. We consider that ((α i ) 1≤i≤I , (τ ) 1≤i≤i , (s ij ) 1≤i≤pI,1≤j≤N s ) are random-effects of the model, leading to write ∀ i ∈ {1,. . ., I} ∀j ∈ {1,. . ., N s }: , and, α i corresponds to the realization of a log-normal distribution so that it is always positive, preventing the individuals to present an increasing cortical thickness over time. Moreover, the Laplacian distribution of s ij arises from theoretical considerations as we need the model to be identifiable, i.e., the solution of the problem to be unique. Finally, these random-effects account for the statistical distribution of the individual trajectories. In the following, we consider z = ((α i ) 1 ≤ i ≤ I , (τ ) 1 ≤ i ≤ I , (s ij ) 1 ≤ i ≤ I, 1 ≤ i ≤ Ns ) as hidden variables.
Given equation (5) and the observations y, we would like to estimate the parameters θ = (t 0 , (p d k ) 1≤k≤N c , (v d k ) 1≤k≤N c , (β k ) 1≤k≤N s (K−1) , σ τ , σ ξ , σ) as a maximum likelihood estimate (MLE) θ * = argmax p(y|θ). The natural way to perform such estimation in mixed-effects models is the Expectation-Maximization algorithm (28). Unfortunately, the E-step is intractable and it is not possible to sample according to the conditional distribution ALGORITHM 1 | Estimation of the general and individual cortical thickness decrease with the MCMC-SAEM algorithm.
Input: Longitudinal dataset y = (y i,j ) i, j of measurement maps, with the corresponding ages (t i,j ) i, j . Initial parameters θ 0 and latent variables z 0 . Geometrically decreasing sequence of step-sizes ρ k . Sufficient statistics S k Initialization: set k = 0 and S 0 = S(z 0 ). repeat Simulation: foreach block of latent variables ) .
p(z|y, θ). Therefore, we use a stochastic version of the EM algorithm coupled with a Monte-Carlo Markov-Chain method, namely, the Monte-Carlo Markov-Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM) algorithm that is able to deal with non-linear equations in a high-dimensional setting. The algorithm is proven to convergence (29) if the model belongs to the exponential family. In our case, it corresponds to consider that . This leads to consider z = ((ξ i ) 1≤i≤I , (τ i ) 1≤i≤I , (s i ) 1≤i≤I , 1) ) as the extended hidden variables and θ = (t 0 , p, v, (β k ) 1≤k≤N s (K−1) , σ ξ , σ τ , σ p , σ v , σ) as the parameters of the model. The latter introduces sufficient statistics S of the model that are functions of the observations y and latent variables z. The aim of such functions is to disentangle the maximization of the parameters θ and the simulation of the latent variables z.
The pseudo-code of the algorithm, reproduced in Algorithm 1, shows the different steps of the optimization until convergence. For further information about the steps of the algorithm, the reader is referred to Ref. (29)(30)(31) and references therein.

Simulation Study
Since we introduce a new approach to deal with longitudinal data spatially distributed, we performed a simulation procedure to show both the legitimacy of the model used, and the effectiveness of the estimation procedure. To this end, we define a graph represented on the top left of Figure 4, representing a square mesh of 7 nodes per edge, thus 49 nodes in total. Among them, 9 equally distributed nodes represent the control nodes, in red on the figure. As we simulate data according to equation (3), we choose position and velocities across the node of the graph, as shown on the top right part of Figure 4. Then we simulated realizations (ξ i ,τ i , (s ij ) 1 ≤ j≤ Ns ) 1 ≤ i ≤ N for 350 patients, from 4 to 12 visits each (2,980 visits in total) such that it represents 350 longitudinal trajectories of biomarkers spatially distributed. These  data were used to find the parameters used to simulate them. Thus, we have performed 10 runs of the estimation procedure. In order to account for the stochasticity of the algorithm and the motion of the Markov Chains, the results in Table 1 are given with their standard deviation over 10 runs. As we need an initial value for the parameters, we initialized the algorithm without specific knowledge about the positions and velocities, contrary to the experience on the cortical atrophy, so it might reflect a worst-case scenario. Table 1 shows how well the algorithm performs on either control nodes or random nodes, as well as for the individual parameters. The bottom part of Figure 4 shows some results of the estimation procedure. On the left hand side, we provide an example of the stochastic estimations of a parameter over the iterations of the algorithm -the figure shows 10 independent runs. The right hand side presents the final estimation of the velocities across the node of the graph, showing that the model is likely to reproduce the real signal. Overall, these results confirm that such procedure seems reasonable to assess the validity of the model and of the estimation procedure in order to estimate the temporal profile of longitudinal data spatially distributed, such as the cortical atrophy.

Initialization
We evaluated the propagation of the cortical atrophy thanks to cortical thickness values of 154 MCI converters (787 observations) distributed on a graph with 1,827 nodes.
The initialization of the MCMC-SAEM algorithm requires initial values of the parameters θ and realizations z. We would like to draw attention on the realizations ( and ((p d k ) 1≤k≤N c , (v d k ) 1≤k≤N c , t 0 ). The former are chosen equal to 0, leading to initial individual trajectories that are equal to the mean spatiotemporal trajectories. The pattern of atrophy is the same for everyone at the beginning. The latter variables, ((p d k ) 1≤k≤N c , (v d k ) 1≤k≤N c , t 0 ), are initialized based on the raw data. Besides t 0 that is chosen as the mean age of the input observations, for each control node k, we computed linear regressions on the longitudinal thickness values of each patient. Then we average the regression coefficients, each corresponding to a given subject, such that we end up with one rate of atrophy v k per patch. Also, p k was chosen as the average thickness on a given patch. Figure 5 shows the map of the initial v k distributed over the cortical surface which looks reasonable. FIGURE 5 | Annual rate of atrophy mapped over the brain surface used as initialization of our algorithm. Given one area, the corresponding rate of atrophy is obtained as the average regression coefficient of the linear regressions applied to each patient independently. Figure 5 present areas with important cortical decrease over time, such as the temporal lobe and the hippocampus area. On the other hand, the primary visual cortex is less subject to a cortical atrophy. This initialization looks reasonable; however, these linear regressions are not able to reconstruct the individual observations, preventing from a characterization of personalized patterns of atrophy. It avoids describing the temporal and spatial variability of the individual propagations. Moreover, the linear regressions do not take into account the spatial coherence of the propagation as shown by the color-bar on Figure 5 where some areas present an important increase of the cortical thickness. It may be associated to the important noise within the data which is produced by the data acquisition, the extraction of the cortical thickness, and, the alignment on the same atlas.

The initializations of
Thanks to the model we introduced, we were able to reconstruct a mean (resp. individual) spatiotemporal trajectory, detailed in section 3.2 (resp. 3.3), that takes the form of the input measurements, preventing from working with percentiles or clusters that cannot be compared directly to the real observations. Due to the numerous number of hyperparameters and the stochastic behavior of the MCMC-SAEM, the algorithm was computed several times, each run of 100,000 iterations taking approximately 15 h. The runs led to similar results. In the following, the results are presented for the run that provided the best individual reconstruction, i.e., the smaller standard deviation σ of the noise. Its last estimation is of 0.29 mm, where 90% of the input data are between 1.5 and 4 mm.

Population Level
The model exhibits a long-term characteristic pattern of atrophy propagation from early MCI stage to post AD diagnosis. It corresponds to the group-average trajectory described in section 2.3.1 whose spatial (w i ) and temporal (α i and τ i ) variations corresponds to individual spatiotemporal trajectories. It is important to mention that this trajectory is a mean trajectory in a statistical sense, as its parameters are the mean values of the individual parameters. Figure 6 shows the temporal and spatial evolution of the cortical atrophy, from 66 to 78 years old. The brain medial and lateral views shows an important atrophy on the temporal lobe and the medial temporal lobe, especially the fusiform and the parahippocampical gyrus. An important cortical decrease is also discernible on the superior frontal gyrus and at the wider region defined by the inferior parietal lobe and the angular gyrus. On the other side, the prefrontal cortex, the primary visual cortex, the calcaris sulcus, and the post central gyrus are less subject to atrophy.
These results are supported by Figure 7 that shows the map of the annual atrophy v k for the mean spatiotemporal trajectory, distributed over the corresponding brain areas. The areas affected by the cortical atrophy correspond to previous knowledge (32)(33)(34) even tough the different measurements and methodologies lack in consensus. The patterns are still debated in order to find the best characterization of AD compared to normal aging or other neurodegenerative diseases. The proposed model may provide results for different populations on the same atlas, facilitating the comparison between diseases or with normal aging.

Individual Reconstruction
The model is able to characterize personalized patterns of atrophy propagation thanks to a reconstruction of the individual observations. The validation is assessed thanks to the relative error of reconstruction. As mentioned previously, the input data are noisy, at both a temporal and spatial level. As for the temporal part, the 154 patients represent 281.358 temporal profiles (timeseries) over the 1,827 patches, from which only 6.4% present a monotonous profile of decrease. Given all the linear regression computed for the algorithm initialization, the mean (resp. the variance) of the corresponding R-square values is equal to 0.348 (resp. 0.307). On the other side, the spatial noise corresponds to high variation of the signal for neighbor nodes. Given this important noise, the goal of the reconstruction is not to reconstruct  perfectly the data but rather to smooth the propagation over the brain and to capture individual tendencies of atrophy propagation. Thus, the 787 observations involve 1,437,849 reconstruction y ijk , whose relative error of reconstruction is represented on Figure 8A which confirms the hypothesis that the noise is a Gaussian distribution with a zero mean (p-value = 4.24.10 -109 for a t-test comparison with a theoretical distribution of mean equal to zero). As highlighted by Figure 8B, that represents the relative error of reconstruction over the 1,827 patches, the error is mostly randomly distributed over the brain surface. It confirms that the reconstruction error does not have a spatial component as it is uniformly distributed over the brain surface. The color-bar was chosen according to the extreme values: it is important to mention that the larger error of reconstruction corresponds to areas that are close to the corpus callosum where the interpolation relies on a fewer number of control points. Figure 9 presents the reconstruction of two different individuals who present various individual spatiotemporal trajectory, especially space shift norms that are either in the 10% bigger on the left hand side, or in the 10% smaller on the right hand side. The left part of each individual part corresponds to the input data whereas the right part is the corresponding reconstruction done by the model. It shows that the reconstruction is likely to represent the real data. The same color-bar was used as for Figure 6 to compare FIGURE 9 | Real data and data reconstruction for subjects with a small space shift (right) and large space shift (left). The model is able to reconstruct the observed data, with a smoothing component, for subjects that present different characteristics.
the individual data with the characteristic pattern of atrophy. Moreover, the spatiotemporal trajectory η i of individual i is not estimated only at the observed time-points but it is a continuous function of the time, as shown on Figure 3C. Therefore, it is possible to reconstruct the observation at any point, potentially in the future.
One of the properties of the model is to exhibit individual temporal parameters, namely the acceleration factor α i and the time-shit τ i , which allow to reparametrize the individual dynamics on a common time-line. As the data used here correspond to the cortical thickness, the realignment is estimated thanks to structural biomarker dynamics. On the other side, the MCI converters have an age at disease onset, t diag , which corresponds to a clinical status. The latter is not straightforwardly related to the structural dynamics of the individual. In that sense, we decided to realign the age at onset t diag , a clinical biomarker, on the same time-line, assessed with the FIGURE 10 | In red, the histogram of the observed age at diagnosis t diag,i for the 154 MCI converters. In blue, the histogram of the repatametrized age at diagnosis ψ i (t diag,i ) once aligned on the common time-line. This shows that the age at diagnosis is mapped to a smaller range of time-points, in the model of cortical atrophy, suggesting that conversion to AD occurs at a specific stage of cortical atrophy. structural biomarkers. The observed age at diagnosis t diag,i are represented by the red histogram on Figure 10, which is not unimodal and present an important variance. The realignment of the clinical status is represented thanks to the distribution of (ψ i (t diag,i )) 1 ≤ i ≤ I , which is centered with a reduced variance. It suggests that the clinical conversion to AD, determined with t diag corresponds to a specific stage of the cortical atrophy.
As the model estimates individual spatiotemporal trajectories, it allows to describe the variability within the population. The distributions of (α i ) 1 ≤ i ≤ I , (τ ) 1 ≤ i ≤ I and (w i ) 1 ≤ i ≤ I account for the distribution of the individual patterns of atrophy. Furthermore, the ADNI dataset provides, for each patient, multiple features, such as the number of alleles of the APOE-ϵ4 gene, the gender, the marital status, and the educational level. In the case of the APOE-ϵ4 gene, which is known as a genetic risk factor regarding AD (35,36), we exhibited the distribution of (α i ) 1 ≤ i ≤ I and (τ i ) 1 ≤ i ≤ I for the subpopulations defined by the number of alleles of the gene as shown on Figure 11. The more alleles, the more likely to have AD (35,37).
As shown on the left hand side of Figure 11, the patients with two alleles (resp. one allele) present a mean time-shift of −2.98 years (resp. −0.20 years) after the mean scenario, contrary to patient without APOE-ϵ4 alleles that present an average timeshift of 1.89 years, meaning that the more alleles, the earlier the atrophy onset occurs. However, we applied Mann-Whitney two-sided statistical tests that lead to insignificantly differences between the subpopulation. On the other side, same tests were conducted for the same subpopulation with the mean acceleration factor whose distributions are presented on the right hand side of Figure 11. In this case, the group of individual with no alleles presented an average acceleration factor of 0.780, statistically different from the group of individual with one alleles (resp. two alleles) that presented an average acceleration factor of 1.415 (resp. 1.236) with a p-value equal to 0.00104 (resp.0.00511). However, this acceleration factor is not statistically different between the population with one or two alleles (p-value = 0.51518), meaning that these subpopulation have similar rate of atrophy. Additional investigation on the gender, the marital status and the education level did not led to significant differences. It is important to mention that the Mann-Whitney test is sensitive to the number of samples whereas this study focuses on only 154 MCI patients that might lead to insignificant results in some cases, particularly in the case of the educational level (20 categories) or the marital status (unbalanced classes). Finally, it should mentioned that the tests conducted on the individual space shifts w i and the related sources s i did not lead to significant results, mainly because these parameters account for the difference in brain size, and thus thickness, between people.

DISCUSSION
The paper presents a mixed-effects model of the atrophy propagation that is able to characterize a typical pattern of propagation, and, that reconstructs individual observations and scenarios of atrophy. The model exhibits brain areas that are the most affected by the cortical atrophy, such as the parahippocampical gyrus, the temporal lobe, and the superior frontal gyrus. The lesions are less important in the primary visual cortex, the prefrontal cortex, and the primary sensomotory cortex. The model allows to account for the different temporal dynamics of the alterations that can be then compared and ordered.
The proposed model offers a wide versatility of instantiation in terms of profile of temporal variations (exponential decay, sigmoid decay) and spatial variations (resolution, number of control nodes, kernel bandwidth) as it defines a generic framework for the estimation of longitudinal signals spatially distributed. It should be compared to other types of graph-related approaches, such as supervoxels (38) or a vertex-cluster method (39). The latter has exhibited clusters of regression that show profiles of atrophy similar to our results. However, such models do not deal with individual characteristics neither directly with imaging data but rather with normalized values or percentiles, which restrict the interpretation. Further efforts should be concentrated on the validation and improvement of our model, possibly with more complex data and signal propagation.
The individual reconstructions also inform about subjectspecific patterns of atrophy propagation, with potential personalized estimation of the cortical atrophy at future time-points. Further investigations have to be conducted to ensure the quality of the new observations the model is able to generate, so that one can exploit the outcome that the model can predict for an individual some years after his of her last visit. This should be done with a proper validation set to determine the population parameters, and a test-set to predict the individual parameters and thus the future observations. Consistent results might provide information about the structural biomarkers related to the progression of AD, such as in Ref. (40).
Another improvement of the model relies in the distance matrix computation. In this paper, the distance between the nodes is related to the distance on the brain surface, hiding potential effects of the neuronal connections. New distances might be computed based on functional connectivity or combination of different distances, in order to associate the functional and structural components of the brain that are supposed to be complementary in the disease process (41)(42)(43).
The model has the potential to exhibit the spatiotemporal propagation of any signal spatially distributed over a graph. It can be used in order to compare the patterns of propagation in distinct population, e.g., normal aging or any other neurodegenerative diseases. It is also a first step to define personalized patterns that would help for a future prognosis of the patient stages.

AUTHOR CONTRIBUTIONS
All authors listed have made substantial, direct and intellectual contribution to the work, and approved it for publication.