Federated Morphometry Feature Selection for Hippocampal Morphometry Associated Beta-Amyloid and Tau Pathology

Amyloid-β (Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. The hippocampus is one of the regions most affected by this neurodegeneration, and the influence of Aβ/tau on the hippocampus has been a major research focus in the study of AD pathophysiological progression. This work proposes a novel framework, the Federated Morphometry Feature Selection (FMFS) model, to examine subtle aspects of hippocampal morphometry that are associated with Aβ/tau burden in the brain, measured using positron emission tomography (PET). FMFS comprises hippocampal surface-based feature calculation, patch-based feature selection, federated group LASSO regression, federated screening rule-based stability selection, and region of interest (ROI) identification. FMFS was tested on two Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohorts to understand hippocampal alterations that relate to Aβ/tau depositions. Each cohort included pairs of MRI and PET for AD, mild cognitive impairment (MCI), and cognitively unimpaired (CU) subjects. Experimental results demonstrated that FMFS achieves an 89× speedup compared to other published state-of-the-art methods under five independent hypothetical institutions. In addition, the subiculum and cornu ammonis 1 (CA1 subfield) were identified as hippocampal subregions where atrophy is strongly associated with abnormal Aβ/tau. As potential biomarkers for Aβ/tau pathology, the features from the identified ROIs had greater power for predicting cognitive assessment and for survival analysis than five other imaging biomarkers. Together, these results indicate that FMFS is an efficient and effective tool to reveal associations between Aβ/tau burden and hippocampal morphometry.


INTRODUCTION
Alzheimer's disease (AD) is now viewed as a gradual process that begins many years before the onset of detectable clinical symptoms. Measuring brain biomarkers and intervening at preclinical AD stages are believed to improve the probability of therapeutic success (Brookmeyer et al., 2007; Sperling et al., 2011; Jack et al., 2016). Amyloid-β (Aβ) plaques and tau tangles are two specific protein pathological hallmarks of AD and are believed to induce neurodegeneration and structural brain atrophy that is consequently observable on volumetric magnetic resonance imaging (MRI) scans (Jack et al., 2008; Selkoe and Hardy, 2016; Gordon et al., 2019; La Joie et al., 2020). Brain Aβ and tau pathology can be measured using positron emission tomography (PET) with amyloid/tau-sensitive radiotracers or by using lumbar puncture to measure these proteins in samples of cerebrospinal fluid (CSF). Even so, these invasive and expensive measurements are less attractive to subjects in the preclinical stage, and PET scanning is also not as widely available as MRI.
In the A/T/N system, a recently proposed research framework for understanding the biology of AD, the presence of abnormal levels of Aβ (A in A/T/N) in the brain or CSF is used to define the presence of biological AD (Jack et al., 2016). An imbalance between production and clearance of Aβ occurs early in AD and is typically followed by the accumulation of tau (T in A/T/N) protein tangles (another key pathological hallmark of AD) and neurodegeneration (N in A/T/N) detectable on brain MRI scans (Hardy and Selkoe, 2002; Sperling et al., 2011; Jack et al., 2016). Therefore, there has been great interest in developing techniques to associate Aβ and tau deposition with MRI measures (Tosun et al., 2013, 2014, 2016, 2021; Ten Kate et al., 2018; Petrone et al., 2019; Ansart et al., 2020; Ezzati et al., 2020; Sun et al., 2020; Dahl et al., 2021). In structural MRI, the hippocampus is a primary target region across the spectrum of dementia research from clinically normal to late stages of AD (Shi et al., 2011; Li B. et al., 2016; Dong et al., 2019; Cullen et al., 2020). Cognitively unimpaired (CU) individuals with abnormally high Aβ burden have faster progression of hippocampal volume atrophy (Insel et al., 2017; Zhang et al., 2020). Additionally, tau burden in the brain, assessed using PET tracers, also strongly correlates with subsequent hippocampal volume atrophy (La Joie et al., 2020).
However, the influence of Aβ/tau pathology on regional hippocampal atrophy in AD is still not fully understood. A study by Hanko et al. (2019) examined correlations between 3D hippocampal shape measures and Aβ/tau burden in 42 subjects and reported a significant association between tau burden and atrophy in specific hippocampal subregions [cornu ammonis 1 (CA1) and the subiculum], but detected no Aβ-associated hippocampal regions of interest (ROIs). Our previous studies observed an association between hippocampal morphometry and Aβ burden in 1,101 subjects (Wu et al., 2018) and found significant Aβ-associated hippocampal subregions in the CA1 subfield and the subiculum (Wu et al., 2020). Overall, studies of hippocampal ROIs in larger cohorts tend to be more highly powered and reliable.
Integrating data from multiple sites is common practice for obtaining large sample sizes and increased statistical power. An important direction of interest in multi-site neuroimaging research is federated learning, which offers an approach to learn from data spread across multiple sites without having to share the raw data directly or to centralize them in any one location. In many cases, different institutions may not be readily able to share biomedical research data due to patient privacy concerns, data restrictions based on patient consent or Institutional Review Board (IRB) regulations, and legal complexities; this can present a major obstacle to pooling large-scale datasets to discover robust and reproducible signatures of specific brain disorders. To address this problem of distributed data, a large-scale collaborative network, the ENIGMA consortium, was built (Thompson et al., 2020). However, most ENIGMA meta-analytic studies currently focus on univariate measures derived from brain MRI, diffusion tensor imaging (DTI), electroencephalogram (EEG), or other data modalities, and relatively few have studied multivariate imaging measures. Federated learning models, such as decentralized independent component analysis (Baker et al., 2015), sparse regression (Plis et al., 2016), and distributed deep learning (Kaissis et al., 2021; Stripelis et al., 2021; Warnat-Herresthal et al., 2021), have made solid progress in leveraging multivariate image features for statistical inference, allowing iterative computation on remote datasets. Some other recent studies focus on multivariate linear modeling (Silva et al., 2020), federated gradient averaging (Remedios et al., 2020), and unbalanced data across multiple sites (Yeganeh et al., 2020). To our knowledge, these methods have not yet been applied to detect multimodal associations in AD research, such as finding anatomically abnormal regions on MRI that are associated with Aβ/tau pathology defined using PET.
Here we propose a novel framework, Federated Morphometry Feature Selection (FMFS), to detect the association between hippocampal morphometry markers and Aβ/tau burden. FMFS calculates patch-based surface morphometry features from brain MRI scans of people with AD, mild cognitive impairment (MCI), and CU subjects. With our novel federated feature selection method based on group LASSO regression, we apply the proposed framework to assess hippocampal ROIs associated with Aβ/tau burden (note that by ROIs, we mean subregions and advanced morphometric features on the 3D hippocampal surface, which may have a finer scale than currently defined subregions of the hippocampus).
To test the added value of distributed computing, we also hypothesize that the proposed framework could leverage distributed computational models to improve the statistical power to identify the influence of Aβ/tau pathology on regional hippocampal morphometry. To examine the value of subregional hippocampal features as effective biomarkers of AD progression, we train several regression models with the features from the ROIs to predict the cross-sectional Mini-Mental State Exam (MMSE) score (Folstein et al., 1975), a very widely used clinical measure of disease severity in AD. In addition, we use a separate dataset to demonstrate that our ROIs offer superior performance relative to several other univariate measures in a survival analysis of MCI conversion to AD. Our work generalizes and enriches federated learning research by explicitly selecting (and visualizing) key regional features. By increasing access to information from large-scale imaging datasets and improving computing efficiency, FMFS may offer an efficient and effective screening tool to reveal the associations between Aβ/tau burden and hippocampal morphology across the dementia spectrum, and the features on ROIs could provide a means for screening individuals prior to more invasive Aβ/tau burden assessments that might determine their eligibility for interventional trials.

Subjects
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. For up-to-date information, see www.adniinfo.org. From the multiple phases of ADNI (ADNI 1, ADNI 2, ADNI GO, and ADNI 3), we analyzed two sets of scans for the study of Aβ deposition and tau deposition. For the Aβ deposition study, we analyzed a total of 1,127 pairs of images from 1,109 subjects (18 of whom have two pairs from different visit dates), including 1,127 T1-weighted MR images and 1,127 florbetapir PET images. Similarly, for the study of tau deposition, we obtained 925 pairs of MRI scans and AV1451 PET images from 688 subjects (191 of whom have more than one pair from different visit dates).
In addition to each brain MRI scan, we also analyze the corresponding MMSE scores (Folstein et al., 1975). For amyloid PET, we utilize centiloid measures (Navitsky et al., 2018). Operationally, there have been widely accepted efforts to reconcile differences among different amyloid radiotracers using a norming approach called the centiloid scale (Klunk et al., 2015; Rowe et al., 2017). ADNI florbetapir PET data are processed using the AVID pipeline (Navitsky et al., 2018) and are converted to the Centiloid scale according to the respective conversion equations (Navitsky et al., 2018; Su et al., 2019). For flortaucipir tau-PET, in a similar fashion to Aβ, the tau data are reprocessed using a single pipeline consistent with Sanchez et al. (2021), so that the standardized uptake value ratios (SUVR) from different ADNI study sites can be analyzed together. In this work, we examine two regional SUVRs for tau deposition, corresponding to Braak12 and Braak34 (Schöll et al., 2016; Baker et al., 2017a,b; Maass et al., 2017). Table 1 shows the demographic information from the two cohorts that we analyzed.

Proposed Pipeline
In this work, we develop an FMFS model to detect the influence of Aβ and tau deposition on hippocampal shape deformity and to better support the future prediction of AD pathology, as shown in Figure 1. In panel (1), each institution first extracts the morphometric features locally. The hippocampal structures are segmented from registered brain MR images, and smoothed hippocampal surfaces are then generated. After the surface parameterization and fluid registration, the hippocampal radial distance (RD) and tensor-based morphometry (TBM) features are calculated at each surface point. Each institution selects patches on each hippocampal surface and reshapes the grouped features of each subject (the RD or TBM values on each patch form one group) into a vector. Next, in panel (2), taking each Aβ/tau measurement as the dependent variable, the institutions perform the federated feature selection model on these patches of features to generate local hippocampal ROIs for each Aβ/tau measurement.

Image Processing
Using the FIRST algorithm from the FMRIB Software Library (FSL), hippocampal structures are segmented in the MNI152 standard space (Patenaude et al., 2011; Paquette et al., 2017; Figure 1B). Surface meshes are constructed based on the hippocampal segmentations with the marching cubes algorithm (Lorensen and Cline, 1987) and a topology-preserving level set method (Han et al., 2003). Then, to reduce the noise from MR image scanning and to overcome partial volume effects, surface smoothing is applied consistently to all surfaces. Our surface smoothing process consists of mesh simplification using progressive meshes (Hoppe, 1996) and mesh refinement by the Loop subdivision surface method (Loop, 1987; Figure 1C). Similar procedures adopted in a number of our prior studies (Wang et al., 2010, 2012; Colom et al., 2013; Luders et al., 2013; Monje et al., 2013; Shi et al., 2013a,b, 2015) show that the smoothed meshes are accurate approximations to the original surfaces, with a higher signal-to-noise ratio (SNR).

FIGURE 1 | Panel (1) shows the steps for each institution to extract morphometric features locally. The hippocampal structures are segmented from registered brain MR images and smoothed hippocampal surfaces are then generated (A-C). After the surface parameterization and fluid registration, the hippocampal radial distance (RD) and tensor-based morphometry (TBM) features are calculated at each surface point (D). Each institution selects patches on each hippocampal surface and reshapes the grouped features of each subject into a vector (E,F). Next, in panel (2) (G,H), taking Aβ/tau measurements as dependent variables, the institutions perform the federated feature selection model on these patches of features to generate hippocampal local regions of interest (ROIs) for each Aβ/tau measurement.
Using the holomorphic flow segmentation method (Wang et al., 2007), each hippocampal surface is parameterized with refined triangular meshes, and the parameterized surfaces are then registered to a standard rectangular grid template using a surface fluid registration algorithm (Shi et al., 2013a).
After parameterization and registration, we establish a one-to-one correspondence map between hippocampal surfaces. Each surface has the same number of vertices (150 × 100). As illustrated in Figure 1D, the intersection of the red curve and the blue curve is a surface vertex, and at each vertex, we adopt two kinds of morphometry features, the RD (Pizer et al., 1999; Thompson et al., 2004) and measures derived from surface TBM (Davatzikos, 1996; Thompson et al., 2000; Woods, 2003; Chung et al., 2008). The RD (a scalar at each vertex) represents the thickness of the shape at each vertex relative to the medial axis; this primarily reflects surface differences along the surface normal directions. The medial axis is determined by the geometric center of the isoparametric curve on the computed conformal grid. The axis is perpendicular to the isoparametric curve, so the thickness can be easily calculated as the Euclidean distance between the core and the vertex on the curve. TBM examines the Jacobian matrix J of the deformation map that registers the surface to a template surface (Shi et al., 2013a). For TBM, det(J) was computed at each vertex, and this value reflects how the surface area changed around the vertex (expansion or atrophy). Additionally, we used the heat kernel smoothing algorithm (Chung et al., 2005; Shi et al., 2015) to refine the surface features. Since the surface of the hippocampus in each brain hemisphere has 15,000 vertices and each vertex has one RD and one TBM value, the final feature dimensionality of both hippocampi combined, for each subject, is 60,000 [(15,000 + 15,000) × 2].
Finally, on each hippocampal surface (100 × 150 vertices), we uniformly selected 2,500 patches of size 2 × 3, and the RD and TBM values in one patch were each considered as a group of features (Figure 1E). We selected this patch size of 2 × 3 to increase the robustness of the feature selection model, and also because it does not have an adverse impact on the feature visualization. The grouped features for each subject are reshaped to a vector (Figure 1F) and are further processed with our FMFS model.
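The patch grouping can be illustrated with a short sketch. It assumes that the 15,000 vertices of one surface are stored in row-major order on a 100 × 150 grid and that the patches are non-overlapping 2 × 3 tiles; the function name `patch_group_ids` is hypothetical and is not part of the original pipeline.

```python
import numpy as np

def patch_group_ids(n_rows=100, n_cols=150, patch_shape=(2, 3)):
    """Assign every vertex of an n_rows x n_cols grid to a non-overlapping patch.

    Returns a flat integer array (row-major vertex order) of patch indices.
    """
    p_rows, p_cols = patch_shape
    if n_rows % p_rows or n_cols % p_cols:
        raise ValueError("grid must tile evenly into patches")
    patches_per_row = n_cols // p_cols
    row_block = np.arange(n_rows) // p_rows   # patch row of each grid row
    col_block = np.arange(n_cols) // p_cols   # patch column of each grid column
    ids = row_block[:, None] * patches_per_row + col_block[None, :]
    return ids.ravel()

group_ids = patch_group_ids()
n_patches = int(group_ids.max()) + 1      # number of feature groups per surface
patch_sizes = np.bincount(group_ids)      # vertices contained in each patch
```

Under these assumptions, the same index array can be reused for the RD and the TBM features, so that each patch yields one RD group and one TBM group of six values.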

Federated Group LASSO Regression
Group LASSO (Yuan and Lin, 2006) is a widely used technique for group-wise feature selection in high-dimensional data. A group-LASSO linear regression solves the following optimization problem:

$$\min_{\beta \in \mathbb{R}^{p}} \; \frac{1}{2} \left\| y - \sum_{g=1}^{G} X_g \beta_g \right\|_2^2 + \lambda \sum_{g=1}^{G} w_g \left\| \beta_g \right\|_2 \tag{1}$$

where $X_g \in \mathbb{R}^{N \times p_g}$ is the feature matrix of the $g$th group, and $y$ denotes the $N$-dimensional response vector. Group LASSO divides the original feature matrix $X \in \mathbb{R}^{N \times p}$ into $G$ different groups, where $X_g$ represents the features in the $g$th group and $w_g$ is the weight for this group. After solving the group LASSO problem, we get the $G$ solution vectors, $\beta_1, \beta_2, ..., \beta_G$. The dimensionality of each group, $p_g$, can be arbitrary, and the whole solution vector $\beta$ is $[\beta_1, \beta_2, ..., \beta_G] \in \mathbb{R}^{p}$. Additionally, $\lambda$ is a positive regularization parameter that controls the sparsity of the solution vector.

There are many optimization methods to solve the group LASSO problem; block coordinate descent (BCD) (Qin et al., 2013) is one of the most efficient. Instead of updating all the variables at the same time, BCD only updates one or several blocks of variables at each epoch. Therefore, for the group LASSO problem, it can optimize one group of variables while keeping the other ones fixed. Based on this idea, we propose a federated block coordinate descent (FBCD) to solve our problem.

Li Q. et al. (2016) proposed an optimization model, the local query model (LQM), which preserves the data privacy at each institution. We assume that there are $I$ institutions, and each of them owns a private data set $(X_i, y_i)$. We can reformulate problem (1) as

$$\min_{\beta \in \mathbb{R}^{p}} \; \sum_{i=1}^{I} f_i(X_i, y_i; \beta) + \lambda \sum_{g=1}^{G} w_g \left\| \beta_g \right\|_2$$

where

$$f_i(X_i, y_i; \beta) = \frac{1}{2} \left\| y_i - X_i \beta \right\|_2^2$$

is the least square loss of the $i$th institution. We then have the global gradient,

$$\nabla f(X, y; \beta) = \sum_{i=1}^{I} \nabla f_i(X_i, y_i; \beta) = \sum_{i=1}^{I} X_i^{T} (X_i \beta - y_i).$$

Each of the local institutions calculates its own gradient locally and uploads it to the master server. The master server computes the global gradient, $\nabla f(X, y; \beta)$, by adding all $\nabla f_i(X_i, y_i; \beta)$. It then sends the global gradient $\nabla f(X, y; \beta)$ back to all the local institutions to compute $\beta$.
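The gradient exchange of the local query model can be checked numerically. The sketch below is only illustrative: it assumes a least-squares loss with a 1/2 factor at each institution, uses synthetic data for three hypothetical sites, and the function names are not from the original implementation. The pooled matrix is formed solely to verify the result; a real deployment would never assemble it.

```python
import numpy as np

def local_gradient(X_i, y_i, beta):
    """Gradient of the local least-squares loss 0.5 * ||y_i - X_i beta||^2."""
    return X_i.T @ (X_i @ beta - y_i)

def global_gradient(sites, beta):
    """Server-side step: sum the gradients uploaded by each institution."""
    return sum(local_gradient(X_i, y_i, beta) for X_i, y_i in sites)

rng = np.random.default_rng(0)
# Three hypothetical institutions with different sample sizes and 12 features.
sites = [(rng.normal(size=(n, 12)), rng.normal(size=n)) for n in (40, 35, 50)]
beta = rng.normal(size=12)

# Reference value on pooled data, used here only as a correctness check.
X_pooled = np.vstack([X for X, _ in sites])
y_pooled = np.concatenate([y for _, y in sites])
pooled = X_pooled.T @ (X_pooled @ beta - y_pooled)
federated = global_gradient(sites, beta)
```

Only gradient vectors of length p leave each site, which is why the summed result matches the pooled gradient without any raw data being shared.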
Then, β is updated locally with the shrinkage function at the 6th line of Algorithm 1. Our proposed FBCD method is outlined in Algorithm 1.

ALGORITHM 1 | Federated block coordinate descent (FBCD).
Input: Data pairs from the $I$ institutions, $(X_1, y_1), ..., (X_i, y_i), ..., (X_I, y_I)$, with group information and the regularization parameter $\lambda$
2: Randomly select the $g$th group to optimize
3: Compute the local gradient of the $g$th group
4: Compute the global gradient of the $g$th group, $\nabla f(X_g)$, by LQM
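Because the listing of Algorithm 1 is incomplete here, the following is a minimal sketch of a federated block coordinate descent loop rather than the authors' implementation. It assumes a proximal gradient update per group with group soft-thresholding as the shrinkage function, a random group order in every epoch, and a step size derived from a bound that each site can compute locally; all function names, the step-size rule, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(z, tau):
    """Shrink the vector z toward zero; returns exactly zero when ||z|| <= tau."""
    norm = np.linalg.norm(z)
    if norm <= tau:
        return np.zeros_like(z)
    return (1.0 - tau / norm) * z

def fbcd(sites, groups, lam, weights, n_epochs=200, seed=0):
    """sites: list of (X_i, y_i); groups: list of column-index arrays."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(sites[0][0].shape[1])
    # Assumed step-size rule: the sum of the local squared spectral norms
    # upper-bounds the squared spectral norm of the pooled group block.
    lipschitz = [sum(np.linalg.norm(X[:, g], 2) ** 2 for X, _ in sites)
                 for g in groups]
    for _ in range(n_epochs):
        for k in rng.permutation(len(groups)):
            g = groups[k]
            # Each site computes its gradient block; the server adds them up.
            grad_g = sum(X[:, g].T @ (X @ beta - y) for X, y in sites)
            step = 1.0 / lipschitz[k]
            beta[g] = group_soft_threshold(beta[g] - step * grad_g,
                                           step * lam * weights[k])
    return beta

# Synthetic check: only the first of four groups carries signal.
rng = np.random.default_rng(1)
groups = [np.arange(3 * k, 3 * k + 3) for k in range(4)]
true_beta = np.zeros(12)
true_beta[:3] = [2.0, -1.5, 1.0]
sites = []
for n in (60, 50, 70):
    X = rng.normal(size=(n, 12))
    sites.append((X, X @ true_beta + 0.1 * rng.normal(size=n)))
weights = np.sqrt([len(g) for g in groups])
beta_hat = fbcd(sites, groups, lam=20.0, weights=weights)
```

In this toy setting the three noise groups are set exactly to zero by the shrinkage step, while the active group is recovered with the usual shrinkage bias of a penalized estimator.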

Federated Screening for Group LASSO
Finding the optimal value for the regularization parameter $\lambda$ is a common problem in LASSO techniques. The most frequently used methods, such as cross-validation and stability selection, solve it by trying a sequence of regularization parameters, $\lambda_1 > \dots > \lambda_\kappa$; this can be very time-consuming. Instead, the enhanced dual polytope projection rule (EDPP) (Wang et al., 2015) achieved a 200× speedup on cross-validation in real-world applications by using information derived from the solution for the previously tried regularization parameter. For the group LASSO problem, the $g$th group of features $X_g$ can be discarded if it satisfies the screening rule, where $J_g$ and $L_g$ are the elements of $J$ and $L$ defined in Algorithm 2. The screening rule is based on the uniqueness and non-expansiveness of the optimal dual solution, because the feasible set in the dual space is a convex and closed polytope. More information on EDPP may be found at the following GitHub page: http://dpcscreening.github.io/glasso.html. Following the screening rule, we further propose a federated screening rule for group LASSO, named federated dual polytope projection for group LASSO (FDPP-GL), to rapidly locate the inactive features in a distributed manner while preserving data privacy at each institution (Figure 1G). We summarize the method in Algorithm 2. In the algorithm, we estimate the maximum regularization parameter, $\lambda_{\max}$. The input sequence of parameters, $\lambda_1, \lambda_2, ..., \lambda_\kappa$, should be no greater than $\lambda_{\max}$. Based on the solutions for the sequence of regularization parameters, we can then perform stability selection (Meinshausen and Bühlmann, 2010) to select significant features that are most related to the corresponding $y$ (Figure 1H).
Frontiers in Neuroscience | www.frontiersin.org
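Algorithm 2 is not reproduced in this section, so the following sketch shows only one plausible way to estimate the maximum regularization parameter in a federated setting; whether it matches the authors' procedure is an assumption. It relies on a standard property of the least-squares group LASSO with a 1/2 factor on the loss: the all-zero solution is optimal exactly when λ is at least the largest value of ||X_g^T y||_2 / w_g over the groups. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def federated_lambda_max(sites, groups, weights):
    """Smallest lambda for which every group of coefficients is zero."""
    # Each site uploads only X_i^T y_i, a vector of length p.
    xty = sum(X.T @ y for X, y in sites)
    return max(np.linalg.norm(xty[g]) / w for g, w in zip(groups, weights))

rng = np.random.default_rng(2)
groups = [np.arange(3 * k, 3 * k + 3) for k in range(4)]
weights = np.sqrt([len(g) for g in groups])
sites = [(rng.normal(size=(n, 12)), rng.normal(size=n)) for n in (30, 45, 25)]

lam_max = federated_lambda_max(sites, groups, weights)

# Pooled reference, formed here only to verify the federated value.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
lam_max_pooled = max(np.linalg.norm(X_all[:, g].T @ y_all) / w
                     for g, w in zip(groups, weights))
```

A decreasing sequence of candidate values can then be generated below `lam_max`, for example with `np.linspace`.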

Morphometry Feature Selection and Visualization
We carry out the proposed federated group LASSO method to measure how significantly the patches of features are related to the response $y$. Given a decreasing sequence of regularization parameters, $\lambda_1, ..., \lambda_\kappa$, we learn a set of corresponding models, $\beta(\lambda_1), ..., \beta(\lambda_\kappa)$. We perform stability selection by counting the frequency of non-zero entries in the learned models and visualize the frequency on the surface. The counted frequency on each vertex is normalized to 1-100 and then mapped to a color bar. For better visualization, we smooth the values on each surface with a 2 × 3 averaging filter. The regions with higher frequency values are assigned warmer colors, as illustrated in Figure 1H. In other words, these areas have more significant associations with $y$.
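The counting step of stability selection can be sketched as follows. This is a simplified illustration on a toy coefficient matrix; the function names are hypothetical, and rescaling by the largest count is an assumption, since the exact normalization used for the color bar is not specified in enough detail to reproduce.

```python
import numpy as np

def selection_frequency(betas, tol=0.0):
    """Count, per feature, how many fitted models have a non-zero coefficient.

    betas: array of shape (n_lambdas, n_features), one row per lambda.
    """
    return (np.abs(betas) > tol).sum(axis=0)

def rescale_to_100(counts):
    """Assumed normalization: divide by the largest count and scale to 100."""
    top = counts.max()
    if top == 0:
        return np.zeros(counts.shape)
    return 100.0 * counts / top

# Toy example: three models (rows) fitted over four features (columns).
betas = np.array([[0.0, 1.2, 0.0, 0.4],
                  [0.0, 0.7, 0.0, 0.0],
                  [0.0, 2.1, 0.3, 0.0]])
counts = selection_frequency(betas)
scaled = rescale_to_100(counts)
```

Features that survive across many values of the regularization parameter receive high scores, which is what the warmer colors on the surface maps represent.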

Performance Evaluation Protocol
To further validate whether these identified hippocampal ROIs are related to Aβ/tau deposition, we used the RD and TBM features of these ROIs to predict MMSE scores based on random forest, multilayer perceptron (MLP), and LASSO regression models. Ten-fold cross-validation was adopted to evaluate the performance of the models, and root mean squared error (RMSE) was used for measuring the prediction accuracy. We also compared the prediction results of the ROI-related features with the results of the whole hippocampal features and the Aβ/tau measures. In addition, we tested the computing efficiency with the 1,127 subjects for the study of Aβ. First, we randomly assigned the 1,127 subjects to five institutions, each of which has almost the same number of subjects and one computation node. After uniformly selecting 100 regularization parameters from 1.0 to 0.1, we performed stability selection with our proposed framework, FMFS, with FBCD (FMFS without the screening rule), and with the state-of-the-art distributed alternating direction method of multipliers (DADMM) (Boyd et al., 2011). We also repeated the same experiments with different feature dimensionalities by randomly down-sampling and up-sampling the original features.
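The evaluation protocol can be sketched with a small, dependency-free example. To stay self-contained, it swaps in a plain ridge regression as a stand-in for the random forest, MLP, and LASSO models named above, runs on synthetic data, and pools the squared errors over all folds before taking the root; that pooling choice and all function names are assumptions.

```python
import numpy as np

def kfold_rmse(X, y, fit, predict, n_splits=10, seed=0):
    """RMSE over held-out predictions from a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    sq_err = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = fit(X[train_mask], y[train_mask])
        sq_err.append((predict(model, X[test_idx]) - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))

def fit_ridge(X, y, alpha=1.0):
    """Stand-in model: ridge regression with an unpenalized intercept."""
    Xb = np.column_stack([np.ones(len(y)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_ridge(coef, X):
    return coef[0] + X @ coef[1:]

# Synthetic data: 200 subjects, 5 features, noise standard deviation 0.5.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=200)
rmse = kfold_rmse(X, y, fit_ridge, predict_ridge)
```

With a well-specified model, the cross-validated RMSE lands close to the noise level of the synthetic response, which is the behavior the protocol is meant to measure.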

Efficiency Evaluation
A significant innovation of FMFS is that we introduce a screening rule during the group LASSO feature selection stage, which greatly improves the computation speed compared to the DADMM algorithm (Boyd et al., 2011). Moreover, we compare FMFS with the Gauss-Southwell-Lipschitz (GSL) rules for block coordinate descent (Nutini, 2017). We also tested the running time of FBCD in our federated framework without the screening rule.
We simulated the distributed condition on a cluster with several conventional x86 nodes, each of which contains two Intel Xeon E5-2680 v4 CPUs running at 2.40 GHz. Each parallel computing node has a full-speed Omni-Path connection to every other node in its partition. A total of 1,127 subjects for the Aβ deposition study were randomly assigned to five simulated institutions, each of which has almost the same number of subjects and one computation node. We uniformly selected 100 regularization parameters from 1.0 to 0.1 and ran all methods with the same experimental set-up. Under different morphometry feature resolutions (where we randomly down-sampled or up-sampled the dimension of the features), our FMFS method achieved a speedup of 62-, 80-, 86-, and 89-fold compared to DADMM, as shown in Figure 2. Compared to FBCD, FMFS achieved a speedup of 54-, 72-, 80-, and 86-fold. Compared to GSL, FMFS achieved a speedup of 12-, 15-, 15-, and 17-fold.

Amyloid-β and Tau Associated Hippocampal Morphometry
We employed stability selection with our FMFS model to select the ROIs (subregions of the hippocampal surfaces) related to Aβ and tau. We respectively standardize the two types of input features, RD and TBM, for each subject, using Z-scores, and adopt the centiloid value as the measure of Aβ burden and Braak12 and Braak34 measures for tau deposition. Since the regularization parameters can control the sparsity of the solution vector and further influence the area of the ROIs, we uniformly generated 100 regularization parameters between 0.01 and 0.001, which may lead to a reasonable size for the selected ROIs. After training the model, we got 100 solution vectors, of which the dimensionality is 60,000, since each of the left and right hippocampal surfaces contains 15,000 vertices, and each vertex has two features. Then, we performed stability selection by counting the non-zero entries for RD and TBM on the same vertex. The counted frequency on each vertex was normalized to 0-100 and then mapped to a color bar, as shown in Figures 3-5. For better visualization, we smoothed the values on each surface by a 2 × 3 averaging filter. The warmer color areas have a higher frequency value and have stronger associations with the responses, i.e., brain global Aβ and tau burden.
In this experiment, we first ran the proposed model on the cohorts for Aβ and tau deposition. As illustrated in the top left picture of Figures 3-5, the morphometric abnormalities mainly happen in specific hippocampal subregions, namely the subiculum and CA1. Additionally, we separately studied the ROIs for groups of CU, MCI, and AD subjects. As shown in the rest of the three panels in Figures 3-5, the morphometric associations are strongest in the subiculum and CA1 at the early stages; but with the progression of AD, the distortions are more focal in subiculum. Specifically, the results for CU subjects are shown in the top right panel of each figure, where the warmer colored regions are widespread in both the subiculum and CA1 areas. However, in the results for the AD group, the warmer colored regions mainly focus on the area of the subiculum.

Association Analysis Between Features on Regions of Interest and Measure for Amyloid-β and Tau Deposition
In this experiment, we test whether the morphometric features of our selected ROIs have stronger correlations with the measures of Aβ and tau deposition than the other hippocampal surface features. After performing stability selection, we were able to rank the vertices related to each measurement of Aβ/tau deposition. We selected the 3,000 features from the 1,500 top-ranked vertices for each subject (1,500 RD and 1,500 TBM). For a fair comparison, we also selected 3,000 features from 1,500 randomly selected vertices for each subject and used them as features representing differences across the entire hippocampus. For the Pearson correlation analysis, we converted the features on the ROIs to a single value for each subject. First, as the features on the ROIs should have stronger predictive power, we used the frequency on each vertex as a weight to multiply the RD and TBM on the vertex. We then summed the weighted RD and the weighted TBM separately over the ROIs for each subject. The value for RD and the value for TBM were further reduced to a scalar with principal components analysis (PCA). PCA is an unsupervised model that reduces the dimensionality of the data while minimizing information loss. It creates new uncorrelated features that successively maximize the variance. For the randomly selected features on the whole hippocampal surface, the RD and TBM were directly summed up without multiplying by the frequency and reduced to a single value with PCA. In Figure 6, we illustrate the results of the Pearson correlation between the morphometric features and the measures of Aβ and tau deposition. The top three subfigures illustrate the correlation between the values on our selected ROIs and the measure of Aβ or tau deposition. The bottom three show the correlation between the value on the whole hippocampal surface and the measure of Aβ or tau deposition. The correlation coefficients and p-values are shown in each subfigure.
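The reduction of the ROI features to one value per subject can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the two summed columns are only mean-centered before PCA (no variance scaling), the first principal component is taken from a singular value decomposition, and the data, function names, and the `burden` variable are synthetic or hypothetical. Because the sign of a principal component is arbitrary, the sign of the resulting correlation depends on the orientation of the component.

```python
import numpy as np

def roi_scalar(rd, tbm, freq=None):
    """Reduce per-vertex RD and TBM features to one scalar per subject.

    rd, tbm: arrays of shape (n_subjects, n_vertices).
    freq: per-vertex selection frequency used as weights, or None for
          the unweighted sum used with randomly selected vertices.
    """
    w = np.ones(rd.shape[1]) if freq is None else np.asarray(freq, dtype=float)
    two_cols = np.column_stack([rd @ w, tbm @ w])   # weighted sums per subject
    centered = two_cols - two_cols.mean(axis=0)
    # First principal component of the centered two-column matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins: 80 subjects, 1,500 selected vertices.
rng = np.random.default_rng(4)
rd = rng.normal(size=(80, 1500))
tbm = rng.normal(size=(80, 1500))
freq = rng.integers(1, 101, size=1500)
burden = rng.normal(size=80)        # placeholder for a centiloid or SUVR value

scores = roi_scalar(rd, tbm, freq)
r = pearson_r(scores, burden)
```

Passing `freq=None` gives the unweighted variant described for the randomly selected vertices on the whole surface.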
The correlation coefficient of Centiloid-related ROIs is −0.23, and the coefficient for the whole surface is only −0.1. For Braak12 and Braak34, the coefficients of our

Predicting Mini-Mental State Exam Scores Based on Hippocampal Regions of Interest
In the model of Jack et al. (2016), an abnormal level of Aβ and tau deposition tends to occur earlier than abnormal cognitive decline can be detected. In this experiment, we further validated the ROIs selected by our proposed model in terms of their prediction accuracy for MMSE score in cohorts where Aβ and tau deposition were measured separately. After performing stability selection, we were able to rank the vertices related to each measurement of Aβ/tau deposition. We selected the 3,000 features from the 1,500 top-ranked vertices for each subject (1,500 RD and 1,500 TBM). Then, we used these features to predict the MMSE score as described in section "Performance Evaluation Protocol." For a fair comparison, we also selected 3,000 features from 1,500 randomly selected vertices for each subject and used them as features representing differences across the entire hippocampus. Moreover, we also leveraged the measurements for Aβ or tau deposition to predict MMSE. In addition, we compare our FMFS with recursive feature elimination (RFE) (Guyon et al., 2002). The feature dimensionality of our morphometry feature is 60,000 and RFE may take tens of days to rank features for such a big dataset. For equal comparison, we also selected 1,500 RD and 1,500 TBM for each measurement of Aβ/tau deposition. To accelerate the feature selection speed, we randomly select 300 features from the 30,000 RD and use RFE to select the top 15 RD. We repeated the step 100 times and selected 1,500 RD. With the same strategy, we also select 1,500 TBM. Then, we used these features to train machine learning models, including random forest, MLP, and LASSO regression. Ten-fold cross-validation was adopted to evaluate the performance of the models, and RMSE was used for measuring the prediction accuracy. In Table 2, the top five rows indicate the results for Aβ deposition, and the rest of the rows are for different measurements of tau deposition. 
"Hippocampal ROIs" denotes the features on our selected ROIs, and "RFE-selected" denotes the features selected by RFE. The RMSEs of our framework are always the smallest. It is worth noting that, compared with the RFE method, our proposed FMFS framework is also far more efficient. Specifically, the average running time of the RFE method is 49,319 s, while that of our FMFS method is 22 s, a roughly 2,240-fold improvement in efficiency. These results demonstrate that the features in the ROIs selected by our model consistently have stronger predictive power and predict the MMSE score better than the measurements of Aβ and tau deposition.
We also compute the Pearson correlation between the morphometry features and MMSE and between the measure of Aβ or tau deposition and MMSE. We use the same method as in section "Association Analysis Between Features on Regions of Interest and Measure for Amyloid-β and Tau Deposition" to convert the multivariate features to a scalar. The results are shown in Figure 7. The first column shows the correlation between the measures of Aβ and tau deposition and MMSE. The second column shows the correlation between the features on our selected ROIs and MMSE. The last column shows the correlation between the features on the whole surface and MMSE. The correlation coefficients and p-values are shown in each subfigure. In the study of Aβ deposition, the coefficient for centiloid is −0.36, and the coefficients for the features on centiloid-related ROIs and on the whole surface are 0.3 and 0.11. In the study of tau deposition, the coefficients for Braak12 and Braak34 are −0.58 and −0.59, and the coefficients for the features on Braak12-related ROIs and Braak34-related ROIs are 0.29 and 0.28. The coefficient for the features

Predicting Clinical Decline in Participants With Mild Cognitive Impairment
In this experiment, we evaluated the performance of our features on the ROIs in survival analysis by using data from 118 MCI participants in a separate ADNI dataset (Wang et al., 2021) (Table 3), including 63 MCI converters, who converted to probable AD in the next 6 years, and 55 MCI non-converters. Similar to section "Association Analysis Between Features on Regions of Interest and Measure for Amyloid-β and Tau Deposition," we chose 1,500 RD and 1,500 TBM from each of the three ROIs (Aβ, Braak12, and Braak34) and 3,000 features from 1,500 randomly selected vertices on the whole hippocampal surface to predict the conversion rates from MCI to AD. For comparison, we also performed the same experiment with the surface area and volume of the hippocampus. The hippocampal volume and surface area were calculated from the smoothed hippocampal structures after linear registration to the MNI imaging space (Patenaude et al., 2011; Shi et al., 2013a), and the sum of the bilateral hippocampal volumes and the sum of the bilateral hippocampal surface areas for each subject were used for this experiment.
To fit the univariate Cox model, we converted the features on the ROIs to a single value for each subject. First, because the features on the ROIs should have stronger predictive power, we used the selection frequency of each vertex as a weight and multiplied it by the RD and TBM on that vertex. We then summed the weighted RD and the weighted TBM separately over the ROIs for each subject. The resulting RD value and TBM value were further reduced to a scalar with PCA. PCA is an unsupervised method that reduces the dimensionality of the data while minimizing information loss; it creates new uncorrelated features that successively maximize the variance. For the randomly selected features on the whole hippocampal surface, the RD and TBM were summed directly, without multiplying by the frequency, and reduced to a single value with PCA.
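The reduction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the toy numbers are invented, and the first principal component of the two per-subject values is obtained from the closed-form orientation of a 2 × 2 covariance matrix without any prior standardization.

```python
import math

def weighted_roi_sum(values, frequencies):
    """Sum per-vertex feature values weighted by their selection frequency."""
    return sum(v * f for v, f in zip(values, frequencies))

def first_pc_scores(points):
    """Project 2-D points (rd_sum, tbm_sum) onto their first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    cyy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Orientation of the major axis of the 2x2 covariance matrix (closed form).
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]

# Toy example: 3 subjects, 3 ROI vertices each (values are illustrative only).
freq = [0.5, 1.0, 0.5]
rd = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
tbm = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

points = [(weighted_roi_sum(r, freq), weighted_roi_sum(t, freq))
          for r, t in zip(rd, tbm)]
scores = first_pc_scores(points)  # one scalar per subject
```

For the randomly selected whole-surface features, the same sketch applies with all frequencies set to 1.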
Then, the optimal cutoff for each measurement was chosen to maximize sensitivity and specificity for distinguishing MCI converters from non-converters, based on Receiver Operating Characteristic (ROC) analysis (Robin et al., 2011). The ROC curves are illustrated in Figure 8, and the AUC, 95% confidence interval (CI) of AUC, and the optimal cutoffs are shown in Table 4.
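One common way to pick such a cutoff is to maximize Youden's index, J = sensitivity + specificity − 1, over candidate thresholds. The sketch below assumes that criterion, assumes that lower measurements indicate higher risk, and uses the observed values themselves as candidate thresholds; the function name and the toy volumes are hypothetical.

```python
def youden_cutoff(values, is_converter):
    """Return (cutoff, sensitivity, specificity) maximising sens + spec - 1.

    A subject is predicted to be a converter when value <= cutoff, i.e. lower
    measurements are assumed to indicate higher risk.
    """
    n_pos = sum(is_converter)
    n_neg = len(is_converter) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, is_converter) if y and v <= cutoff)
        tn = sum(1 for v, y in zip(values, is_converter) if not y and v > cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        # Strict '>' keeps the smallest cutoff among ties.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

# Toy hippocampal volumes (mm^3) and conversion labels (1 = converter).
volumes = [6900, 7100, 7400, 7600, 7900, 8100, 8300, 8600]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, sensitivity, specificity = youden_cutoff(volumes, labels)
```

Subjects above the returned cutoff would then form the high-value group and the remaining subjects the low-value group.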
With the optimal cutoffs, we divided the whole cohort into two groups for each measurement. For example, subjects with a hippocampal volume higher than 7814.9 mm³ were assigned to a high-value (HV) group, and the rest were assigned to a low-value (LV) group. As expected, AD may decrease the hippocampal volume as well as the other measurements.
Values are mean ± standard deviation, where applicable.
FIGURE 8 | The ROC analysis results for hippocampal surface area, volume, the whole hippocampal feature, and the features on ROIs associated with Aβ, Braak12, and Braak34. The AUC for each measurement is shown in parentheses.
TABLE 4 | AUC for ROC analysis, optimal cutoffs, and estimated hazards ratios (HRs) for conversion to AD in MCI patients with high-value and low-value biomarkers based on a univariate Cox model.
Measurements | AUC (95% CI) | Cutoff | β | HR (95% CI) | p-Value
Next, we fitted a Cox proportional hazard model (Moore, 2008) with each of the six measurements separately; the regression beta coefficients (β), the hazard ratios (HRs), and statistical significance (p-values) are shown in Table 4. Moreover, we calculated the survival probabilities for conversion to AD in the HV group and the LV group by fitting Kaplan-Meier curves. The survival probabilities of the subjects based on hippocampal surface area, volume, the whole hippocampal features, and the features on ROIs related to Aβ, Braak12, and Braak34 are shown in Figure 9. Each color represents the survival curve and 95% CI of one group. Here a log-rank test was used to compare the survival group differences based on a χ² test, and the p-values are illustrated in each plot. A result with a p-value < 0.05 indicates that the two groups are significantly different in terms of survival time. The features from our selected ROIs consistently yielded more significant results than the hippocampal surface area, volume, and the whole hippocampal features.
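To make the Kaplan-Meier and log-rank steps concrete, the following sketch implements both with the standard textbook formulas. It is an illustration only: the follow-up times and group assignments are invented, the function names are hypothetical, and the Cox model itself is not reproduced here. The p-value uses the identity that, for a χ² variable with one degree of freedom, P(X > x) = erfc(√(x/2)).

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = converted to AD at that time, 0 = censored.
    """
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - d / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    all_t = list(times_a) + list(times_b)
    all_e = list(events_a) + list(events_b)
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_t, all_e) if e}):
        n_a = sum(1 for u in times_a if u >= t)   # at risk in group A
        n = sum(1 for u in all_t if u >= t)       # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d = sum(1 for u, e in zip(all_t, all_e) if e and u == t)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Toy follow-up data in years (illustrative only).
lv_times, lv_events = [1, 2, 3, 4], [1, 1, 1, 0]   # low-value group
hv_times, hv_events = [3, 5, 6, 6], [0, 1, 0, 0]   # high-value group

lv_curve = kaplan_meier(lv_times, lv_events)
chi2, p_value = logrank_test(lv_times, lv_events, hv_times, hv_events)
```

In this toy example the low-value group converts earlier, so its survival curve drops faster and the log-rank statistic is correspondingly large.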

DISCUSSION
This work proposes a novel framework, FMFS, to efficiently detect Aβ/tau-associated hippocampal morphometry markers at different clinically defined stages of AD. The first contribution of this work is computational efficiency: FMFS achieved a speedup of up to 89-fold over similar federated learning models. Our work may help accelerate large-scale neuroimaging computations over disparate, remote data sources without requiring the transfer of any individual data to a centralized location. The second contribution is that FMFS is an effective tool for selecting and visualizing brain imaging features. In our previous studies (Stonnington et al., 2021; Zhang et al., 2021a,b), the morphometry features consistently performed well in predicting AD progression. However, the major limitation of these works was that they could not visualize the disease-related regions on the surfaces. In the current work, FMFS selects the features with stronger predictive power and visualizes the corresponding ROIs on the surfaces. The proposed method is general and may be applied to other brain imaging feature data. Moreover, our experimental results show that morphometric markers from the hippocampal subiculum and CA1 subfield are associated with Aβ/tau markers in all the clinically defined stages of AD and that, as AD pathology progresses, the ROIs showing associations become more focal. With two prediction experiments, we further demonstrate that the morphometric features on our identified ROIs have stronger predictive power for MMSE scores and for future clinical decline in MCI patients. All the results indicate that FMFS is a useful screening tool to reveal associations between Aβ/tau status and hippocampal morphology across the clinically normal to dementia spectrum. Aβ/tau-associated features on ROIs could be used as potential biomarkers for Aβ/tau pathology, perhaps as a screening tool prior to using more expensive and invasive PET techniques.

Amyloid-β/Tau Associated Hippocampal Morphometry
Amyloid-β and tau proteinopathies accelerate the hippocampal atrophy, observable on MRI scans, that leads to AD (Maass et al., 2017; Hanko et al., 2019; Wang et al., 2021). However, the influence of Aβ/tau deposition on hippocampal morphology in the pathophysiological progression of AD is still not well understood. Some prior works (Shi et al., 2013a; Tsao et al., 2017; Adler et al., 2018) demonstrated that CA1 and the subiculum are the ROIs with the greatest abnormalities in the early stages of the AD pathophysiological process. In addition, Hanko et al. (2019) reported a significant association between tau burden and atrophy in specific hippocampal ROIs (CA1 and the subiculum), but detected no Aβ-associated hippocampal ROIs in the 42 subjects they studied. Our work applies two kinds of morphometry measures (RD and TBM) and the novel FMFS framework to two datasets to study fine-scale morphometric correlates of Aβ and tau deposition. Both results are consistent with the prior studies noted above. We also studied the influence of Aβ/tau burden on hippocampal morphometry at different stages of AD. As shown in Figures 3-5, Aβ/tau-associated hippocampal ROIs become more focal as AD pathology progresses, especially at the final stage of AD itself.

Predictive Power of the Features on Regions of Interest
To verify the clinical value of these identified ROIs, we compared their prediction performances to global hippocampal morphometry and Aβ/tau measures using three different machine learning models. As shown in  clinical scores, which followed our initial hypothesis. Compared to randomly selected features, the features on ROIs show stronger predictive power, which illustrates the promise of our FMFS model. Additionally, these Aβ/tau-associated features always performed better than the measurements of Aβ/tau, and could be used as a potential biomarker for Aβ/tau pathology, especially as a screening indicator.
In addition, the results in section "Predicting Clinical Decline in Participants With Mild Cognitive Impairment" further supported the stronger predictive power of the ROIs in survival analysis (of conversion from MCI to AD). Here, the univariate biomarker computed from our ROIs performed better than the traditional hippocampal volume, suggesting that features from our ROIs could serve as a univariate biomarker for studying AD. Taken together, both experiments demonstrated the effectiveness of the FMFS model.

Stability Analysis
To test whether the performance of our FMFS model is affected by how the data are distributed across institutions, we performed fivefold cross-validation on the dataset for the study of Aβ under three conditions: a data-centralized condition, data distributed across three institutions, and data distributed across five institutions. We simulated the distributed conditions on a cluster with several conventional x86 nodes, each of which contains two Intel Xeon E5-2680 v4 CPUs running at 2.40 GHz. Each institution was assigned one computing node. For each training data set, we randomly assigned the subjects to three institutions, five institutions, or a single institution. In addition, we compared the performance of FMFS with DADMM and FBCD under the five-institution condition. We performed cross-validation a total of 10 times with a sequence of regularization parameters, 1, 0.5, and 0.1, with all other experimental set-ups the same as in the previous experiment. The average RMSE for the prediction of MMSE was used to evaluate the prediction accuracy during training and testing, as shown in Table 5.
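The simulated setting can be sketched as follows: subjects are randomly partitioned into institutions, and an overall RMSE is obtained from per-institution summaries (sum of squared errors and subject count) rather than from pooled individual data. This is a minimal illustration with hypothetical function names and invented MMSE values; it is not necessarily how the authors aggregated their results.

```python
import math
import random

def assign_to_institutions(subject_ids, n_institutions, seed=0):
    """Randomly split subject IDs into near-equal, disjoint groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_institutions] for i in range(n_institutions)]

def local_summary(y_true, y_pred):
    """Per-institution summary: (sum of squared errors, number of subjects)."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sse, len(y_true)

def pooled_rmse(summaries):
    """RMSE over all institutions, computed from the local summaries only."""
    total_sse = sum(s for s, _ in summaries)
    total_n = sum(n for _, n in summaries)
    return math.sqrt(total_sse / total_n)

# Toy data: 23 subjects with invented true and predicted MMSE scores.
subjects = list(range(23))
mmse_true = {s: 20 + (s % 10) for s in subjects}
mmse_pred = {s: 21 + (s % 7) for s in subjects}

sites = assign_to_institutions(subjects, 5)
summaries = [
    local_summary([mmse_true[s] for s in site], [mmse_pred[s] for s in site])
    for site in sites
]
federated = pooled_rmse(summaries)

# Centralized reference: the same computation on a single "institution".
centralized = pooled_rmse([
    local_summary([mmse_true[s] for s in subjects],
                  [mmse_pred[s] for s in subjects])
])
```

Because the squared errors simply add up across institutions, the RMSE computed from the summaries matches the centralized value regardless of how the subjects are split.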
Additionally, we collected datasets from different institutions and studies to validate the stability of our federated model under real-world conditions. Besides ADNI, we collected MRI scans from other institutions, including 307 cognitively unimpaired subjects from the Open Access Series of Imaging Studies (OASIS) (Marcus et al., 2010) and 38 MCI patients from the Arizona APOE cohort study (AZ) (Caselli et al., 2009). The datasets for the study of Aβ and tau are treated as two institutions' data. Therefore, in this experiment, we

Limitations and Future Work
Despite the promising results obtained with FMFS, there are two important caveats. First, this work is based on cross-sectional data. It would also be valuable to track the longitudinal deformation of hippocampal ROIs as Aβ/tau burden changes over time. In the future, we plan to conduct longitudinal association analyses of hippocampal features and their relation to Aβ/tau burden. Second, this work only studied the hippocampal structures, but other structures, such as the ventricles, and cortical surface metrics, such as gray matter thickness or volume (Chou et al., 2009; Doherty et al., 2015), are also affected by AD pathology. We hypothesize that our proposed framework will contribute more to these high-dimensional features. Therefore, in the future, we will collect more datasets to explore more Aβ/tau-associated regional brain abnormalities. This future work will help shed new light on the relationship of component biological processes in AD.

CONCLUSION
This work proposes a novel high-dimensional federated feature selection framework, FMFS, to study abnormalities in hippocampal subregions associated with Aβ/tau burden on two datasets. Experimental results showed that FMFS identified hippocampal features at different clinical stages that were associated with Aβ/tau burden. As the clinical symptoms worsen, these ROIs appear to become more focal. Our proposed framework was more efficient than a similar feature selection method. To the best of our knowledge, this is the first feature selection model to study hippocampal morphometric changes with Aβ/tau burden across the AD spectrum. More importantly, this model can visualize brain structural abnormalities affected by AD proteinopathies. Beyond brain MRI, our framework may also be applied to other kinds of medical data for feature selection.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: adni.loni.usc.edu.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Alzheimer's Disease Neuroimaging Initiative. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JW: methodology, investigation, formal analysis, and writing - original draft. QD: conceptualization, investigation, formal analysis, and writing - original draft. JZ, YS, TW, and JY: methodology. RC and ER: review and editing. NL, KC, and PT: methodology and review and editing. YW: conceptualization, investigation, supervision, funding acquisition, and writing - review and editing. All authors contributed to the article and approved the submitted version.

FUNDING
Algorithm development and image analysis for this study were partially supported by the National Institute on Aging (RF1AG051710, R21AG065942, U01AG068057, R01AG069453, and P30AG072980), the National Institute of Biomedical