Characterization of Brain Iron Deposition Pattern and Its Association With Genetic Risk Factor in Alzheimer’s Disease Using Susceptibility-Weighted Imaging

The presence of iron is an important factor for normal brain functions, whereas excessive deposition of iron may impair normal cognitive function in the brain and lead to Alzheimer’s disease (AD). MRI has been widely applied to characterize brain structural and functional changes caused by AD. However, the effectiveness of using susceptibility-weighted imaging (SWI) for the analysis of brain iron deposition is still unclear, especially within the context of early AD diagnosis. Thus, in this study, we aim to explore the relationship between brain iron deposition measured by SWI and the progression of AD using various feature selection and classification methods. The proposed model was evaluated on a 69-subject SWI dataset consisting of 24 AD patients, 21 patients with mild cognitive impairment, and 24 normal controls. The identified AD progression-related regions were then compared with the regions reported in previous genetic association studies, and we observed considerable overlap between the two. Further, we have identified a new potential AD-related gene (MEF2C) closely related to the interaction between iron deposition and AD progression in the brain.


INTRODUCTION
Alzheimer's disease (AD) is among the leading causes of death in the United States and has been on the rise over the past decade as populations age (Alzheimer's Association, 2011). More than 47 million people worldwide are estimated to have AD and related dementias. This number is expected to reach 152 million by 2050, with one new case of dementia diagnosed every 3 seconds (Patterson, 2018). As no effective treatment has been found to delay the onset and progression of AD (Selkoe, 2012), early diagnosis of AD and an understanding of the progression from mild cognitive impairment (MCI) to AD are essential for preventative and therapeutic strategies (Gauthier et al., 2006).
Progression of AD can lead to structural and functional changes in the brain, which various imaging techniques can capture. Differential brain structural diagnostic markers derived from T1-weighted magnetic resonance imaging (MRI) have been reported for AD (Cuingnet et al., 2011), MCI (Driscoll et al., 2009), and MCI-AD conversion (Davatzikos et al., 2011) based on brain atrophy measurement (Jack et al., 2004) and its spatial pattern (Davatzikos et al., 2008). Diffusion MRI can measure white matter connectivity and microstructural integrity. It may also support the diagnosis of AD, based on both changes in white matter tracts (Douaud et al., 2011) and global/local fractional anisotropy (FA) (Medina et al., 2006; Zhang et al., 2007). Functional magnetic resonance imaging (fMRI) has also been explored to characterize cognitive and behavioral changes caused by AD progression. Previous studies observed that disruption of resting-state functional networks could differentiate MCI/AD from normal controls (Rombouts et al., 2005), as could decreased activation in cognition-related brain regions measured during a memory encoding task (Machulda et al., 2003). Other functional imaging techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG), have been demonstrated to detect shifts in the brain signal spectrum (Fernández et al., 2006) and changes in coherence (Jeong, 2004). Previous studies also reported the utility of these techniques in modeling brain network alterations in MCI patients after cognitive training (Xu et al., 2020). In addition, PET imaging has been established as a standard approach to investigate pathological features and imaging biomarkers for AD, including neuritic plaques of amyloid-β peptide fibrils (Nordberg, 2004), hyper-phosphorylated tau neurofibrillary tangles (Ossenkoppele et al., 2016), as well as their respective propagation patterns (Sepulcre et al., 2018; Guo et al., 2019).
Recently, the fusion of multiple imaging modalities for the early diagnosis of MCI and AD has been well studied and demonstrated improved performance over single-modality biomarkers (Zhang et al., 2011).
Among all potential imaging biomarkers for AD, one crucial marker is excessive iron deposition in the brain. In vitro and in vivo studies have observed that excessive iron deposition in the brain might promote neurotoxicity, which causes neuronal injury and has been recognized as a putative factor in AD pathogenesis (Stankiewicz et al., 2007). Previous literature on iron content measured with MR susceptibility-weighted imaging (SWI) (Halefoglu and Yousem, 2018) has reported significant iron deposition in AD in brain regions related to cognitive and memory functions, including the substantia nigra, globus pallidus, hippocampus, putamen, and caudate nucleus. It has also been found that iron can induce the production and accumulation of amyloid-β plaques and bind to tau protein to induce its phosphorylation and aggregation (Liu et al., 2018). A meta-analysis of 1,813 AD patients and 2,401 normal controls concluded that specific brain regions had statistically significantly higher iron concentrations that can be related to AD (Tao et al., 2014). In addition, genetic factors have been found to play an essential role in the development of neurodegenerative disease in the context of iron deposition. The circulation of iron in the brain has been reported to involve a complex interaction between metabolic and genetic processes (Rouault, 2013). It has been widely reported that genetic mutations can cause excessive iron deposition at the systemic level, acting as risk factors for several diseases such as acute myocardial infarction (Roest et al., 1999; Tuomainen et al., 1999). Similar gene mutations have also been reported to be related to neurodegenerative diseases (Hagemeier et al., 2012), for example, through the oxidative stress process that ultimately leads to the formation of neurofibrillary tangles and senile plaques in AD (Crichton et al., 2011).
However, the predictive value of iron deposition for AD progression, especially given current advances in machine learning methods, is still unclear and largely understudied. As the brain ages, excessive iron deposition could be related to many factors besides AD, such as increased permeability of the blood-brain barrier, dilation of blood vessels, redistribution of iron, and changes in iron homeostasis (Ward et al., 2014). In the initial stage of AD, increased iron is accompanied by the aggregation of β-amyloid peptide, which provides the theoretical basis for MRI-based diagnosis (Ward et al., 2014). In addition, while region-specific iron deposition in the normal aging population has been investigated and observed in the substantia nigra, putamen, globus pallidus, and caudate nuclei, iron-related pathogenic mechanisms are needed to explain the cause of such selectivity (Zecca et al., 2004). The regional heterogeneity and age dependence of brain iron have been confirmed by MRI (Zecca et al., 2004; Ramos et al., 2014).
Susceptibility-weighted imaging plays a vital role in the estimation of iron deposition (Sheelakumari et al., 2016). Thus, it could be used to detect abnormal iron deposition related to the progression of AD. Most diagnostic work using SWI is currently based on manual or semi-manual measurement of regions of interest (ROIs) in MRI images, which relies on prior knowledge and is usually confined to the hippocampus and entorhinal cortex (Zhang et al., 2015). In the past decade, with the advancement of machine learning methodologies, various computer-assisted models have been developed for the early diagnosis of AD, including SVM-based classifiers (Davatzikos et al., 2011), Random Forest (Lebedev et al., 2014), and most recently, deep learning methods (Ortiz et al., 2016). However, there is little work on using machine learning methods to analyze iron deposition in the brain and facilitate automatic AD/MCI detection with SWI.
Thus, in this work, we used feature selection techniques and classification algorithms to analyze SWI data acquired from a group of 69 subjects. After image pre-processing, iron deposition characteristics were extracted from SWI images based on different brain atlases. We then investigated the predictive power of different combinations of feature selection method, atlas, and classifier. The best set of brain regions selected by the feature selection procedure was analyzed and compared with neuroscientific findings. We also obtained human gene expression data from the public Allen Human Brain Atlas Microarray dataset to investigate the relationship between iron deposition, genetic risk factors, and AD progression. This integrated analysis framework allows us to address the questions of (1) whether iron deposition as characterized by SWI images can be used to differentiate the three groups of subjects (healthy control, MCI, and AD), (2) which brain regions are involved in the differentiation, and (3) whether those brain regions have common genetic factors.
The rest of this paper is organized as follows: in the Materials and Methods section, we introduce the SWI dataset, its pre-processing pipeline, and the feature selection and classification methods used in this work. The Results section presents and discusses the classification performance and the discriminative brain regions identified by the feature selection method. The identified brain regions are then combined with gene expression data to analyze their underlying relationship and discover new AD-related genes.

MATERIALS AND METHODS
Study Population and Image Acquisition
All participants were recruited to establish a registry at the Dementia Care and Research Center, Peking University Institute of Mental Health. The clinical diagnosis of AD was made according to the International Classification of Diseases, 10th Revision (ICD-10) (World Health Organization, 2004) and the criteria for probable AD of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) (McKhann et al., 1984). The clinical diagnosis of MCI was made according to Petersen's MCI criteria with a Mini-Mental State Examination (MMSE) score of no less than 24. All healthy controls had no history of neurological or psychiatric disorders, no subjective cognitive complaints, and no objectively abnormal cognitive assessment. Other inclusion criteria were as follows: age ≥55 years, right-handedness, and at least primary school education (≥6 years). Exclusion criteria were as follows: current or previous neuropsychiatric diseases, such as Parkinson's disease, epilepsy, alcohol or substance abuse/dependence, and head injury with loss of consciousness, that could affect cognition or psychiatric behavior. This study was approved by the ethics committee of Peking University Institute of Mental Health (Sixth Hospital), Beijing, China. All participants were fully informed regarding the study protocol and provided written informed consent.
Participants were scanned on a 3-Tesla MR system (Siemens Magnetom Trio, A Tim System, Germany) using a standard 8-channel head coil at Peking University Third Hospital.
Similar to our previous genetic data studies on the Allen Mouse Brain Atlas (AMBA) dataset (Li et al., 2017a,b), in this work, we used the Allen Human Brain Atlas Microarray data (Shen et al., 2012) to search for the anatomical brain regions associated with a specific gene. The microarray data include the expression value (normalized Z-score) in each brain region for each subject (donor); we averaged these Z-scores across subjects. Where multiple probes were used to detect the same gene, Z-scores were also averaged across probes. After calculating the averaged Z-score of the target gene, we applied a threshold of 0.3 to determine whether the gene is considered highly expressed in each brain region. Finally, for each target gene, we obtained a list of "highly expressed" brain regions, which could be further matched to the regions in the SWI data.
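The averaging and thresholding step described above can be illustrated with a minimal sketch. The nested-list data layout, the function name `highly_expressed_regions`, the toy values, and the use of a strict "greater than" comparison against 0.3 are all assumptions made for illustration; the text does not specify the file format or the order in which probes and donors are averaged.

```python
from statistics import mean

THRESHOLD = 0.3  # threshold on the averaged Z-score, as stated in the text


def highly_expressed_regions(z_scores, threshold=THRESHOLD):
    """Return averaged Z-scores and the regions exceeding the threshold.

    z_scores maps region name -> list of per-donor lists of per-probe
    Z-scores (a hypothetical layout, not the Allen download format).
    """
    averaged = {}
    for region, donors in z_scores.items():
        # Assumption: average across probes within each donor first,
        # then across donors.
        per_donor = [mean(probes) for probes in donors]
        averaged[region] = mean(per_donor)
    # Assumption: "highly expressed" means strictly above the threshold.
    selected = sorted(r for r, z in averaged.items() if z > threshold)
    return averaged, selected


# Toy values for one target gene: two donors, two probes each (made up).
toy = {
    "hippocampal formation": [[0.5, 0.75], [0.25, 0.5]],
    "frontal lobe": [[0.0, 0.25], [-0.25, 0.0]],
}
averaged, selected = highly_expressed_regions(toy)
print(averaged)
print(selected)
```

With equal numbers of probes per donor, as in the toy data, averaging probes first or donors first gives the same result; with unequal probe counts the two orders can differ.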

Image Preprocessing
Susceptibility-weighted imaging data were normalized into MNI space based on transformation parameters derived from aligning T1 images to the MNI standard template using the diffeomorphic anatomical registration through exponentiated Lie algebra (DARTEL) method (Ashburner, 2007) in Statistical Parametric Mapping (SPM12). The normalized images were then resampled to 1.5-mm isotropic voxels and spatially smoothed with a 6-mm full width at half maximum Gaussian kernel. We then applied three brain atlases to extract iron deposition information in the corresponding brain regions from the SWI images: AAL (automated anatomical labeling, 116 regions with 90 cerebral and 26 cerebellar regions), Harvard-Oxford (48 regions), and MMP (multi-modal parcellation, 180 regions). Specifically, each brain region (defined by one type of atlas) in the registered SWI images was characterized by the collection of voxels assigned to it. Here $T$ is the number of subjects in the dataset (69 in this study), $S$ consists of the $R$ regions in the atlas, $S = \{1, \dots, S_1, S_1 + 1, \dots, S_2, \dots, S_R\}$, and $S_k$ denotes the number of voxels in the $k$-th region. In this way, we obtain the phase value (i.e., iron content) of the $i$-th subject in the $k$-th region, and from it the final iron content vector for the $i$-th subject in the $k$-th region, where values in $X$ vary from $-\pi$ to $\pi$ and $b$ varies from $-4{,}096$ to $4{,}095$.
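As an illustration of the regional extraction described above, the following sketch converts raw phase intensities to radians and summarizes them per atlas region. Two points are assumptions rather than statements from the text: the linear mapping from $b \in [-4096, 4095]$ to phase (only the two value ranges are given), and the use of the regional mean as the summary. The function names and toy values are hypothetical.

```python
import math


def raw_to_phase(b):
    """Map a raw intensity b in [-4096, 4095] to radians in [-pi, pi).

    Assumed linear scaling; the source only states the two value ranges.
    """
    return b * math.pi / 4096.0


def regional_phase(raw_voxels, labels):
    """Average the phase of one subject's voxels within each atlas region.

    raw_voxels: flat list of raw intensities for one subject.
    labels: flat list of atlas region ids (0 = background), same length.
    """
    sums, counts = {}, {}
    for b, k in zip(raw_voxels, labels):
        if k == 0:
            continue  # skip voxels outside the atlas
        sums[k] = sums.get(k, 0.0) + raw_to_phase(b)
        counts[k] = counts.get(k, 0) + 1
    # Assumption: the regional feature is the mean phase of its voxels.
    return {k: sums[k] / counts[k] for k in sums}


# Toy subject: six voxels, two regions (values are made up).
raw = [2048, 2048, -1024, -3072, 0, 4095]
lab = [1, 1, 2, 2, 0, 0]
features = regional_phase(raw, lab)
print(features)
```

Repeating this for every subject and stacking the per-region values would give a subjects-by-regions feature matrix of the kind the feature selection step operates on.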

Feature Selection and Classification
To identify the most discriminative brain regions toward classification of AD, MCI, and NC, we explored the commonly used supervised feature selection methods of Lasso and Adaptive Lasso for the analysis.
(1) Lasso: for a dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$ (in this work, $x$ denotes the iron content vector and $y$ denotes the subject label, AD/MCI/NC), we consider the simple linear regression model with the squared error as the loss function:
$$\min_{\beta} \sum_{i=1}^{m} \left( y_i - \beta^{\top} x_i \right)^2 .$$
When there are many more features than samples, this objective is prone to overfitting. To address the problem, a regularization term is introduced. With $\ell_1$-norm regularization, the Lasso (Least Absolute Shrinkage and Selection Operator) objective is
$$\min_{\beta} \sum_{i=1}^{m} \left( y_i - \beta^{\top} x_i \right)^2 + \lambda \lVert \beta \rVert_1 ,$$
with the regularization parameter $\lambda > 0$.
(2) Adaptive Lasso: by adding weights to the penalty term of the original Lasso, the Adaptive Lasso can counteract the possibly biased estimates of the Lasso, with the following loss function:
$$\min_{\beta} \sum_{i=1}^{m} \left( y_i - \beta^{\top} x_i \right)^2 + \lambda \sum_{j=1}^{d} w_j \lvert \beta_j \rvert ,$$
where the weights $w_j$ are derived from the ordinary least squares (OLS) estimate $\hat{\beta}_j(\mathrm{OLS})$, typically as $w_j = 1 / \lvert \hat{\beta}_j(\mathrm{OLS}) \rvert$.
Both Lasso and Adaptive Lasso identify a subset of the regions in a given atlas (termed "regional features") through the nonzero coefficients found in the regression. Iron content information from the selected regional features was then used to train various classifiers, including AdaBoost, LinearSVC, Randomtree, and XGBoost, to perform a three-class (healthy control/MCI/AD) classification. We set AdaBoost to run 100 boosting iterations of its base classifier with a learning rate of 0.1. The depth of Randomtree was four. For the XGBoost algorithm, we used the tree model with a maximum depth of five and softmax as the objective function.
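To make the selection mechanism concrete, the following is a minimal sketch of Lasso feature selection by coordinate descent, written in plain Python. It is not the implementation used in this study: the solver, the function names, the iteration count, and the toy data are assumptions chosen for illustration. The optional `weights` argument shows how per-feature penalty weights of the Adaptive Lasso kind would enter the same update.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Minimise sum_i (y_i - beta.x_i)^2 + lam * sum_j w_j * |beta_j|.

    Plain coordinate descent; weights=None gives the ordinary Lasso,
    per-feature weights give an Adaptive-Lasso-style penalty.
    """
    m, d = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * d
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(d):
            col = [X[i][j] for i in range(m)]
            z = sum(c * c for c in col)
            if z == 0.0:
                continue  # constant-zero feature, nothing to fit
            # Correlation of feature j with the residual that excludes j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid))
            new = soft_threshold(rho, lam * w[j] / 2.0) / z
            resid = [r - c * (new - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new
    return beta


# Toy data: y depends on feature 0 only; feature 1 is noise (values made up).
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
y = [2.0, 4.0, 6.0, 8.0]
beta = lasso_cd(X, y, lam=1.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case the penalty drives the coefficient of the noise feature exactly to zero, so only feature 0 is retained; the retained indices play the role of the "regional features" passed on to the classifiers.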

RESULTS
The results of this study are organized into two parts. In the first part, we describe the performance of the models used for classifying AD and MCI patients from healthy controls and, more importantly, analyze the region-specific features extracted for the classification. In the second part, we connect the identified regions with gene expression data, both to investigate the validity of the identified regions and to discover potentially new AD-related gene(s). A sample set of SWI images and the three different brain atlases used in this study are visualized in Figure 1.

Classification Performance and the Regional Features Extracted
Using Lasso and Adaptive Lasso to extract discriminative features (i.e., brain regions) from SWI images under three different brain atlases, we obtained the AD-predictive regions for each combination of feature selection method (Lasso/Adaptive Lasso) and atlas (AAL, Harvard-Oxford, MMP). To investigate the discriminative power of the obtained AD-predictive regions, we designed a 10-fold cross-validation scheme in which the dataset was randomly divided into a training/validation set (62 subjects) and a testing set (7 subjects), and regional features were extracted according to the feature selection method/atlas combination. Different classification methods were then trained and tested on the extracted features in this cross-validation experiment. The experiment was repeated 100 times; the average classification performance for each feature selection method (Lasso/Adaptive Lasso), atlas (AAL/Harvard-Oxford/MMP), and classifier (AdaBoost/LinearSVC/Randomtree/XGBoost) is summarized in Table 1.
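The repeated cross-validation protocol can be sketched as follows. The splitting strategy, the seeds, and the `evaluate` callback are hypothetical; in particular, the sketch assumes that feature selection and classifier training are refit on the training subjects of each fold, which the text does not state explicitly.

```python
import random


def kfold_indices(n_subjects, n_folds, seed):
    """Shuffle subject indices and split them into n_folds test folds."""
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)
    # Fold f takes every n_folds-th shuffled index, so sizes differ by at most 1.
    return [idx[f::n_folds] for f in range(n_folds)]


def repeated_cv(n_subjects, n_folds, n_repeats, evaluate):
    """Average a per-fold score over repeated random K-fold splits.

    evaluate(train_idx, test_idx) is a hypothetical callback that would run
    feature selection and classifier training on train_idx only, then
    return a score measured on test_idx.
    """
    scores = []
    for rep in range(n_repeats):
        for test_idx in kfold_indices(n_subjects, n_folds, seed=rep):
            held_out = set(test_idx)
            train_idx = [i for i in range(n_subjects) if i not in held_out]
            scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


folds = kfold_indices(69, 10, seed=0)
fold_sizes = sorted(len(f) for f in folds)
print(fold_sizes)
```

With 69 subjects and 10 folds, nine test folds contain 7 subjects and one contains 6, which matches the 62/7 split described above for most folds.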
As shown in Table 1, the best classification accuracy (0.7388) was obtained by the LinearSVC classifier on the AAL atlas, using Lasso for feature selection. Thus, in the later analysis, we investigate the feature regions and the corresponding neuroscientific implications based on Lasso feature selection on the AAL atlas. The 20 AAL regions selected as discriminative features by Lasso are listed and visualized as colored brain surfaces in Figure 2B. In addition, the meta-analysis by Tao et al. (2014) found that eight brain regions are closely related to AD, namely the frontal lobe (FL), parietal lobe (PL), temporal lobe, amygdala (Amg), putamen, cingulate cortex, globus pallidus (GP), and caudate nucleus, which are listed and visualized in Figure 2A.

Analysis of Gene Expression Distribution on Selected Regional Features
The AD-predictive regions identified by our models were then compared with the brain regions associated with three commonly known AD risk factor genes (APOE, MAPT, and CLU). As reported in our previous study, apolipoprotein E (APOE) genotype has been found to account for the majority of AD risk and pathology (Marioni et al., 2017; Sepulcre et al., 2018). The microtubule-associated protein tau (MAPT) gene encodes tau protein and can cause potential vulnerability to tau accumulation, as found in our work (Sepulcre et al., 2018), leading to frontotemporal dementia-spectrum (FTD-s) disorders (Coppola et al., 2012). The clusterin gene (CLU) has been reported to be associated with degraded regional cerebral blood flow (Thambisetty et al., 2013) and white matter integrity (Braskie et al., 2011). Regions in which APOE, MAPT, and CLU are highly expressed were identified from the Allen Human Brain Atlas Microarray data by thresholding their normalized Z-scores as previously introduced. The names of these regions, along with the AD-predictive regions identified in this work, are summarized in Table 2. In addition to these three genes, we preselected a total of 21 genes based on literature reports on aging, dementia, and MCI/AD progression, and obtained their corresponding highly expressed regions in the Allen Human Brain Atlas.
Table 3 note: Normalized Z-scores of each gene are provided for the corresponding top-level structure (column). A higher value indicates that the gene has higher expression in that structure. The three genes with the highest presence frequencies (MAPT, CLU, and MEF2C) are highlighted in bold.
Based on the premise that the 20 AD-predictive regions identified in this work from SWI data are associated with AD at both the imaging and genetic levels, we investigated how frequently each of the 21 genes is expressed in these regions, and used the derived gene presence frequency as a measure of the association between each gene and AD development. However, the brain region definition used in the Allen Human Brain Atlas (shown as "top-level structure name" in the downloaded expression data) differs from the AAL atlas used in this work: the top-level structures are usually larger and can cover multiple regions in the AAL atlas. Thus, we first identified a total of 18 top-level structures from the microarray data with high expression for any of these 21 genes (i.e., the union of highly expressed regions), including the FL, cingulate gyrus (CgG), hippocampal formation (HiF), parahippocampal gyrus (PHG), PL, Amg, GP, and striatum (Str). As shown in Table 3, each gene (row) has its corresponding normalized Z-score at each top-level structure (column). After that, we mapped these top-level structures to the regions in the AAL atlas by comparing their spatial distributions in MNI space. Based on this many-to-many mapping, we can find which AAL region(s) are included in each top-level structure. Between the 18 top-level structures and the 20 AD-predictive regions, we obtained the following 18 × 1 weight vector [4,1,1,2,2,0,2,0,5,0,1,0,2,2,0,0,0,0,0,0]. Each value in the vector indicates how many of the 20 AD-predictive regions are present in that top-level structure; for example, the first value of four indicates that the top-level structure FL includes four AD-predictive regions. Finally, we calculate the presence frequency for each gene by multiplying the normalized Z-score in each top-level structure by the corresponding weight value and summing the products, as listed in the "Frequency" column in Table 3.
A higher presence frequency indicates that the target gene is, overall, more frequently expressed in regions that are predictive of AD, and thus could potentially be more associated with AD.
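The presence frequency described above is a weighted sum, which can be written out in a few lines. The structure abbreviations follow the text, but the Z-scores, the gene names `gene_a` and `gene_b`, and the reduced three-structure weight table are hypothetical illustration values, not entries from Table 3.

```python
def presence_frequency(z_by_structure, region_counts):
    """Weighted sum of a gene's Z-scores over top-level structures.

    z_by_structure: structure name -> normalized Z-score of the gene.
    region_counts: structure name -> number of AD-predictive regions in it.
    """
    return sum(z * region_counts.get(s, 0) for s, z in z_by_structure.items())


# Hypothetical values for three structures; not taken from Table 3.
counts = {"FL": 4, "CgG": 1, "Str": 0}
gene_a = {"FL": 0.5, "CgG": 1.0, "Str": 2.0}
gene_b = {"FL": 0.25, "CgG": -0.5, "Str": 3.0}
freq = {
    "gene_a": presence_frequency(gene_a, counts),
    "gene_b": presence_frequency(gene_b, counts),
}
# Rank genes from highest to lowest presence frequency.
ranking = sorted(freq, key=freq.get, reverse=True)
print(freq, ranking)
```

Note that a structure with weight zero contributes nothing regardless of how strongly the gene is expressed there, so in this toy case the high striatal expression of `gene_b` does not raise its frequency.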

DISCUSSION AND CONCLUSION
In this study of the association between the development of AD and iron deposition as characterized by SWI images, we used Lasso-family algorithms for supervised dimensionality reduction to identify important regions that are discriminative of AD, overcoming the challenge of a small sample size and a large number of features. We then applied different classification methods to investigate the diagnostic capability of SWI for MCI and AD. The ten-fold cross-validation results show that >70% accuracy can be achieved for this three-class classification task. Further investigation of the identified AD-related regions revealed that they are consistent with previous literature reports. The regions identified in this work cover all eight regions previously reported.
We then co-analyzed the SWI-derived imaging features with the genetic data provided by the Allen Human Brain Atlas. We found that the regions identified by the feature selection method largely coincide with the regions rich in expression of genes associated with protein precipitation and the blood-brain barrier, as measured by the microarray data (Shen et al., 2012). Specifically, our study has found that: (1) AD-predictive regions identified in this work cover most of the APOE-associated regions except for the dorsal thalamus and striatum. Iron is involved in the formation of astrocytes that might affect the permeability of the blood-brain barrier. It has been reported that AD patients show a breakdown of the blood-brain barrier before dementia, neurodegeneration, and brain atrophy. The APOE gene has been found to be the strongest AD risk gene involved in the damage of the blood-brain barrier (Montagne et al., 2015). On the other hand, the UCLA team (Raven et al., 2013) used FDRI to detect cerebral iron and studied the difference in iron content between the hippocampus and thalamus. As detected in our study, iron levels increase in the hippocampus but not in the thalamus, which might be linked to hippocampal injury. (2) Our identified regions also include CLU/MAPT-associated regions except for the striatum, as well as the occipital lobe, which is commonly regarded as non-specific to AD. In the initial lesion regions of AD patients, increased iron concentration was associated with the accumulation of Aβ (amyloid β) and tau protein. Our previous study (Sepulcre et al., 2018) shows that the CLU and MAPT genes are responsible for the high expression of Aβ and tau protein, respectively. Studies have shown that iron deposition was detected in microglia and astrocytes in the amygdala, and that ferritin concentrations increase with age (Zecca et al., 2004).
A study of 143 healthy individuals shows that iron deposition in the caudate nucleus increases with age, peaking at age 60 (Wang et al., 2012). Other literature reported similar iron increases with age in the putamen, globus pallidus, and caudate nucleus (Ward et al., 2014). Further studies on iron deposition in AD, MCI, and NC also revealed significant differences in the caudate nucleus and putamen. (3) From the gene expression frequency analysis, our study observed that the top three genes present in the identified AD-predictive regions are CLU, MEF2C, and MAPT. In contrast to the two previously reported AD-related genes (CLU/MAPT), the MEF2C gene is known mainly for its key role in the development of multiple types of tissues. It is currently known to be related to epilepsy, autism, and mental retardation (Rashid et al., 2014). However, its role in the adult brain is largely understudied. Recent evidence suggests that the MEF2C gene regulates memory-forming structures (Cole et al., 2012), which implies a potential role in the memory degradation of AD patients.
The current study has several limitations, in both the method design and the data used. Specifically, the current feature selection and classification scheme is relatively simple because of the limited sample size. With a larger dataset, we can try more advanced data analytics methods such as deep learning to better map imaging features to disease development. We also recognize that the regional features identified in the current study are limited because the AAL atlas is relatively coarse for detailed spatial analysis. In later studies, we will try a more fine-grained parcellation of the brain or perform voxel-level analysis.
Our conclusions on the effectiveness of using SWI for AD diagnosis need to be validated on external datasets. However, in this study, we collected only a single SWI dataset. We have implemented the complete feature selection and classification pipeline in an integrated framework. We will publish the code in a public repository so that external researchers can use the same regional features to test their predictive power and compare the classification performance. The possible role of the MEF2C gene also needs to be validated, both by testing the consistency of its expression in the MCI/AD population on a dataset other than the Allen Human Brain Atlas and by exploring the biological pathway of MEF2C expression using bioinformatics tools.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Peking University Sixth Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
PY was responsible for the model implementation, experiment, and manuscript writing. XL was responsible for the experiment design, image processing, and manuscript writing. ZW was responsible for the data curation and image processing of this work. HW, BD, and QL were responsible for the funding acquisition, project administration, and supervision of this work. All authors contributed to the article and approved the submitted version.