Free Water in White Matter Differentiates MCI and AD From Control Subjects

Recent evidence shows that neuroinflammation plays a role in many neurological diseases including mild cognitive impairment (MCI) and Alzheimer's disease (AD), and that free water (FW) modeling from clinically acquired diffusion MRI (DTI-like acquisitions) can be sensitive to this phenomenon. This FW index measures the fraction of the diffusion signal explained by isotropically unconstrained water, as estimated from a bi-tensor model. In this study, we developed a simple but powerful whole-brain FW measure designed for easy translation to clinical settings and potential use as an a priori outcome measure in clinical trials. These simple FW measures use a “safe” white matter (WM) mask without gray matter (GM)/CSF partial volume contamination (WMsafe) near ventricles and sulci. We investigated whether FW inside the WMsafe mask, including and excluding areas of white matter damage such as white matter hyperintensities (WMHs) as shown on T2 FLAIR, computed across the whole white matter could be indicative of diagnostic grouping along the AD continuum. After careful quality control, 81 cognitively normal controls (NC), 103 subjects with MCI and 42 with AD were selected from the ADNIGO and ADNI2 databases. We show that MCI and AD have significantly higher FW measures even after removing all partial volume contamination. We also show, for the first time, that when WMHs are removed from the masks, the significant results are maintained, which demonstrates that the FW measures are not just a byproduct of WMHs. Our new and simple FW measures can be used to increase our understanding of the role of inflammation-associated edema in AD and may aid in the differentiation of healthy subjects from MCI and AD patients.

INTRODUCTION
White matter (WM) atrophy in Alzheimer's disease (AD) was observed more than three decades ago (Brun and Englund, 1986a). The microstructural changes observed in the WM of AD patients include axonal deterioration, Wallerian degeneration, loss of myelin density, loss of oligodendrocytes, microglia activation, and vascular degeneration (Brun and Englund, 1986b; de la Monte, 1989; Brilliant et al., 1995; Englund, 1998; Burns et al., 2005; Sjöbeck et al., 2005). Numerous studies have shown that changes in the WM are an early event in the development of AD, happening in preclinical stages (de la Monte, 1989; Kantarci et al., 2005; Desai et al., 2009). Changes in the microstructure of WM have even been reported before measurable hippocampal atrophy in mild cognitive impairment (MCI) (Zhuang et al., 2013) and preclinical AD (Hoy et al., 2017). More recent evidence shows that chronic neuroinflammation also contributes to the process of neurodegeneration in AD and was recently observed in the WM of AD patients (Raj et al., 2017).
Microglia-induced neuroinflammation in patients has been mostly studied using PET imaging ligands such as [11C]-PK11195 (Zimmer et al., 2014). However, to identify WM changes, diffusion MRI has been the modality of choice (Jones, 2010). Studies in the past decade have identified various regions in the WM where diffusion measures, mostly diffusion tensor imaging (DTI)-based measures such as fractional anisotropy and mean, axial, and radial diffusivities, correlate with symptoms of MCI and AD (Stebbins and Murphy, 2009; Smith et al., 2010; Nowrangi and Rosenberg, 2015; Galluzzi et al., 2016; Mito et al., 2018). A more recent diffusion measure is the free water (FW) index, which measures the fraction of the diffusion signal explained by isotropically unrestricted water (Pasternak et al., 2009), as estimated from a regularized bi-tensor model. In white matter, this measurement represents either FW in extracellular space around axons or FW contamination from cerebrospinal fluid in adjacent voxels. An elevated FW index in white matter has been suggested to indicate neuroinflammation (Pasternak et al., 2012a) and has been described in normal aging (Chad et al., 2018) and many neurological disorders such as schizophrenia (Pasternak et al., 2012b, 2016), Parkinson's disease (Ofori et al., 2015), and AD (Maier-Hein et al., 2015; Ji et al., 2017; Montal et al., 2018).
An association between higher FW, worse scores on the clinical dementia rating (CDR), and a higher probability of transitioning to a more severe CDR stage was recently demonstrated by Maillard et al. (2018). In AD and MCI patients, an association between widespread increased FW and poorer attention, executive functioning, cognitive performance, visual construction, and motor performance supports the idea that FW metrics are associated with clinical symptoms (Ji et al., 2017; Montal et al., 2018; Reas et al., 2018). In addition, DTI measures that have undergone correction for FW content have been shown to be more sensitive than standard DTI measures in differentiating between AD patients with and without cerebrovascular involvement (Ji et al., 2017). In a longitudinal study, FW-corrected radial diffusivity, but not uncorrected radial diffusivity, was higher in the WM of MCI patients who converted to AD than in MCI patients who did not convert (Maier-Hein et al., 2015). FW-corrected DTI measures also demonstrate greater sensitivity than standard DTI measures to associations between AD pathology and white matter microstructure (Hoy et al., 2017).
Based on the growing body of evidence showing the association of FW or FW-corrected metrics with clinical symptoms of AD and concomitant cerebrovascular disease, we set out to develop a single powerful FW measurement that is easily translatable to clinical settings, with the potential to be used as an a priori outcome measure in clinical trials. Simple volume-based measurements such as ventricular expansion, cortical/subcortical gray matter atrophy, and WMH volume have been shown to be linked with various AD symptoms, but none of them gives information on normal appearing white matter, which may be affected earlier, during the transitional stages from normal aging to MCI and AD.
When measuring FW in aged subjects, one needs to take into account white matter lesions that are visible on certain structural MR scans as white matter hyperintensities (WMHs). The number and total volume of WMHs are known to increase with age (de Leeuw et al., 2001), and they have been associated with vascular disease (Debette et al., 2010), cognitive impairment (DeCarli et al., 2001; Yoshita et al., 2006), and even directly with AD (Kandel et al., 2016). Since FW inside WMHs is very high compared to the subtle FW changes specific to the AD continuum, WMHs need to be removed to adequately measure AD-specific FW changes. WMHs and FW are known to be part of a WM injury continuum (Maillard et al., 2017), and WM in the area surrounding the WMHs (called the penumbra) is also known to undergo microstructural changes (Maillard et al., 2011, 2014). To keep any WMH-related signal out of our FW measurement, a dilated version of the WMHs covering the estimated range of the penumbrae (Maillard et al., 2011) is removed from the final mask. Another pitfall is that, due to the generally lower spatial resolution of diffusion images, partial volume contamination from sulcal and ventricular CSF can considerably boost FW values, leading to incorrect FW measurements. Some studies (Ji et al., 2017) avoid partial volume effects by using Tract-Based Spatial Statistics (TBSS; Smith et al., 2006). This method avoids the partial volume effect by projecting the data onto a WM skeleton, but it has shortcomings (Bach et al., 2014) that we want to avoid, such as the loss of a major part of the WM voxels and the need for atlas registration. In order to get an unbiased and relevant measure of FW in healthy WM, we developed a WM "safe" mask (WMsafe) that minimizes GM/CSF partial volume contamination while avoiding the shortcomings of TBSS.
In this study, we developed these simple yet powerful whole-brain FW measures without tractography or atlas registration. These measurements can be done on low angular resolution diffusion images and are designed for clinical settings and potential use as an a priori outcome measure in clinical trials. This was done by designing a FW processing pipeline that computes whole-brain FW measures inside a partial volume free WM mask (with and without WMHs) for three different groups (cognitively normal, MCI, and AD subjects) selected from the ADNIGO and ADNI2 databases. We show that our FW measures were significantly higher in the MCI and AD groups than in NC when using a WMsafe mask. We also show, for the first time, that when WMHs and their penumbrae are removed from the mask, the significant results remained, demonstrating that FW measures are not just a byproduct of WMHs.
hearing and seeing abilities, no depression or bipolar condition, no history of alcohol or drug abuse and completed at least six grades of education. Also, NCs had no memory impairment and their CDR was 0. MCI subjects included early and late MCI with impaired memory and a CDR of 0.5, while AD subjects met criteria for dementia and had a CDR between 0.5 and 1 (Petersen et al., 2010). Participants did not suffer from any neurological disorders other than MCI and AD, such as brain tumor, multiple sclerosis, Parkinson's disease, or traumatic brain injury. Detailed group demographics are given in Table 1.

MRI Data Acquisition
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org. Data for each participant came from the ADNIGO and ADNI2 databases (http://adni.loni.usc.edu/). Of the available MRI images, we used the T1w, diffusion weighted imaging (DWI), and fluid-attenuated inversion recovery (FLAIR) scans. The DWI scans were acquired along 41 evenly distributed directions using a b-value of 1,000 s/mm² with a 1.3×1.3×2.7 mm³ spatial resolution. The T1w and FLAIR scans were acquired at 1.2×1.05×1.05 and 0.85×0.85×5 mm³ spatial resolution, respectively. Data were acquired at 58 different North American locations.

MRI Processing Pipeline
The processing pipeline is illustrated in Figure 1. First, the T1w and DW images were denoised with a non-local means method robust to Rician noise (Descoteaux et al., 2008), followed by an MRI bias field correction performed with the ANTs N4 correction tool (Avants et al., 2009). The brain mask (BM) was then computed and the skull was removed using the BEaST brain extraction software (Eskildsen et al., 2012). We refer to these methods as the preprocessing step in Figure 1. Then, the T1w and FLAIR images were non-linearly registered to the 1×1×1 up-sampled (using linear interpolation) diffusion space with ANTs registration (Avants et al., 2009). Tissue segmentation was then performed on the transformed T1w scan to obtain binary maps of the CSF, GM, and WM. This was done using ANTs Atropos (Avants et al., 2009). In order to prevent any CSF contamination in regions susceptible to partial volume effects, a "safe WM mask" (WMsafe) was built by combining the following morphological operations on the CSF, WM, GM, and brain binary masks: (1) where R_n is a 3D structuring element of radius n, ⊕ is the dilation operator, ⊖ is the erosion operator, and ∩ is the intersection operator, as illustrated in Figure 1. Using the FLAIR and T1w images, a binary map of WMHs was also computed using volBrain (Manjón and Coupé, 2016). The WMH maps went through visual QA and none of them were rejected or corrected. A binary dilation of 2 voxels was applied to the WMHs to avoid partial volume contamination and, at the same time, to include the WMH penumbrae. The bi-tensor model proposed by Pasternak et al. (2009) was fit to the DW signal. The result of this fit is a fraction representing the contribution of unconstrained water to the original signal and a new signal representing the tissue contribution. The fraction of unconstrained water contribution in a voxel is what we commonly call FW volume, and the 3D image of this FW volume is called the FW map.
The tissue signal is the FW-corrected DWI signal, as it represents the signal without its unconstrained component. The safe white matter mask, the WMH mask, and the FW map were then used to extract the mean FW value (µFW) and the relative FW volume (rFW). The rFW is the total volume of voxels within the safe white matter mask with FW values greater than 0.1, divided by the total volume of the safe white matter mask. The rFW was created to minimize the impact of ventricle expansion and whole brain atrophy on the final measurement of FW in WM. The 0.1 threshold was defined empirically by comparing FW values in the normal appearing white matter of multiple subjects with obviously abnormal values. This enables the rFW measurement to discard as much as possible of the background (or noise) FW values.
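The two measures just described can be written compactly as follows. This is a sketch reconstructed from the prose definitions above, not a reproduction of the original typeset equations. The notation is assumed: $FW(v)$ is the FW fraction at voxel $v$, $|m|$ is the number of voxels in mask $m$, and $\mathbb{1}[\cdot]$ is the indicator function. Voxels are assumed to have equal volume, so a ratio of volumes equals a ratio of voxel counts.

```latex
\mu FW_m = \frac{1}{|m|} \sum_{v \in m} FW(v),
\qquad
rFW_m = \frac{1}{|m|} \sum_{v \in m} \mathbb{1}\left[ FW(v) > 0.1 \right]
```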
Both measures were computed for each mask m ∈ {WMsafe, WMHs, WMsafe − WMHs}. All processing was done using a Nextflow (Tommaso et al., 2017) pipeline with all software dependencies bundled in a Singularity container (Kurtzer et al., 2017), ensuring quick and easy reproducibility of the results.
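As an illustration of how µFW and rFW could be extracted once the FW map and the masks exist, here is a minimal sketch. It is not the pipeline's actual code: the function names `fw_measures` and `subtract_mask` are hypothetical, images are represented as flattened voxel sequences rather than 3D volumes, and all voxels are assumed to have the same volume.

```python
from typing import List, Sequence, Tuple


def fw_measures(
    fw: Sequence[float],
    mask: Sequence[bool],
    threshold: float = 0.1,
) -> Tuple[float, float]:
    """Return (mean FW, relative FW volume) over the voxels selected by mask.

    fw and mask are flattened voxel sequences of equal length. Voxels are
    assumed to share one volume, so volume ratios reduce to voxel counts.
    """
    if len(fw) != len(mask):
        raise ValueError("fw and mask must have the same number of voxels")
    inside = [value for value, keep in zip(fw, mask) if keep]
    if not inside:
        raise ValueError("mask selects no voxels")
    mean_fw = sum(inside) / len(inside)
    # Fraction of mask voxels whose FW value is strictly above the threshold.
    relative_fw = sum(value > threshold for value in inside) / len(inside)
    return mean_fw, relative_fw


def subtract_mask(mask: Sequence[bool], lesion: Sequence[bool]) -> List[bool]:
    """Voxels in mask that are not in lesion (e.g. WMsafe minus dilated WMHs)."""
    return [m and not l for m, l in zip(mask, lesion)]


if __name__ == "__main__":
    # Toy five-voxel example with made-up values, for illustration only.
    fw_map = [0.05, 0.20, 0.40, 0.08, 0.90]
    wm_safe = [True, True, True, True, False]
    wmh_dilated = [False, False, True, False, False]
    print(fw_measures(fw_map, wm_safe))
    print(fw_measures(fw_map, subtract_mask(wm_safe, wmh_dilated)))
```

With real data, the same logic would normally be applied to array-valued images loaded from the FW map and mask files; plain sequences are used here only to keep the sketch dependency-free.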

Statistics
A cross-sectional analysis was performed at the first available time point, comparing rFW and µFW in NC (n = 81), MCI (n = 103), and AD (n = 42). An analysis of variance (ANOVA) was performed to test for a main effect of diagnostic group, followed by a post-hoc pairwise Tukey test to assess differences between subgroups (McDonald, 2006). A log transformation was applied to the rFW and µFW metrics before the analyses to improve the normality of their distributions.
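The first stage of this analysis (log transformation followed by a one-way ANOVA) can be sketched as below. This is an illustrative implementation, not the code used in the study: the function name `one_way_anova_f` is hypothetical, the natural logarithm is assumed (the F statistic does not depend on the log base), and all values are assumed to be strictly positive. The p-value and the post-hoc Tukey test are left out because they are best taken from a statistics library.

```python
import math
from typing import Dict, Sequence


def one_way_anova_f(groups: Dict[str, Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA computed on log-transformed values."""
    logged = {
        name: [math.log(value) for value in values]
        for name, values in groups.items()
    }
    n_groups = len(logged)
    n_total = sum(len(values) for values in logged.values())
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(sum(values) for values in logged.values()) / n_total
    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of samples around their own group mean
    for values in logged.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)
    if ss_within == 0.0:
        raise ValueError("no within-group variance; F is undefined")

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


if __name__ == "__main__":
    # Made-up rFW values for three groups, for illustration only.
    toy = {
        "NC": [0.02, 0.03, 0.025, 0.04],
        "MCI": [0.04, 0.05, 0.045, 0.06],
        "AD": [0.05, 0.07, 0.06, 0.08],
    }
    print(one_way_anova_f(toy))
```

A large F value indicates that the group means of the log-transformed metric differ by more than the within-group scatter would suggest; the pairwise comparison then identifies which groups differ.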

Quality Assurance
Out of all available subjects in ADNI2 and ADNIGO, 239 had at least one time point with all the images required (T1w, DWI, and FLAIR) to go through the processing pipeline. Visual QA was performed on all images of all time points, and those with problems impossible to correct (missing brain parts, acquisition artifacts) were rejected. Gradient information was also checked to make sure every DWI image had 41 evenly distributed directions on a single acquisition shell. This first QA pass eliminated 9 subjects, bringing the count of subjects with usable data to 230. Visual inspection was performed on the brain extraction of T1w and DWI images as well as on the non-linear registration of the FLAIR to the T1w and of the T1w to the DWI. Every tissue segmentation mask (WM, GM, CSF) as well as the WMH mask was inspected. This second QA pass eliminated 4 subjects, 3 with artifacts in the DWI images causing improbable metric values and one with an obviously incorrect T1 brain mask, leaving 226 subjects with usable data for the group analysis.

RESULTS
As shown in Table 2, the initial ANOVA tests show a significant main effect of group membership across all regions of interest. Post-hoc Tukey tests show that both rFW and µFW are significantly higher in the WMsafe mask for MCI and AD subjects than for NC subjects, whether or not WMHs were included, as seen in Figure 2. Both rFW and µFW in full WM (without partial volume and WMH correction) also differentiate NC from AD and MCI, demonstrating that partial volume contaminated measurements can still lead to positive results even though the measurements are incorrect. When looking at rFW and µFW specifically within the WMH mask, we see some significant between-group differences, but with smaller effects, and neither measure separates both NC from MCI and NC from AD. Finally, the volume of WMH is significantly higher for AD subjects than for NC subjects, highlighting the need to remove WMHs from the WM mask, since their volume alone differentiates groups. A supplementary ANCOVA test including age and gender as covariates shows that age is highly associated with rFW (p < 0.001) and gender is marginally associated with rFW (p = 0.012). After accounting for both gender and age, the significant differences between NC and MCI (p < 0.001) as well as between NC and AD (p < 0.001) remain.
In Table 2, the statistical significance (in bold) is shown as: *p < 0.05, **p < 0.01, ***p < 0.001.
To assess the viability of measuring FW in all WM (as opposed to bundles) to differentiate groups, we visualized the spatial distribution of free water differences between groups. Every T1w image already registered in diffusion space was non-linearly registered to the MNI152 space with the ANTs registration tool (Avants et al., 2009). The resulting transformations were applied to the free water volumes in WMsafe − WMHs. The mean and standard deviation of the free water volumes were computed for each group and used to obtain a z-score volume of each subject compared to each group. These z-score volumes were averaged and thresholded at z ≥ 2 standard deviations to obtain binary group comparison volumes. Only clusters of 10 or more voxels were kept.
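The voxel-wise z-score step can be sketched as follows. This is one possible reading of the procedure, with several assumptions: z-scores of the comparison-group subjects are computed against the voxel-wise mean and sample standard deviation of a reference group, images are flattened to voxel sequences already aligned in a common space, and the function names are hypothetical. The cluster-size filter (10 or more voxels) is omitted because it requires 3D connected-component labeling.

```python
import statistics
from typing import List, Sequence


def mean_z_map(
    reference: Sequence[Sequence[float]],
    comparison: Sequence[Sequence[float]],
) -> List[float]:
    """Average, per voxel, the z-scores of comparison subjects vs. reference.

    Each argument is a list of subjects; each subject is a flattened FW map.
    The reference group needs at least two subjects for a standard deviation.
    """
    n_voxels = len(reference[0])
    z_mean = []
    for v in range(n_voxels):
        ref_values = [subject[v] for subject in reference]
        mu = statistics.mean(ref_values)
        sigma = statistics.stdev(ref_values)  # sample standard deviation
        if sigma == 0:
            # No spread in the reference group: z-score undefined, report 0.
            z_mean.append(0.0)
            continue
        z_values = [(subject[v] - mu) / sigma for subject in comparison]
        z_mean.append(statistics.mean(z_values))
    return z_mean


def threshold_map(z_map: Sequence[float], z_min: float = 2.0) -> List[bool]:
    """Binary map of voxels whose mean z-score reaches the threshold."""
    return [z >= z_min for z in z_map]


if __name__ == "__main__":
    # Made-up two-voxel FW maps, for illustration only.
    nc = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4]]
    ad = [[0.5, 0.3], [0.4, 0.3]]
    print(threshold_map(mean_z_map(nc, ad)))
```

Setting undefined z-scores to 0 is a design choice of this sketch, so that voxels without reference variability never pass the threshold.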
In both the NC vs. AD and NC vs. MCI comparisons, voxel clusters showing differences are mostly found in the corticospinal tract (CST) and in bundles of the limbic system such as the cingulum and the fornix. Many clusters are also found outside these key AD bundles, generally covering all WM. Figure 3 shows that the intensity and location of significant z-score clusters differ between the AD vs. NC and MCI vs. NC comparisons.

DISCUSSION
A preliminary version of these results was presented at ISMRM 2018 (Dumont et al., 2018). Since then, more studies demonstrating the association between FW in WM and cognitive decline (Maillard et al., 2018; Reas et al., 2018) have supported the idea that a single whole-brain FW measurement is viable for clinical settings and for potential use as an a priori outcome measure in clinical trials for diagnostic grouping along the AD continuum.
To achieve a reliable and simple measurement, we identified and overcame three major obstacles to measuring FW content in the WM of aging subjects (partial volume contamination, WMHs, and brain atrophy) and verified group differentiation with and without each solution. First, as a baseline, FW in whole WM (without correction) was significantly higher in AD and MCI subjects than in NC subjects. Removing partial volume contamination with WMsafe sharpened group differentiation. Removing WMHs and their penumbrae slightly decreased the differences in group means while keeping significant differentiation. This can be explained by another result presented here, namely that WMH volume alone differentiates AD from NC subjects, which reinforces the hypothesis that WMHs and their penumbrae need to be removed to get a relevant measurement of FW in normal appearing WM. We also demonstrated that the group-wise differences in FW content within the WMH lesions were smaller than the group-wise differences in FW content in WMsafe, suggesting that, unlike normal appearing white matter, WMH lesions may have a similar underlying pathophysiology across the disease spectrum. Finally, correcting for brain atrophy in aging patients using the relative free water volume further sharpened group differentiation. We then visualized the spatial distribution of high FW differences (in WMsafe − WMHs) between groups using clusters of high z-scores. These results further strengthened the whole WM measurement idea by showing that, while some of the differences are located in bundles known to be associated with AD, the high z-score clusters as a whole cover all white matter.
Our new and simple FW measures can be used to increase our understanding of the role of inflammation-associated edema in AD and may aid in the differentiation of healthy subjects from MCI and AD patients. Due to the simplicity of the method and the short acquisition time required for the images, these measurements may be particularly useful in clinical settings and can potentially be used as a priori outcome measures in clinical trials.
FW metrics could not differentiate between MCI and AD subjects. This could be the result of the whole white matter measurement not being sensitive enough to detect subtle FW differences between MCI and AD. Analyzing FW content along specific WM bundles would be expected to yield more specific results but would also increase complexity by introducing tractography to reconstruct the global WM architecture, followed by an automated segmentation of several key WM bundles such as the fornix, cingulum, corpus callosum, and association tracts (arcuate fasciculus, uncinate, inferior longitudinal, and inferior fronto-occipital fasciculus). FW metrics would be analyzed along those bundles, as done in automated fiber quantification (AFQ) (Yeatman et al., 2012) and tract-profiling (Cousineau et al., 2017). Future work could also include looking at how FW correlates with the amyloid beta and tau data available in ADNI to further support the hypothesis that FW is a viable proxy measurement of neuroinflammation.
The FW threshold used to compute rFW was defined empirically by observing this particular set of data. Adjustments might be needed to do this analysis on a different database. After the main processing, further tests were done with different thresholds. Group separation remained fairly stable in the neighborhood of 0.1 but dropped drastically when the threshold was increased past 0.2, due to the very low occurrence of such FW values after removing WMHs and partial volume contamination. On the other hand, when the threshold was lowered, group separation decreased slowly and then stabilized. This suggests that removing background FW values moderately sharpens group differentiation. An optimal threshold could be found automatically using small increments, but it would be specifically tuned for these groups instead of representing the underlying biological phenomenon.
The FW metrics used in the current study also have limitations: they are derived from a bi-tensor model, which is limited to representing a FW compartment and a single fiber population. It is estimated that 66 to 90 percent of brain WM voxels contain at least two fiber populations (Ji et al., 2017; Montal et al., 2018). In those voxels, the contribution of the FW compartment is incorrectly estimated, since some of the signal arising from the fiber populations not fitted by the single fiber tensor may be assigned to the FW compartment. To correct this bias, a FW model accounting for more than one fiber population would need to be used to better fit the signal. While a more sophisticated model would certainly better characterize the information contained in the non-free-water portion of the signal and give more accurate free water indices, such models require multi-shell DWI acquisitions, which are unavailable in ADNI2 and ADNIGO.
In future work, visualization using z-score clustering could be replaced with a more robust method that takes into account multiple comparisons and cluster-based thresholding, such as threshold-free cluster enhancement (Smith and Nichols, 2009) and non-parametric permutation tests (Nichols and Holmes, 2002).
Longitudinal data is available in ADNI2 and ADNIGO but was not analyzed in this study. Future work should make use of this longitudinal data and look into the potential prognostic value of FW values at baseline.

CONCLUSION
This study demonstrates that after removing partial volume contamination, removing WMHs and their penumbrae, and accounting for brain atrophy in elderly subjects, the free water content of healthy-looking white matter differentiates MCI and AD groups from healthy subjects. Our method is based on existing DTI-like diffusion data, is atlas free, requires no registration with a reference brain, no PET scan, and no tractography, has few tunable parameters, and takes only a few minutes of computation. The method is a simple but powerful approach that may be used clinically or in the context of patient selection and stratification for novel treatments that are aimed at treating or preventing inflammation components of AD using legacy or standard diffusion MRI data. The significant differences in our FW metrics between NC and MCI as well as between NC and AD may demonstrate the potential of FW as a tool to study neuroinflammation. We intend to extend this work with analyses of FW metrics in specific white matter bundles and sections of bundles. Also, characterization over time of our new FW metrics in an MCI population could help differentiate those older adults who will remain relatively stable from those who will progress to AD, which has utility for patient selection and stratification of subjects in preclinical stages of AD.

DATA AVAILABILITY STATEMENT
The image datasets used in this study are publicly available from the ADNI database. Listings of individual subject IDs used in this analysis are available upon request to the corresponding author.

ETHICS STATEMENT
The human data were acquired from the publicly available ADNI database, which meets the ethics requirements.

AUTHOR CONTRIBUTIONS
MDu: create processing pipeline, process data, design study, write/review text. MR: design study, write/review text, provide biological expertise. P-MJ: write/review text, provide data processing expertise. FM: create processing pipeline, process data, write/review text. J-CH: design study, write/review text, provide diffusion expertise. ZX and MDe: design study, write/review text, provide diffusion imaging expertise. CB: process data, design study, write/review text, provide stats expertise. TS: design study, write/review text, provide neuroinflammation expertise. KV and JG: design study, write/review text, provide MR imaging expertise.