White matter hyperintensities segmentation: a new semi-automated method

White matter hyperintensities (WMH) are brain areas of increased signal on T2-weighted or fluid-attenuated inverse recovery magnetic resonance imaging (MRI) scans. In this study we present a new semi-automated method to measure WMH load that is based on the segmentation of the intensity histogram of fluid-attenuated inversion recovery images. Thirty patients with mild cognitive impairment with variable WMH load were enrolled. The semi-automated WMH segmentation included removal of non-brain tissue, spatial normalization, removal of cerebellum and brain stem, spatial filtering, thresholding to segment probable WMH, manual editing for correction of false positives and negatives, generation of WMH map, and volumetric estimation of the WMH load. Accuracy was quantitatively evaluated by comparing semi-automated and manual WMH segmentations performed by two independent raters. Differences between the two procedures were assessed using Student’s t-tests and similarity was evaluated using linear regression model and Dice similarity coefficient (DSC). The volumes of the manual and semi-automated segmentations did not statistically differ (t-value = -1.79, DF = 29, p = 0.839 for rater 1; t-value = 1.113, DF = 29, p = 0.2749 for rater 2), were highly correlated [R2 = 0.921, F(1,29) = 155.54, p < 0.0001 for rater 1; R2 = 0.935, F(1,29) = 402.709, p < 0.0001 for rater 2] and showed a very strong spatial similarity (mean DSC = 0.78, for rater 1 and 0.77 for rater 2). In conclusion, our semi-automated method to measure the load of WMH is highly reliable and could represent a good tool that could be easily implemented in routinely neuroimaging analyses to map clinical consequences of WMH.


INTRODUCTION
White matter hyperintensities (WMH) are brain areas of increased signal intensity, appearing on T2-weighted (T2-w) or fluidattenuated inversion-recovery (FLAIR) magnetic resonance imaging (MRI) scans, that result from localized changes in tissue composition (Barkhof and Scheltens, 2002). The etiology of WMH is not specific and may be secondary to ischemia, demyelinating disorders, hydrocephalus, trauma, inflammatory diseases, radiation injury, amyloidosis, and other causes (Pantoni and Garcia, 1997).
WMH are often observed in elderly participants (de Groot et al., 2000a), and their frequency is greater in individuals with cerebrovascular risk factors such as diabetes and hypertension (de Leeuw et al., 2001). Furthermore, WMH are a common finding in those subjects with neurological illnesses, such as stroke (Fazekas et al., 1993), Parkinson's disease (Marshall et al., 2006), mild cognitive impairment (MCI;DeCarli et al., 2001), Alzheimer's disease (Fazekas et al., 1996;Hirono et al., 2000), and even in patients with primary mental disorders including mood disorders and schizophrenia spectrum disorders (Brown et al., 1995).
Although neuropathological, clinical, and cognitive significance of WMH is unclear (Gunning-Dixon and Raz, 2000) large epidemiological studies provide evidence that WMH have a strong impact on cognitive functioning (Gunning-Dixon and Raz, 2000) and they have been associated with impairment in a number of domains, including psychomotor speed, frontal executive functions (de Groot et al., 2000b;Prins et al., 2004), and explicit memory (de Groot et al., 2000a).
Interestingly, a systematic review and meta-analysis (Debette and Markus, 2010) provides strong evidence that WMH may be an important predictor of future disease, being associated with an increased risk of stroke, dementia, and mortality. Moreover, it has been shown that WMH often occur in preclinical stages of dementia (such as in MCI patients) and that the presence of WMH may also increase the likelihood of progression from MCI to dementia (Wu et al., 2002).
A challenging issue in studying WMH is their quantification and localization, given their variability and scattered spatial distribution. Two analytic strategies have been used to evaluate WMH on MRI brain images: (1) semi-quantitative rating systems and (2) Frontiers in Aging Neuroscience www.frontiersin.org quantitative volumetric analyses. The semi-quantitative approach mainly consists in the computation of visual ratings and several scales, markedly different in morphological or anatomical definitions, have been proposed and used in literature (Fazekas et al., 1987;Scheltens et al., 1993;Mäntylä et al., 1997;Wahlund et al., 2001). WMH visual rating scales are relatively quick and easy to perform even across scans and scanners, and they are commonly used in clinical and research settings, thus making them attractive for large epidemiological studies. Unfortunately, they have a number of limitations. Indeed, categorical ratings have a restricted range of values that limit the power of association. Moreover, qualitative scales are often subjective in their interpretation, thus limiting inter-rater and intra-rater reliability (Mäntylä et al., 1997) and consistency within longitudinal studies (Prins et al., 2004). On the other hand, a number of more recent researches have introduced quantitative methods of measuring WMH severity using computer-based techniques to obtain volumetric measures of WMH burden. These methods vary from manual outlining techniques to fully-automatic WMH detection (Jack et al., 2001;Anbeek et al., 2004;Wen and Sachdev, 2004;Admiraal-Behloul et al., 2005;Wu et al., 2006;Gibson et al., 2010;Cerasa et al., 2012;Simões et al., 2013).
Manual outlining techniques are referred to as region-ofinterest (ROI) methodologies, in which the tracer inspects the scan using visualization softwares and manually draws WMH areas. The volume of each region is then calculated from the section thickness and the number of voxels included in the traced area. The values of all sections are then added together to make a total WMH volume. Manual outlining procedures, although accurate, have several limitations such as labor intensiveness, time consume, subjectiveness, and error-proneness. In addition, manual detection suffers from intra-observer and inter-observer variability (Smith et al., 2002).
In recent years, major progresses have been made on the development of semi-or fully-automated segmentations of WMH. These techniques are based on computer algorithms developed to replace human eye, allowing the collection of quantitative volumetric data on WMH (Jack et al., 2001;Anbeek et al., 2004;Wen and Sachdev, 2004;Admiraal-Behloul et al., 2005;Wu et al., 2006;Gibson et al., 2010;Cerasa et al., 2012;Simões et al., 2013). These techniques are more objective and free from user bias compared to the visual WMH ratings (Payne et al., 2002).
However, such methods have different accuracy, computational speed, and complexity. The majority of these approaches relies on multimodal data, thus being based on different MRI sequences, including T2-w, proton density (PD), T1-weighted (T1-w), inversion recovery (IR) and, more widely, FLAIR images. Innovative procedures to detect WMH have been developed based on Markov random field model (Schwarz et al., 2009), k-nearest neighbor (Anbeek et al., 2004;Wen and Sachdev, 2004), and neural classification (Dyrby et al., 2008), which require training images with WMH labels. The segmentation accuracy of these methods depends on the representative training data that may be difficult to select due to the heterogeneous nature of WMH. Differently, Admiraal-Behloul et al. (2005) proposed a fuzzy inference system to classify the WMH based on both anatomical locations and intensity values from three different MR images (T2-w, PD, and FLAIR) without the need of training samples. Moreover, some methods automatically or semi-automatically segment the WMH relying exclusively on FLAIR images by defining a cut-off threshold on the images (Jack et al., 2001;Wu et al., 2006;Gibson et al., 2010;Simões et al., 2013). In this context, Jack et al. (2001) and Gibson et al. (2010) employed empirical thresholds before applying linear fitting or fuzzy clustering to segment WMH. Similarly, Wen and Sachdev (2004) used the mean and standard deviation (SD) of gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) intensities to estimate the threshold for WMH and they used a WM probability map to identify the most likely WM regions. Recently, Simões et al. (2013) proposed a WMH segmentation method based on a modified context-sensitive Gaussian mixture model to determine voxel class probabilities, followed by a false positive correction step, where common FLAIR artifacts are eliminated from the segmentation.
All the quantitative methods here discussed have some limitations. Indeed, complex automatic or semi-automatic computerbased segmentation procedures require a large amount of technical resources that may not be available in clinical settings. Furthermore, multi-spectral approaches are not always accessible in clinical practice since the acquisition of all these images is costintensive and requires long processing time (Shen et al., 2008). Conversely, FLAIR-based approaches may, in sporadic cases, overestimate the WMH load due to FLAIR typical high intensity appearance in cortical areas, such as the septum pellucidum, and low artifacts in the fourth ventricle where a large percentage of false positive is detected (Wang et al., 2012). Such limitations make these methods scarcely applicable in large clinical contexts and barely replicable across sites.
In this study, we present a semi-automated method for WMH load measurement that overcomes the above presented limits. We applied the method on a group of MCI patients with various WMH load. The aims of our study are: (i) to develop a semi-automated algorithm for the detection, quantification, and localization of WMH using only FLAIR images and T1-w at 3T which is fully reproducible in different research contexts and clinical populations; and (ii) to evaluate the accuracy of the method by comparing it with manual segmentation.

SUBJECTS
We recruited in our memory clinic in Rome, Italy, 30 patients with amnestic mild cognitive impairment (a-MCI; 21 males, 9 females; mean age ± SD = 72 ± 6 years, range 56-85; mean education = 9 ± 4 years; mean mini mental state examination (MMSE; Folstein et al., 1975; score = 26 ± 2) with a variable degree of WMH load. Specific inclusion criteria for a-MCI were (1) diagnostic evidence of a-MCI consistent with Petersen guidelines (Petersen et al., 1999), i.e., (i) complaint of defective memory, (ii) normal activities of daily living, (iii) abnormal memory function for age, and (iv) absence of dementia; and (2) suitability for MRI scanning.
Exclusion criteria were (1) major medical illnesses and autoimmune-inflammatory disease; (2) co-morbidity of primary Frontiers in Aging Neuroscience www.frontiersin.org psychiatric or neurological disorders, and (3) any other significant mental or neurological disorder. Medical and psychiatric histories were obtained from each subject, and all patients underwent a series of standard clinical examinations, including physical, neurological and mental status examinations, neurocognitive tests, and brain MRI (described below). All patients underwent the first diagnostic assessment and none were taking acetylcholinesterase inhibitors or psychotropic drugs (antidepressants, benzodiazepines, and antipsychotics). The study was approved and undertaken in accordance with the guidance of our local Ethics Committee and written consent was obtained from all participants.

MRI ACQUISITION
All 30 participants underwent the same imaging protocol, which included 2D FLAIR and whole-brain high-resolution T1-w images, using a 3T Allegra MR imager (Siemens, Erlangen, Germany) with a standard quadrature head coil. All planar sequences acquisitions were obtained in the plane of the AC-PC line. Particular care was taken to center the subjects in the head coil and to restrain their movements with cushions and adhesive medical tape. Two-dimensional FLAIR images were obtained in the axial plain (TE = 109 ms, TR = 8500 ms, slice thickness = 5 mm, slices = 24, matrix = 188 × 256, phase FOV = 0.73). Whole-brain T1-w images were obtained in the sagittal plane using a modified driven equilibrium Fourier transform (MDEFT, TE = 2.4 ms, TR = 7.92 ms, flip angle = 15 • , voxelsize = 1 mm × 1 mm × 1 mm). All scans were visually inspected by a neuroradiologist (GL) with high expertise in clinical neuroimaging in order to exclude images with poor quality and motion artifacts.
A representative axial slice of FLAIR images for each patient is shown in Figure 1.

SEMI-AUTOMATED WMH SEGMENTATION
The semi-automated WMH segmentation algorithm was performed using FSL 4.1 1 and the VBM8 toolbox 2 implemented in SPM 8 3 running in Matlab 2007b (MathWorks, Natick, MA, USA) and it consisted of three major steps: (1) Image pre-processing, including non-brain tissue removal, spatial normalization into MNI space, removal of cerebellum and brain stem, and spatial filtering; (2) Automated detection of WMH in FLAIR images and, if necessary, subsequent manual editing for removal of false positives; (3) Post-processing, including the generation of WMH map and volumetric estimation of the WMH load at the individual and sample levels.

STEP 1: PRE-PROCESSING
Image preprocessing included skull-stripping of the FLAIR images to restrict our analyses to brain tissue only reducing the risk of false positives and to improve the accuracy and efficiency of WMH segmentation.

Removal of non-brain tissues
For each subject, the FLAIR image was co-registered to the T1weighted image using the FLIRT tool, integrated within FSL software through full affine alignment with nearest-neighbor resampling (correlation ratio cost function). Then, VBM8 with standard options was used to segment T1-weighted images and obtain a whole binary brain mask by summing the segmented GM, WM, and CSF in native space. The mask was then multiplied by the co-registered FLAIR and T1-w images to remove non-brain tissue. The average time for this step was 15 min per subject.

Spatial normalization of the skull-stripped FLAIR images into MNI space
The T1-w skull-stripped images were then linearly registered to MNI152 T1 1 mm brain template, and the transformation matrix was applied to the skull-stripped FLAIR images. Average time to complete the task for each subject was 1 min.

Removal of cerebellum and brain stem
Since WMH are rare in cerebellum and brain stem (Wen and Sachdev, 2004), we created a mask of these brain regions in MNI space to exclude them from the analysis. The mask was created using the Harvard-Oxford subcortical and the MNI structural atlases implemented in FSL 4.1. For each subject, the mask was then multiplied for the skull-stripped and normalized FLAIR image. The time to complete the task for each subject was negligible. In order to remove residual inhomogeneities, the normalized, skull-stripped FLAIR images were smoothed with a 2-mm full width at half maximum Gaussian kernel (Ashburner and Friston, 2000). This procedure was performed using FSL fslmaths utility. The average time to complete the step for each subject was 1 min.
Step 1 was performed using an in-house shell script which automatically concatenates all three substeps for all subjects.

Automatic identification of WMH on the intensity histogram of FLAIR images
For each subject we used FSL fslstats utility to calculate the mean and SD of the intensity of all brain voxels. In our procedure, after several trials, we optimized the process by choosing as threshold value the intensity mean +1.5 SD. This threshold was computed and applied to the FLAIR image for each subject, thus generating a map (of likely WMH) excluding all voxels having an intensity below such threshold. The time to complete the task was 1 min per subject and was implemented in a "for" cycle within a shell script.

Correction of false positives and negatives
Each generated WMH map was visually inspected and compared to the original FLAIR image in order to identify false classifications (positive or negative). These were then manually corrected (as shown in Figure 2 (FP) with a strong expertise in lesion analyses using the image processing software MRIcron 4 . In particular, wrong identifications of WMH were deleted through the eraser tool of MRIcron, while areas of WM lesions that were omitted by the algorithm were re-included. Correction of false positives was necessary for two patients while false negatives were present in nine patients (including the two were false positives were detected) and mainly consisted of underestimation of periventricular WMH volume. Manual correction required an average time of 8 min per patient.

STEP 3: POST-PROCESSING
The final output provided by the system is a binary image in which a voxel is valued 1 if it is considered a WMH, 0 otherwise. Using these binary masks, for each subject, WMH volumes (expressed in

MANUAL SEGMENTATION OF WMH
The manual segmentation of WMH on FLAIR images was performed by an expert neuroradiologist (Giacomo Luccichenti) and a trained clinician (Claudia Cacciari), expert in lesion segmentation, who were not aware of the results of the semi-automated procedure. Manual segmentation was delineated on the standard registered FLAIR images using MRIcron software by tracing the lesion outline with a mouse-controlled interface. This process resulted in the definition of binary images, considered as gold standard. For each subject, WMH volumes (expressed in cm 3 ) were calculated automatically using FSL fslstats utility. The mean time to complete the task for each subject was 2 h and 32 min.

STATISTICAL ANALYSES
Statistical analyses were performed with Statview software. The inter-rater reliability was calculated using the Spearman correlation coefficient. The differences between volumetric data derived from semi-automated and manual segmentations were evaluated using Student's t-test while a linear regression model was computed to assess the relationship between the two datasets. Furthermore, we assessed the similarities between the two datasets in terms of both volumetric and spatial agreement by computing the Dice similarity coefficient (DSC; Dice, 1945), which is a measure of sets agreement, given by the formula: where Seg is the automatically segmented image and Ref is the manually segmented image. DSC reflects differences in locations more strongly than differences in size (Zijdenbos et al., 1994). Indeed, this formula represents the size of the union of the Seg and Ref sets divided by the average size of the two sets. The DSC is commonly used when comparing datasets and, in particular, it has been widely used to assess similarities between automatic and manual segmentation procedures (Anbeek et al., 2004;Gibson et al., 2010;Cerasa et al., 2012;Simões et al., 2013). Generally, a DSC value of 0 indicates no spatial overlap and a value of 1 indicates perfect spatial alignment. Usually, DSC values of 0.7 and higher suggest good agreement between two delineations (Bartko, 1991).

RESULTS
The semi-automated procedure of WMH quantification was successfully completed in all cases. Hyperintense signals on FLAIR images were present in all subjects, but their extent and distribution varied considerably (mean volume = 18.63 ± 14.81 cm 3 ). The mean (±SD) volume of WMH from the two manual raters was 16.74 (±13.89) cm 3 and 19.5 (±16.29) cm 3 per subject (see Table 1).

Frontiers in Aging Neuroscience
www.frontiersin.org The second, the third and the fourth columns respectively show the white matter hyperintensities (WMH) volumes from the two manual labels and the semi-automated segmentation. SD, standard deviation.

DISCUSSION
In the present study, we showed our semi-automated procedure for the detection, localization, and quantification of WMH on FLAIR images applicable to a wide range of patients with various diseases. This procedure is based on FLAIR and T1-w images (the latter are used for preprocessing purposes only, see Figure 2). Results indicate that the algorithm performed remarkably well, compared to the gold-standard (manual segmentation by experts), since no statistical differences between the two outputs were found and a very high similarity emerged, both in terms of volumetric Frontiers in Aging Neuroscience www.frontiersin.org FIGURE 3 | Relationship between semi-automated and manual segmentation (rater 1 and rater 2). Linear fits (dotted black line) are also reported.
load and spatial location. This is an outstanding outcome, since the semi-automated procedure requires a time consumption which is approximately six times lower than the manual approach.
Other automated procedures developed to classify and quantify WMH have used a variety of classification approaches, including Markov random field model (Schwarz et al., 2009), knearest neighbor (Anbeek et al., 2004;Wen and Sachdev, 2004), neural classification (Dyrby et al., 2008), modified Gaussian mixture model (GMM) that incorporates neighborhood information (Simões et al., 2013) and threshold cut-offs (Jack et al., 2001;Gibson et al., 2010). Otherwise, our approach combines conservative voxel intensity thresholding with several key components that need further discussion.
First, we incorporated specific steps without any human intervention including the removal of non-brain tissue and of areas where WMH are unlikely (i.e., the cerebellum and the brainstem) in order to improve the accuracy of the method and overcome the limitations of subjectivity of the raters, thus ensuring testretest reliability. Furthermore, these steps, which are not included in the majority of other methodologies, reduce the probability to include false positives and decrease the time spent to remove them, thus resulting in faster processing and more accurate maps of WMH.
Second, the transformation of FLAIR images from native to standard space (MNI), allows later statistical group comparisons and/or voxel-based statistics since the WMH images are in a common coordinates system. This feature enables a quantitative estimation and localization of WMH that can be further processed using statistical voxel-based image analysis softwares, which may be of importance for the study of the consequences of WM changes on cognition and behavior (Soderlund et al., 2006;Cacciari et al., 2010;Spalletta et al., 2013). Indeed, many studies suggested that the clinical significance of WMH is related not only to their volume but also to their localization (Chen et al., 2006). Indeed, since the distribution of WMH in specific functional regions may be closely related to specific types of cognitive impairment, clinical symptoms may be linked to the regional distribution of WMH. Moreover, an important implication of this automated procedure is its implementation in follow-up studies with repeated MRI scans. Allowing coregistration among different MRI sessions, our method will facilitate a more precise detection of eventual new WMH occurring at follow up, as well as of differences in their size and localization. This, particularly, would help in generating models of disease progression.
Moreover, our automated approach offers the possibility of a fully streamlined methodology with little user intervention and utilizes freely available software (e.g., FSL and SPM), which can be easily implemented on most desktop computers of standard power and are of routine use in most laboratories equipped to carry out neuroimaging research. This makes our approach reasonably accessible. Relatively little image manipulation is required and the quantification approach is procedurally straightforward. Additionally, little knowledge in computer programming allows to automate the steps and greatly reduce the processing time using in-house shell scripts which automatically concatenate all steps for all subjects.
A possible limitation of the presented methodology lies in the low number of brain slices used as inputs. In fact, the analyzed FLAIR images were acquired with a slice thickness of 5 mm which may be a poor resolution for accurate volumetric quantification and might induce interpolation errors in the registration processes. The MRI scanner used in the present study is relatively old and does not allow fast high-resolution 3D FLAIR imaging. A higher image resolution could improve the WMH quantification, and the registration accuracy, which would lead to more precise WMH localization; however, it would lead to an increase of acquisition time thus decreasing clinical practicability. However, modern MRI systems allow fast 3D FLAIR acquisition which would surely improve image resolution and, in turn, WMH segmentation.
In summary, we presented an automated detection and segmentation approach for WMH using T1-w and FLAIR images. By evaluating our method, we demonstrated that this segmentation approach is robust and reliable and we suggest that it has potential use in clinical studies.
Future works should consider the presented method to map cognitive consequences of WMH and their progression in time through large (and possibly multicenter) cross-sectional and longitudinal studies.

Frontiers in Aging Neuroscience
www.frontiersin.org

AUTHOR CONTRIBUTIONS
Mariangela Iorio designed the study, analyzed imaging data, contributed to the writing of the manuscript; Gianfranco Spalletta designed the study, analyzed data, and contributed to the writing of the manuscript; Chiara Chiapponi analyzed imaging data and contributed to the writing of the manuscript; Giacomo Luccichenti analyzed imaging data and contributed to the writing of the manuscript; Claudia Cacciari analyzed imaging data; Maria Donata Orfei wrote the draft of the manuscript; Carlo Caltagirone contributed to the writing of the manuscript; Fabrizio Piras designed the study, analyzed imaging data, contributed to the writing of the manuscript.