Improving Accuracy of Brainstem MRI Volumetry: Effects of Age and Sex, and Normalization Strategies

Background: Brainstem-mediated functions are impaired in neurodegenerative diseases and aging. Atrophy can be visualized by MRI. This study investigates extrinsic sources of brainstem volume variability, intrinsic sources of anatomical variability, and the influence of age and sex on the brainstem volumes in healthy subjects. We aimed to develop efficient normalization strategies to reduce the effects of intrinsic anatomic variability on brainstem volumetry.
Methods: Brainstem segmentation was performed from MPRAGE data using our deep-learning-based brainstem segmentation algorithm MD-GRU. The extrinsic variability of brainstem volume assessments across scanners and protocols was investigated in two groups comprising 11 (median age 33.3 years, 7 women) and 22 healthy subjects (median age 27.6 years, 50% women) scanned twice and compared using Dice scores. Intrinsic anatomical inter-individual variability and age and sex effects on brainstem volumes were assessed in segmentations of 110 healthy subjects (median age 30.9 years, range 18–72 years, 53.6% women) acquired on 1.5T (45%) and 3T (55%) scanners. The association between brainstem volumes and predefined anatomical covariates was studied using Pearson correlations. Anatomical variables with associations of |r| > 0.30 as well as the variables age and sex were used to construct normalization models using backward selection. The effect of the resulting normalization models was assessed by % relative standard deviation reduction and by comparing the inter-individual variability of the normalized brainstem volumes to the non-normalized values using paired t-tests with Bonferroni correction.
Results: The extrinsic variability of brainstem volumetry across different field strengths and imaging protocols was low (Dice scores > 0.94). Mean inter-individual variability/SD of total brainstem volumes was 9.8%/7.36. A normalization based on either total intracranial volume (TICV), TICV and age, or v-scale significantly reduced the inter-individual variability of total brainstem volumes compared to non-normalized volumes and similarly reduced the relative standard deviation by about 35%.
Conclusion: The extrinsic variability of the novel brainstem segmentation method MD-GRU across different scanners and imaging protocols is very low. Anatomic inter-individual variability of brainstem volumes is substantial. This study presents efficient normalization models for variability reduction in brainstem volumetry in healthy subjects.

INTRODUCTION
The brainstem, as the anatomical and functional link between the cerebrum, the cerebellum, and the spinal cord, is a vitally important structure, playing a key role in controlling respiratory and cardiac function, defense reflexes, and awareness. From cranial to caudal, the brainstem is divided into three substructures: mesencephalon, pons, and medulla oblongata. It carries white matter tracts to and from the cerebrum, the spinal cord, and the cerebellum, as well as multiple cranial nerves and reticular nuclei (Nieuwenhuys, 1985; Naidich et al., 2009). While the mesencephalon plays an important role mainly in oculomotor, optic, and acoustic function, the pons contains important white matter tracts as well as cranial nerve nuclei for facial sensory and motor functions (Basinger and Hogg, 2020). The medulla oblongata regulates respiratory function and contains important reflex centers, e.g., for coughing and swallowing (Bolser et al., 2015; Ikeda et al., 2017).
Studying the brainstem is crucial for our understanding of both physiologic neurological function and neurological diseases.
Brainstem tissue loss, acquired either during aging or in neurodegenerative diseases, can be visualized and quantified by MRI in vivo, offering potential as a diagnostic, prognostic, or therapeutic marker in these diseases.
A recently published deep-learning-based algorithm provided accurate, highly reproducible, and robust brainstem segmentation in healthy subjects (HS) and patients with Alzheimer's disease and multiple sclerosis (Andermatt et al., 2016, 2018; Sander et al., 2019).
Despite its successful application in patients, volumetry of the brainstem and its substructures has not yet been systematically assessed in HS. Brainstem volume assessments are subject to inter-individual variation, e.g., due to head size, head position in the scanner, sex, or body height. Volume normalization reduces the physiologic inter-individual measurement variation due to individual anatomical effects, ideally without interfering with measurements related to possible disease processes. This allows for better statistical comparison between two inhomogeneous groups, such as healthy controls and patients. Frequently used normalization parameters for brain volumes are the FreeSurfer-derived total intracranial volume (TICV; Whitwell et al., 2001) and the SIENAX-derived volumetric scaling factor (v-scale; Fein et al., 2004). So far, normalization covariates for brainstem volumetry have not been investigated, and relevant normalization factors are not known.
Using a novel fully-automated deep-learning-based segmentation approach, the objectives of this study were to assess: a) the extrinsic variability of brainstem volumes depending on different scanners, field strengths, and acquisition protocols, b) the intrinsic anatomical variability and the influence of age and sex on the brainstem and its substructure volumes in HS, and c) the effects of normalization models on variability in brainstem volumetry.

Brainstem Segmentation
Brainstem volumes were assessed using a recently published fully-automated segmentation approach based on multidimensional gated recurrent units (MD-GRU). The deep-learning-based algorithm provides accurate, robust, and reproducible segmentations of the brainstem and its substructures (Andermatt et al., 2016, 2018; Sander et al., 2019).
Written informed consent was obtained from all participants mentioned above.
All brainstem segmentations were visually inspected for anatomic accuracy.

Statistical Analyses
Statistical analyses were performed using JMP Pro 14 and SPSS 25.

Assessing the Extrinsic Variability of Brainstem Volumes
Dice coefficients were calculated to compare brainstem segmentations obtained in the same individual (a) on different scanners (1.5T vs. 3T), (b) on the same scanner but with different protocols, and (c) on different scanners and protocols.
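As an illustration of the overlap measure used here, the following is a minimal sketch of a Dice coefficient for two binary masks. It assumes that both segmentations have already been brought into the same voxel grid (how this was done is not described in this section), and the function name `dice_coefficient` and the toy masks are hypothetical.

```python
import numpy as np

def dice_coefficient(seg_a, seg_b):
    """Dice overlap 2|A and B| / (|A| + |B|) of two binary masks of equal shape."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, not from the study)
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy example: two 1D "masks" with 4 voxels each, overlapping in 3 voxels
scan_1 = np.array([0, 1, 1, 1, 1, 0])
scan_2 = np.array([0, 0, 1, 1, 1, 1])
print(dice_coefficient(scan_1, scan_2))  # 2*3 / (4+4) = 0.75
```

A value of 1 indicates identical masks and 0 indicates no overlap.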

Assessing the Intrinsic Variability of Brainstem Volumes
To assess the inter-individual variability, the respective deviation from the group mean was calculated for each subject as (measured volume − mean volume)/mean volume.
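The per-subject deviation can be sketched as follows. The volumes are made-up numbers, and summarizing the deviations by their mean absolute value is an assumption of this sketch; the text above does not state how the group-level figure was aggregated.

```python
import numpy as np

def relative_deviation(volumes):
    """Per-subject deviation from the group mean: (measured - mean) / mean."""
    v = np.asarray(volumes, dtype=float)
    return (v - v.mean()) / v.mean()

def mean_abs_variability_percent(volumes):
    """Mean absolute relative deviation in percent (aggregation is an assumption)."""
    return 100.0 * np.abs(relative_deviation(volumes)).mean()

# Hypothetical total brainstem volumes in mm^3
volumes = [24000.0, 26000.0, 28000.0]
print(relative_deviation(volumes))
print(mean_abs_variability_percent(volumes))
```

By construction the signed deviations sum to zero, so a summary statistic has to be based on their magnitude or spread.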

Age and Sex Effects
Differences in brainstem/brainstem substructure volumes between men and women were assessed using linear regression analysis with (a) age as covariate, as well as in a sensitivity analysis with (b) field strength, (c) acquisition protocol, and (d) TICV as additional covariates, respectively. Correction for multiple testing (4 analyses) was performed using the Bonferroni correction, adjusting the level of significance to p < 0.05/4. The associations between age and brainstem/brainstem substructure volumes were assessed using linear regression analysis covarying for sex. Differences of brainstem volumes in younger vs. older persons (below vs. above the group mean) were assessed using linear regression analysis with (a) field strength and (b) acquisition protocol as additional covariates, respectively.
The associations of these parameters with brainstem volume were first assessed using Pearson correlation coefficients. To correct for multiple tests, Bonferroni correction was performed with a correction factor of n = 12 (11 anatomical variables and age) (p < 0.05/12).
Only those anatomical metrics showing a significant association with all brainstem and brainstem substructure volumes with a Pearson correlation coefficient of |r| > 0.30 (Cohen, 1988), respectively, were considered as potential normalization covariates in further analyses.
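The screening step can be illustrated with a short sketch that keeps only covariates whose Pearson correlation with the volume exceeds |r| = 0.30. The data and covariate names are hypothetical, and the significance test against the Bonferroni-corrected threshold (p < 0.05/12) is deliberately omitted here; only the effect-size filter is shown.

```python
import numpy as np

def screen_covariates(volume, covariates, r_min=0.30):
    """Return {name: r} for covariates with |Pearson r| > r_min against volume."""
    selected = {}
    for name, values in covariates.items():
        r = np.corrcoef(volume, values)[0, 1]
        if abs(r) > r_min:
            selected[name] = r
    return selected

# Hypothetical toy data: one covariate tracks the volume, the other does not
volume = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
covariates = {
    "ticv_like": np.array([2.0, 4.0, 5.0, 8.0, 10.0]),
    "unrelated": np.array([3.0, 1.0, 3.0, 1.0, 3.0]),
}
selected = screen_covariates(volume, covariates)
print(sorted(selected))
```

In the study, a metric additionally had to pass this filter for the total brainstem and for every substructure volume, which would correspond to intersecting the results of several such calls.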
We then performed a backward selection procedure starting with a model with total brainstem volume as outcome parameter, and all anatomical variables with a Pearson correlation coefficient of |r| > 0.30 in univariate analysis as well as age and sex as predictor variables.
This procedure was performed (a) with TICV and (b) with v-scale separately, as these parameters are co-linear.
The adjusted r² of the models resulting from the backward selection was reported, as well as that of a further simplified model (considering simple application, with a preference for fewer and easy-to-measure covariates).
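To make the comparison of a full and a simplified model concrete, here is a minimal sketch that computes the adjusted r² of an ordinary least-squares fit with NumPy. The data are synthetic and the variable names are hypothetical; the study itself used JMP Pro and SPSS, and the selection criterion of its backward procedure is not specified in this section.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted r^2 of an OLS fit of y on the columns of X (intercept included)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic cohort: volume depends on a head-size measure but not on age
rng = np.random.default_rng(0)
n = 110
ticv = rng.normal(1500.0, 120.0, n)
age = rng.uniform(18.0, 72.0, n)
volume = 8000.0 + 12.0 * ticv + rng.normal(0.0, 1800.0, n)

print(adjusted_r2(volume, np.column_stack([ticv, age])))  # full model
print(adjusted_r2(volume, ticv))                          # simplified model
```

Because the adjusted r² penalizes each additional predictor, dropping a covariate that explains little variance changes it only marginally, which is the rationale for preferring the simpler model.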
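The normalization of brainstem volumes and its substructure volumes was then performed by using the following equation (Sanfilipo et al., 2004; Papinutto et al., 2019):
$$V_{\mathrm{pred}} = V_{\mathrm{meas}} + a\,(\bar{X} - X) + b\,(\bar{Y} - Y) + c\,(\bar{Z} - Z)$$
with a, b, c being the estimates (regression coefficients) obtained by the linear regression analysis, X, Y, Z their measured values in the individual subject, and $\bar{X}$, $\bar{Y}$, $\bar{Z}$ the corresponding group means.
[Figure caption fragment: ... (4), dens-opisthion (5), and brainstem angle (6).]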
To assess the performance of the different normalization models, the inter-individual variability of the normalized brainstem volumes of each model was first compared to the variability of the non-normalized brainstem volumes using paired t-tests, with Bonferroni correction for multiple tests (3 models, p < 0.05/3).
In a second step, we compared the performance between the normalization models by comparing the inter-individual variability of the normalized brainstem volumes by a one-way ANOVA (analysis of variance).
The performance of the different normalization models was also expressed as the reduction of the % relative standard deviation (%RSD) of the predicted (normalized) volumes relative to the %RSD of the non-normalized, measured volumes of the whole group (n = 110). The relative standard deviation (RSD) is the standard deviation divided by the mean volume.
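The two steps, regression-based normalization and %RSD reduction, can be sketched together as below. The sketch assumes a residual-type normalization in which each volume is shifted by the regression slope times the difference between the group-mean covariate and the subject's covariate, in the spirit of the cited approaches; the data are synthetic and all function names are hypothetical.

```python
import numpy as np

def fit_slopes(volume, covariates):
    """OLS slopes of volume on the covariates (intercept fitted but not returned)."""
    X = np.asarray(covariates, dtype=float).reshape(len(volume), -1)
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, np.asarray(volume, dtype=float), rcond=None)
    return beta[1:]

def normalize(volume, covariates, slopes):
    """Assumed form: V_pred = V_meas + sum_k slope_k * (mean(X_k) - X_k)."""
    X = np.asarray(covariates, dtype=float).reshape(len(volume), -1)
    return np.asarray(volume, dtype=float) + (X.mean(axis=0) - X) @ slopes

def percent_rsd(values):
    """Relative standard deviation in percent: 100 * SD / mean."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

def rsd_reduction(raw, normalized):
    """Percent reduction of %RSD achieved by normalization."""
    return 100.0 * (1.0 - percent_rsd(normalized) / percent_rsd(raw))

# Synthetic cohort of 110 subjects with a head-size effect on volume
rng = np.random.default_rng(42)
ticv = rng.normal(1500.0, 120.0, 110)
volume = 8000.0 + 12.0 * ticv + rng.normal(0.0, 1800.0, 110)

normalized = normalize(volume, ticv, fit_slopes(volume, ticv))
print(percent_rsd(volume), percent_rsd(normalized), rsd_reduction(volume, normalized))
```

Under this form the group mean is unchanged by normalization, so any reduction in %RSD reflects a smaller standard deviation only.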

Brainstem Segmentation
The automated brainstem segmentation approach yielded anatomically accurate results in all subjects in < 200 s/scan on an NVIDIA GeForce GTX 1080 GPU. On visual inspection, all brainstem segmentations were considered anatomically correct in their location and borders, and no manual correction was needed.

Assessing the Influence of Different Scanners and Protocols on Brainstem Volume Variability
The results of brainstem segmentation comparisons from different scanners and protocols are shown in Table 1.
MD-GRU derived brainstem segmentations from scans of the same individual obtained on different 3T scanners (Prisma vs. Skyra) using the same acquisition protocol showed Dice scores between 0.95 and 0.98.
Similarly, Dice coefficients comparing segmentations from scans of the same individual using different imaging protocols as specified above on the same 1.5T Avanto scanner were between 0.94 and 0.97.

Age and Sex Influence
Men had significantly larger unadjusted volumes of the total brainstem, mesencephalon, pons, and medulla oblongata than women (all p < 0.0001), with total brainstem volumes (mean/SD) of 28274.0/2670.1 mm³ for men vs. 24826.2/2824.1 mm³ for women. However, after adjustment for age and TICV (to account for head size differences), only medulla oblongata volumes remained significantly larger in men than in women (Appendix A in Supplementary Material); all other comparisons were non-significant after Bonferroni correction. Adjustment for field strength or protocol did not alter these observations. With adjustment for sex, there was no significant association between age and total brainstem volumes (p = 0.4131). In line with this observation, total brainstem volumes did not differ significantly between older subjects (aged above the group mean of 35 years; n = 44) and younger subjects (<35 years; n = 66) (p = 0.3068) with adjustment for sex. Results were comparable for mesencephalon, pons, and medulla oblongata volumes (Appendix B in Supplementary Material). This finding was independent of additional adjustment for field strength or acquisition protocol (Appendix C in Supplementary Material).
Table 2 reports the strength of the correlations of total brainstem volume with each of the investigated variables. Amongst these metrics, nasion-opisthion, dens length, TICV, v-scale, WM, GM, and BV (all normally distributed) showed a significant correlation with brainstem and all substructure volumes surviving the Bonferroni correction for multiple tests with a Pearson correlation coefficient |r| > 0.30 (Table 2 and Appendix D in Supplementary Material) and are therefore potential univariate predictors. As BV, GM, and WM volumes can be altered by neurodegenerative processes, these variables were not considered in further analyses.

Assessing Potential Normalization Models for Anatomical Variability Reduction
Pearson correlation coefficients of all variables and brainstem substructure volumes are shown in Appendix D in Supplementary Material.

Comparison of Different Normalization Models
The backward selection procedure resulted in two models based on TICV and age (Model 1a) and v-scale (Model 2). The model based on TICV and age was further simplified to TICV alone (Model 1b) (Table 3).
Results of the linear regression analysis with brainstem substructure volumes as outcome are shown in Appendix E in Supplementary Material.
The model with TICV and age consistently yielded the highest adjusted r². However, eliminating the variable age from Model 1a did not substantially reduce the variance explained. Brainstem volume normalization by TICV (Model 1b) or v-scale (Model 2) yielded comparably high r².

DISCUSSION
Using a novel, accurate, fully automated, and rapid brainstem segmentation method (Sander et al., 2019) we explored sources of extrinsic (field strength, protocol) as well as intrinsic anatomical variability, investigated age and sex influences on brainstem volumes on high-resolution MPRAGE images in HS and developed potential normalization strategies for variability reduction in brainstem volumetry.
The extrinsic variability of our brainstem volumetry assessment method with respect to different acquisition protocols, hardware, and magnetic field strength was low; the comparisons of brainstem segmentations obtained in the same individuals assessed by different scanners as well as different protocols and both different scanners and protocols yielded very high Dice scores (≥0.94). These results confirm the robustness of the applied brainstem segmentation algorithm with respect to different image acquisition settings, i.e., different scanners with 1.5T and 3T field strength and different acquisition protocols.
Consistent with previous studies, our results showed no relevant age-dependent volume reduction of the brainstem and its substructures in this cohort aged between 18 and 72 years. With a mean age of 34.9 years and a median age of 30.9 years, this cohort might, however, be more representative of younger and middle-aged adults. Based on these results, we cannot fully exclude a decline in brainstem volume in healthy persons of advanced age.
The lack of an age-dependent volume reduction observed in this cohort is consistent with previous studies: Several cross-sectional studies based on manual brainstem segmentation reported no association of ventral pons volumes with age (Raz et al., 2001; Sullivan et al., 2004). Likewise, no age effects were found in total brainstem and medulla oblongata volumes (Luft et al., 1999; Lee et al., 2009). Lambert et al. (2013) found isolated midbrain atrophy in HS older than 60 years, predominantly due to a volume loss of the superior cerebellar fiber bundles, which are not taken into account in our mesencephalon volumetry definition.
In our study, men showed significantly larger unadjusted volumes of the brainstem and its substructures compared to women, which is in line with findings by Raz et al. (2001) and Sullivan et al. (2004). Lee et al. (2009) also reported larger medulla oblongata volumes in men. However, after adjustment for TICV (to account for head size differences) and age, the differences observed between men and women remained only significant for medulla oblongata volumes.
Anatomical variations between HS are an important source of brainstem volume variability, with this cohort showing an inter-individual variability of about 10% for brainstem volumes. Therefore, normalization of brainstem volumes is crucial to reduce measurement variation and to facilitate the applicability of brainstem volumetrics as a surrogate marker for prognosis, disease course monitoring, and therapeutic monitoring in neurodegenerative diseases such as amyotrophic lateral sclerosis, Alzheimer's disease, and Parkinson's disease.
Intracerebral metrics like GM, WM, and BV are expected to be altered by neurodegenerative pathologies, and their potential use as covariates of brainstem volumes might therefore only be adequate in studies involving HS. Hence these parameters were not considered as adequate brainstem normalization parameters.
Models based on FreeSurfer-derived TICV and SIENAX-derived v-scale, two commonly used normalization parameters, as well as on TICV and age yielded the highest adjusted r² in linear regression analyses with brainstem and brainstem substructure volumes as outcomes and were therefore further tested as normalization variables.
Normalization for anatomic variation of head size by TICV and age reduced the %RSD of total brainstem volumes by 36% and of mesencephalon volumes by up to 46%. Normalization with TICV or v-scale alone showed comparable results.
Brainstem volume normalization based on each of the three normalization models significantly reduced the inter-individual variability compared to the non-normalized volumes. Comparison between the three normalization models showed no significant differences in inter-individual variability of brainstem and brainstem substructure volumes, indicating an equal efficiency of normalization by these models.
TICV and v-scale are frequently applied normalization parameters for brain volumes because they are generally not affected by neurological/neurodegenerative diseases. By normalization with TICV, inter-individual variation of brain volumes was previously reduced by about 4% (Whitwell et al., 2001). Using a similar methodological approach, normalization with v-scale reduced variation in spinal cord volumetry by up to 10.24% (Papinutto et al., 2019).
By reducing measurement variability, we expect the proposed normalization methods to improve the sensitivity in detecting subtle brainstem volume differences between patients with diseases affecting the brainstem and/or its substructures and healthy controls, or between patient subgroups. For example, previous work showed improved detection of spinal cord volume differences between multiple sclerosis patients and controls after cervical volume normalization (Oh et al., 2014). Brainstem volume normalization, by reducing anatomical variability, might help to reveal and strengthen clinical-radiological correlations in neurodegenerative diseases such as multiple sclerosis or Alzheimer's disease (Zhou et al., 2014).
The absence of brainstem volume reductions with increasing age observed in our cross-sectional study is in line with findings in other cross-sectional studies of Raz et al. (2001), Sullivan et al. (2004), and Walhovd et al. (2011). Walhovd et al. reported age-related volume differences in all examined brain structures except the brainstem based on FreeSurfer assessments in a large cohort of HS. To disentangle the exact mechanisms underlying the relative volume preservation of the brainstem with increasing age is beyond the scope of this descriptive study. As a phylogenetically relatively old structure the brainstem is crucial for survival. The reasons for its relative resilience to atrophy compared to other phylogenetically old structures like the hippocampus (Jack et al., 1998;Schröder and Pantel, 2016), amygdala (Kurth et al., 2019) and entorhinal cortex (Hasan et al., 2016) remain unknown. Potential limitations of this study include the underrepresentation of very advanced age and the cross-sectional design that does not allow intra-individual comparisons. Longitudinal studies covering a sufficiently long time-span are difficult to perform, but are certainly necessary to confirm our cross-sectional results in this regard.
The vital function of the brainstem, its clinical involvement in neurodegenerative and neuroinflammatory diseases, and the absence of volume reductions observed in HS aged from 18 to 72 years in this study render atrophy assessments of the brainstem and its substructures an interesting imaging surrogate candidate for the study of neurodegeneration, e.g., in progressive multiple sclerosis. This study analyzed different sources of both extrinsic and intrinsic variability of brainstem volumetry assessments and evaluated normalization models for variability reduction in healthy controls. The inter-individual anatomical variability of total brainstem volumes is relatively high but can be efficiently reduced by 36% using a normalization based on both TICV and age, and by about 34% based on TICV or v-scale alone.
This study's automated segmentation approach proved to be robust across different scanners, field strengths and imaging protocols and allows very fast, efficient, anatomically accurate, and reliable automated brainstem segmentation.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: Upon reasonable request, we will render the detailed results derived from the reported analyses available.
Requests to access these datasets should be directed to regina.schlaeger@usb.ch.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethikkommission Nordwest- und Zentralschweiz. The participants provided their written informed consent to participate in these studies.

AUTHOR CONTRIBUTIONS
LS: conceptualization, methodology, analysis, and writing. AH, SP, and SA: methodology and analysis. MA: data collection. TS: analysis. MW and EK: proof reading and analysis. ÖY: data collection and proof reading. LK and CG: proof reading and methodology. JW and PC: supervision, proof reading, and acquiring funding. RS: conceptualization, analysis, writing, supervision, and acquiring funding. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the Swiss National Science Foundation (MHV program, PMPDP3 171391); and the Swiss MS Society.