Intravenous Delayed Gadolinium-Enhanced MR Imaging of the Endolymphatic Space: A Methodological Comparative Study

In-vivo non-invasive verification of endolymphatic hydrops (ELH) by means of intravenous delayed gadolinium (Gd) enhanced magnetic resonance imaging of the inner ear (iMRI) is rapidly developing into a standard clinical tool to investigate peripheral vestibulo-cochlear syndromes. In this context, methodological comparative studies providing standardization and comparability between labs seem even more important, but so far very few are available. One hundred eight participants [75 patients with Meniere's disease (MD; 55.2 ± 14.9 years) and 33 vestibular healthy controls (HC; 46.4 ± 15.6 years)] were examined. The aim was to understand (i) how variations in acquisition protocols influence endolymphatic space (ELS) MR-signals; (ii) how ELS quantification methods correlate to each other or clinical data; and finally, (iii) how ELS extent influences MR-signals. Diagnostics included neuro-otological assessment, video-oculography during caloric stimulation, head-impulse test, audiometry, and iMRI. Data analysis provided semi-quantitative (SQ) visual grading and automatic algorithmic quantitative segmentation of ELS area [2D, mm²] and volume [3D, mm³] using deep learning-based segmentation and volumetric local thresholding. Within the range of 0.1–0.2 mmol/kg Gd dosage and a 4 h ± 30 min time delay, SQ grading and 2D- or 3D-quantifications were independent of signal intensity (SI) and signal-to-noise ratio (SNR; FWE corrected, p < 0.05). The ELS quantification methods used were highly reproducible across raters or thresholds and correlated strongly (correlation coefficients 0.3–0.8). However, 3D-quantifications showed the least variability. Asymmetry indices and normalized ELH proved the most useful for predicting quantitative clinical data. ELH size influenced SI (cochlear basal turn p < 0.001), but not SNR. SI could not predict the presence of ELH. In conclusion, (1) Gd dosage of 0.1–0.2 mmol/kg after 4 h ± 30 min time delay suffices for ELS quantification. (2) A consensus is needed on a clinical SQ grading classification including a standardized level of evaluation reconstructed to anatomical fixpoints. (3) 3D-quantification methods of the ELS are best suited for correlations with clinical variables and should include both ears and ELS values reported relative or normalized to size. (4) The presence of ELH weakly increases signal intensity in the basal cochlear turn, but signal intensity cannot predict the presence of ELH.

INTRODUCTION
In-vivo non-invasive verification of endolymphatic hydrops (ELH) by means of delayed gadolinium (Gd) enhanced magnetic resonance imaging of the inner ear (iMRI) is rapidly developing into a standard clinical tool to investigate episodic vertigo (1–3). This is because iMRI allowed pre-mortem detection of ELH for the first time (4, 5) and demonstrated that ELH is not pathognomonic for Menière's disease (MD) (6–8), but rather a concomitant finding in various etiologies of episodic vertigo (9–13). Consequently, the clinical prevalence and pathophysiological significance of ELH have yet to be conclusively clarified. Understanding the underpinnings of the ELH syndrome requires a systematic investigation of pathologies involving endolymphatic space (ELS) changes as well as its base physiological condition.
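The present study therefore aimed to understand:
(i) How variations in acquisition protocols influence MR-signals of the ELS.
(ii) How ELH measures correlate with each other, as well as with clinical symptoms or neurophysiological testing.
(iii) How ELH influences SNR and SI within the ELS.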

Setting and Institutional Review Board Approval
All data were acquired at the Interdisciplinary German Center for Vertigo and Balance Disorders (DSGZ) and the Department of Neurology of Munich University Hospital (LMU) between 2016 and 2019. Institutional Review Board approval was obtained before the initiation of the study (no. 641-15). All participants provided informed oral and written consent in accordance with the Declaration of Helsinki before inclusion in the study.

Study Population
One hundred eight consecutive participants [75 patients with Meniere's disease (MD) and 33 vestibular healthy controls (HC)] underwent delayed intravenous gadolinium-enhanced magnetic resonance imaging (iMRI) for exclusion or verification of ELH. The diagnosis of MD was based on the criteria of the Classification Committee of the Bárány Society 2015 (33). HC were inpatients of the Department of Neurology without symptoms or underlying pathologies of the peripheral and central vestibular and auditory system who underwent MRI with a contrast agent as part of their diagnostic workup and agreed to undergo iMRI sequences after 4 h. HC underwent audiovestibular testing to confirm the soundness of their peripheral end organs. The reasons for their admission to the clinic included movement disorders (n = 6), epilepsy (n = 5), optic neuritis (n = 4), trigeminal neuralgia (n = 4), headache (n = 4), idiopathic facial nerve palsy (n = 3), viral meningitis (n = 3), subdural hematoma (n = 2), spinal inflammatory lesion (n = 1), and decompensated esophoria (n = 1). The laterality quotient for right-handedness was assessed with the 10-item inventory of the Edinburgh test (34, 35). The inclusion criterion was age between 18 and 85 years. The exclusion criteria were other neurological or psychiatric disorders, as well as any MR-related contraindications (36), poor image quality, or missing MR sequences.

TABLE 1 | Semi-quantitative grading of the endolymphatic space (table only partially recovered; the assignment of entries to columns is lost).

[...] (cp. Figure 1A). "X-mas tree" with "X-mas lights", where the ELS is slightly enlarged and indirectly visible as a nodular black cut-out (cp. Figure 1B). "X-mas tree" (= enhancing scala vestibuli and scala tympani) with "X-mas balls" (= nodular enlargement of the non-enhancing scala media).

Grade 2
Displacement of RM, cochlear duct > scala vestibuli. Scala media is scalloping into the scala tympani, PLS has a semicircular appearance. "X-mas tree" with "X-mas balls", where the ELS is bulging into the scala tympani whilst giving the PLS a semicircular appearance (cp. Figure 1C).

Grade 3
A severely distended scala media causes a flattened appearance of the perilymph space. No scala vestibuli visible. "X-mas tree" with "X-mas garlands", where the ELS is distended and causes a flattened appearance of the PLS (cp. Figure 1D). "X-mas tree" (= enhancing scala vestibuli and scala tympani) with "X-mas garlands" (= linearly enlarged non-enhancing scala media) (cp. Figure 1D). No PLS visible.

AR, area ratio; ELS, endolymphatic space; L-SCC, lateral semicircular canal; PLS, perilymphatic space; SURI, ratio ≥ 1 between the area of the sacculus and the area of the utriculus; RM, Reissner's membrane. The bold text highlights the main or most important characteristics.

Nomenclature
In the following, "ipsilateral" refers to the clinically leading side (or affected side) and "contralateral" to the opposite side (or non-affected side). In the case of patients presenting without a leading clinical side, a pseudorandom number generator ["Mersenne Twister" algorithm (37), uniform distribution] was used to generate a random number between 1 (= minimum value) and 9 (= maximum value). Even numbers meant "left side = ipsilateral side" and uneven numbers indicated "right side = ipsilateral side." "Vegetative symptoms" refers to nausea and/or vomiting due to the episodic vertigo attack. "Ear symptoms" includes attack-associated tinnitus, hearing loss, ear pressure, and/or ear pain, both uni- and bilaterally, that fit the criteria for MD. "Other ear symptoms" refers to non-MD ear symptoms.
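The side assignment for patients without a leading clinical side can be sketched as follows. This is a minimal illustration, not the authors' script: it relies on Python's standard `random` module (which is based on the Mersenne Twister), and the function name and seed are hypothetical.

```python
import random


def assign_ipsilateral_side(rng: random.Random) -> str:
    """Draw a uniform integer from 1 to 9 (inclusive) and map its parity to a side."""
    number = rng.randint(1, 9)  # both end points are included
    # Even number -> left side is ipsilateral, uneven number -> right side.
    return "left" if number % 2 == 0 else "right"


rng = random.Random(2016)  # hypothetical seed, used only for reproducibility
sides = [assign_ipsilateral_side(rng) for _ in range(9000)]
share_right = sides.count("right") / len(sides)
print(f"share of right-sided assignments: {share_right:.3f}")
```

Because five of the nine integers from 1 to 9 are uneven, this rule assigns the right side with a probability of 5/9 rather than exactly one half.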

Measurement of the Auditory, Semicircular Canal, and Otolith Functions
Diagnostic workup included a thorough neurological workup (e.g., history-taking, clinical examination), neuro-orthoptic assessment [e.g., Frenzel glasses, fundus photography, and adjustments of the subjective visual vertical (SVV)], video-oculography (VOG) during caloric stimulation and head impulse test (HIT), as well as ocular (o) and cervical (c) vestibular evoked myogenic potentials (VEMPs) and pure tone audiometry (PTA). A tilt of the SVV is a sensitive sign of a graviceptive vestibular tone imbalance. SVV was assessed with the subject sitting in an upright position in front of a half-spherical dome with the head fixed on a chin rest (38). A mean deviation of >2.5° from the true vertical was considered a pathological tilt of SVV.
The impairment of the vestibulo-ocular reflex (VOR) in higher frequencies was measured by HIT (39) using high-frame-rate VOG with EyeSeeCam [(40), EyeSeeTech, Munich, Germany]. A median gain during head impulses <0.6 (eye velocity in °/s divided by head velocity in °/s) was considered a pathological VOR (41). Furthermore, canal responsiveness in lower frequencies was assessed by caloric testing with VOG, which was performed for both ears with 30 °C cold and 44 °C warm water. Vestibular paresis was defined as >25% asymmetry between the right- and left-sided responses (42). The caloric asymmetry index (AI_C) was calculated based on the slow-phase velocity of the caloric nystagmus.
Vestibular evoked myogenic potentials (VEMPs) are short-latency, mainly otolith-driven vestibular reflexes elicited by air-conducted sound (ACS) or bone-conducted vibration (BCV) and recorded from the inferior oblique eye muscle (ocular or oVEMPs) or the sternocleidomastoid muscle (cervical or cVEMPs). VEMPs were recorded with the Eclipse platform (Interacoustics, Middelfart, Denmark), as described previously (43, 44). Only those VEMP responses that were clearly discernible from background noise were included in the analysis. To avoid bias due to examiners, only the asymmetry index (AI_o/cV) of VEMP amplitudes and latencies was analyzed in detail (45).
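A caloric asymmetry index is commonly computed with Jongkees' formula. The sketch below assumes that form; the exact expression used in this study is not given in the text, so the function, its argument names, and the example velocities are illustrative only.

```python
def caloric_asymmetry_index(r_warm: float, r_cold: float,
                            l_warm: float, l_cold: float) -> float:
    """Jongkees-type asymmetry index in percent.

    Arguments are magnitudes of the peak slow-phase velocity (deg/s) for
    right/left ear irrigation with warm (44 °C) and cold (30 °C) water.
    """
    right = r_warm + r_cold
    left = l_warm + l_cold
    total = right + left
    if total == 0:
        raise ValueError("no caloric response on either side")
    return (right - left) / total * 100.0


# Hypothetical example values, not study data.
ai_c = caloric_asymmetry_index(r_warm=20.0, r_cold=18.0, l_warm=10.0, l_cold=9.0)
vestibular_paresis = abs(ai_c) > 25.0  # >25% asymmetry criterion from the text
print(round(ai_c, 1), vestibular_paresis)
```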

Data Acquisition
Four hours after intravenous injection of a standard dose (0.1–0.2 mmol/kg body weight) of Gadobutrol (Gadovist®, Bayer, Leverkusen, Germany), MR imaging (MRI) data were acquired in a whole-body 3 Tesla MRI scanner (Magnetom Skyra, Siemens Healthcare, Erlangen, Germany) with a 20-channel head coil. We used a 3D-FLAIR sequence to differentiate endolymph from perilymph and bone, and a CISS sequence to delineate the total inner ear fluid space from the surrounding bone. The T2-weighted, three-dimensional, fluid-attenuated inversion recovery sequence (3D-FLAIR) had the following parameters: TR 6,000 ms, TE 134 ms, TI 2,240 ms, FA 180°, FOV 160 × 160 mm², 36 slices, base resolution 320, averages 1, acceleration factor of 2 using a parallel imaging technique with a generalized auto-calibrating partially parallel acquisition (GRAPPA) algorithm, slice thickness 0.5 mm, acquisition time 15:08 min. The high-resolution, strongly T2-weighted, 3D constructive interference steady state (CISS) sequence of the temporal bones was performed to evaluate the anatomy of the whole fluid-filled labyrinthine spaces and had the following parameters: TR 1,000 ms, TE 133 ms, FA 100°, FOV 192 × 192 mm², 56 slices, base resolution 384, averages 4, acceleration factor of 2 using the GRAPPA algorithm, slice thickness 0.5 mm, acquisition time 8:36 min. The presence of ELH was observed on the 3D-FLAIR images as enlarged negative-signal spaces inside the labyrinth, according to a previously reported method (18, 46).

Signal Quality Assessment
Signal quality was validated using signal-to-noise ratio (SNR) and signal-homogeneity (SH) in different regions of interest (ROIs). ROIs were labeled in the left and right inner ear within the "endolymph" and "perilymph" fluid, "cochlear basal turn," as well as in the surrounding tissue or subject matter, such as the "petrous bone," "cerebellum," "medulla," and "air." In detail, the endolymph ROI consisted of 0.6 mm² circular 2D selections of the left/right utricle. The perilymph ROIs consisted of multiple 0.6 mm² circular 2D selections in the perilymphatic space (PLS) on both sides and were spread within the inner ear to obtain a signal intensity map. These selections were placed in the vestibulum, twice inside the basal cochlear turn, the apex cochleae, the horizontal semicircular canal (hSCC), and the posterior SCC (pSCC). ROIs in the surrounding tissue or subject matter ("petrous bone," "cerebellum," "medulla oblongata," and "air") consisted of 60.8 mm² circular selections. Signal intensity extraction (mean, minimum, and maximum) was performed on axial slices of the FLAIR raw images via the "Analyze Regions" plugin of the "MorphoLibJ toolbox" (47) within ImageJ (48).
SNR was calculated in each ROI as $\mathrm{SNR}(\mathrm{ROI}) = \frac{S(\mathrm{ROI})}{\mathrm{std}(\mathrm{air})}$, i.e., the ratio of the mean signal intensity in an ROI, $S(\mathrm{ROI})$, to the standard deviation (STD) of the region labeled "air," $\mathrm{std}(\mathrm{air})$. The label "air" was defined as "MRI signal measure of background variations in the signal devoid of fluid." In other words, a region's SNR was calculated as a mean signal relative to the extent of the background variation.
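The SNR definition above can be written as a short function. This is a sketch under stated assumptions: the voxel intensities are invented for illustration, and the sample standard deviation (n − 1 in the denominator) is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def snr(roi_values, air_values):
    """SNR(ROI) = mean signal in the ROI / standard deviation of the "air" region."""
    return mean(roi_values) / stdev(air_values)  # stdev: sample STD (assumption)


# Hypothetical voxel intensities, not study data.
roi_intensities = [120.0, 130.0, 125.0, 125.0]
air_intensities = [4.0, 6.0, 5.0, 7.0, 3.0]
roi_snr = snr(roi_intensities, air_intensities)
print(round(roi_snr, 2))
```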
The signal's statistical homogeneity was examined between ROIs for each group, and between groups for each ROI. SH was defined as the identical distribution of two samples except for shifts and scaling of the overall distribution. The median of each sample was removed and the interquartile range was scaled to the value of one. The two samples were then compared using the minimum statistical energy [minEn; (49)] and the maximum mean discrepancy [MMD; (50)], using 10,000 permutations with a threshold of maximally one failed test to reach statistical significance. Consequently, two samples were deemed to have different distributions if they diverged in shape, either due to kurtosis, skewness, or the extent and number of outliers. Note that no correction for multiple testing was applied in these tests in order to be more sensitive toward violations of SH, i.e., significant differences.
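The homogeneity check described above (median removal, scaling of the interquartile range to one, then a permutation test) might be sketched as follows. Several choices here are assumptions rather than the authors' implementation: the energy distance with an absolute-difference kernel stands in for the minEn statistic of (49), the MMD test is omitted, quartiles use the "inclusive" method, and all function names are hypothetical.

```python
import random
from statistics import median, quantiles


def standardize(sample):
    """Remove the median and scale the interquartile range to one."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    med = median(sample)
    return [(x - med) / (q3 - q1) for x in sample]  # assumes a non-zero IQR


def mean_abs_diff(a, b):
    """Mean absolute difference over all pairs (x in a, y in b)."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def energy_statistic(a, b):
    """Energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| (stand-in for minEn)."""
    return 2 * mean_abs_diff(a, b) - mean_abs_diff(a, a) - mean_abs_diff(b, b)


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Share of label permutations whose statistic reaches the observed one."""
    rng = random.Random(seed)
    a, b = standardize(a), standardize(b)
    observed = energy_statistic(a, b)
    pooled = a + b
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: the second sample is a shifted and scaled copy of the first,
# so both have the same shape after standardization.
sample_a = [1, 2, 4, 7, 11, 16, 22, 29]
sample_b = [2 * x + 10 for x in sample_a]
p_same_shape = permutation_p_value(sample_a, sample_b, n_perm=300)
print(p_same_shape)
```

A large p-value means the two standardized samples cannot be told apart by shape, which corresponds to statistical homogeneity in the sense used here.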

Semi-quantitative Grading of the Endolymphatic Space
Semi-quantitative (SQ) grading of the endolymphatic space (ELS) was performed independently by three experienced head and neck radiologists or neurologists (BE-W, VK, and JG) who were blinded to the clinical patient data. Rater statistical homogeneity was calculated in the same way as the signal's statistical homogeneity. The characterization of the ELS in the vestibulum and cochlea was based on previously described criteria (22); it is summarized in Table 1 and shown in further detail in Figures 1A,B, grade 0–3.
The characterization describes a 4-point grading for the cochlear and vestibular ELH. The cochlear grading is done on the mid-modiolar level (19) and the vestibular grading on the inferior part of the vestibulum, where the lateral semicircular canal (L-SCC) is still visible (19). The cochlear grading can be thought of as a fusion of previously described grading suggestions (21–24). Grade 0 (no cochlear ELH) can be reduced to an "X-mas tree" built from circles that are divided by "very thin, clear, hypointense lines" [cp. also Figure 1 of (23), Figure 1A in (21)] that represent the non-enhanced ELS (scala media) between the enhanced PLS (scala vestibuli and tympani). Grade 1 (mild cochlear ELH) can be reduced to "X-mas tree with lights," where the ELS is slightly enlarged and indirectly visible as a nodular black cut-out of the scala vestibuli [cp. further Figure 2 in (24), Figure 1B in (21)]. Grade 2 (marked cochlear ELH) can be reduced to "X-mas tree with X-mas balls," where the ELS is bulging into the scala tympani whilst giving the PLS a semicircular appearance [cp. Figure 3 in (24)]. Grade 3 (severe cochlear ELH) can be reduced to "X-mas tree with garlands," where the severely distended ELS causes a flattened appearance of the PLS [cp. also Figure 4 in (24), Figure 5A in (21), Figure 1C in (20)].
The vestibular grading is a fusion of previously described grading suggestions (21–23). Grade 0 (no vestibular ELH) can be reduced to "sacculus < utriculus," where the otolith organs are distinguishable and the sacculus is smaller than the utriculus [cp. also Figure 2A in (21), Figure 6 in (23)]. Grade 1 (mild vestibular ELH) can be reduced to "sacculus ≥ utriculus," where the sacculus is as large as or larger than the utriculus [cp. also Figure 2B in (21), Figure 9 in (23)]. Grade 2 (marked vestibular ELH) can be reduced to "sacculus and utriculus are confluent," where the otolith organs are no longer distinguishable but a surrounding PLS rim remains [cp. also Figure 2C in (21), Figure 7 in (23)]. Grade 3 (severe vestibular ELH) can be reduced to "otolith organs not distinguishable" with no PLS visible [cp. also Figure 2D in (21), Figure 8 in (23)].

FIGURE 1 | Semi-quantitative (SQ) grading used for the endolymphatic hydrops. The vertical columns show the different SQ grades from 0 to 3 [cf. Table 1; according to a classification first described in Kirsch et al. (9)]. The horizontal rows each give an overview of how each grade looks (A), in FLAIR raw data (B), after VOLT processing (C), as used for 2D quantification (D), and as used for 3D quantification (E). A detailed description of each grade is given in the second paragraph of subsection 'Semi-quantitative Grading of the Endolymphatic Space'.

2D-and 3D-Quantification of the Endolymphatic Space
Segmentation of the total fluid space (TFS) was based on a recently proposed (Ahmadi et al., under review) and pretrained volumetric deep convolutional neural network (CNN) with V-net architecture (51) that was deployed via the TOMAAT module (52) in the 3D-Slicer toolbox [version 4.11 (53)]. ELS and PLS were differentiated within the TFS by Volumetric Local Thresholding [VOLT; (31)] in ImageJ Fiji (48) with the "Fuzzy and artificial neural networks image processing toolbox" (54) and the "MorphoLibJ Toolbox" (47).
The resulting 3D volume can be regarded as a probabilistic map of the inner ear, which includes the classification into its two different compartments (ELS and PLS). The final classification strongly depends on the chosen cutoff. Based on empirical observations (31), 2D-and 3D-quantifications were examined at three cutoff variations (c6, c8, and c10). Each cutoff matches a percentage of positive classifications. For example, cutoff 6 (c6) corresponds to 79.2%, cutoff 8 (c8) to 70.8%, and cutoff 10 (c10) to 62.5% classifications into endolymphatic space. Examples of the pipeline outputs can be viewed in Figure 1C.
2D-quantification was done on axial slices of the VOLT volume. The mid-modiolar level was chosen for the cochlea, and the inferior part of the vestibulum where the lateral semicircular canal (L-SCC) is still visible was selected for the vestibulum. However, the majority of volumes allowed for both a cochlear and vestibular measurement on the same slice. Easier visual selection was enabled by a look-up table (LUT, "phase") included in ImageJ that was applied to the VOLT volumes. An example can be seen in Figure 1D. Areas were then measured using the "Analyze Regions" plugin, which is part of the "MorphoLibJ Toolbox" (47).
3D-quantification was done on the VOLT volume that included the entire inner ear. The cochlear volume was cropped using a cylindrical volumetric selection applied to the VOLT volumes. The volume of the vestibulum, including otolith organs and semicircular canals, was obtained by subtracting the cochlear volume from the inner ear VOLT volume. Measurements were performed using the "Analyze Regions (3D)" plugin of the "MorphoLibJ Toolbox" (47). A visualization can be seen in Figure 1E.

Parameters Derived From Endolymphatic Space Measures
The ELS ratio, $\mathrm{ER}\,[\%] = \frac{\mathrm{ELS}}{\mathrm{TFS}} \times 100$, was calculated for 2D- and 3D-quantification of the ELS analogous to the area ratio (AR) in previous classification conventions (19–21). ER indicates the size of the ELS relative to the TFS and as such is independent of the absolute size, which might differ between subjects (for example due to body size).
ELS symmetry between both inner ears was assessed via the side difference of the ELS ratios, $\mathrm{Diff\text{-}ER}\,[\%] = \mathrm{ER}_i - \mathrm{ER}_c$, where $\mathrm{ER}_i$ and $\mathrm{ER}_c$ are the respective ipsilateral and contralateral ELS ratios in percent relative to the TFS. Another parameter was the asymmetry index, $\mathrm{AI}\,[\%] = \frac{\mathrm{ELS}_i - \mathrm{ELS}_c}{\mathrm{ELS}_i + \mathrm{ELS}_c} \times 100$, where $\mathrm{ELS}_i$ is the semi-, 2D- or 3D-quantification of the ipsilateral ELS and $\mathrm{ELS}_c$ that of the contralateral ELS. The asymmetry index can be interpreted as a normalized difference and as such is also independent of the individual TFS.
Areas and volumes were normalized according to their TFS if ${}^{c/v/a}\mathrm{TFS}^{c8}_{e,\,2D/3D} > {}^{c/v/a}\mathrm{TFS}^{c8}_{\mathrm{mean},\,2D/3D} + 2.5 \times \mathrm{std}(\mathrm{TFS})$, where "e" is the individual value and "mean" is the mean of the respective group (HC or MD). For an overview of TFS, see Figure 6.
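The three derived parameters can be illustrated with a few lines of code. The function names and the example volumes are hypothetical and serve only to show how ER, Diff-ER, and AI relate to each other.

```python
def els_ratio(els: float, tfs: float) -> float:
    """ER [%]: size of the ELS relative to the total fluid space (TFS)."""
    return els / tfs * 100.0


def diff_er(er_ipsi: float, er_contra: float) -> float:
    """Diff-ER [%]: ipsilateral minus contralateral ELS ratio."""
    return er_ipsi - er_contra


def asymmetry_index(els_ipsi: float, els_contra: float) -> float:
    """AI [%]: side difference normalized by the sum of both sides."""
    return (els_ipsi - els_contra) / (els_ipsi + els_contra) * 100.0


# Hypothetical 3D volumes in mm^3, not study data.
er_i = els_ratio(els=30.0, tfs=250.0)   # ipsilateral ear
er_c = els_ratio(els=15.0, tfs=250.0)   # contralateral ear
side_difference = diff_er(er_i, er_c)
ai = asymmetry_index(30.0, 15.0)
print(round(er_i, 1), round(er_c, 1), round(side_difference, 1), round(ai, 1))
```

Note that AI needs only the two ELS values, whereas Diff-ER requires each ear's TFS, which is why AI can be computed on unnormalized data.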

Statistics and Validation Parameters
All statistics were implemented with self-written scripts in MATLAB version 9.7 (R2019b) using the "Statistics and Machine Learning" toolbox provided with MATLAB (Natick, Massachusetts: The MathWorks Inc.). ELS quantification measures were validated and compared using parameters describing different characteristics on different levels (i.e., between groups, ELS analysis methods, and diagnostic methods) and between different entities (i.e., inter-rater, inter-threshold, and inter-ROI). Parameters considered the ordering of subjects between samples (concordance), Spearman correlations between samples (rank-correlation), the form of the distribution of samples via "minimum statistical energy" and "maximum mean discrepancy" (statistical homogeneity), and covariance between samples via ANCOVA (analysis of covariance). All statistical tests used multiple comparison correction if multiple tests (e.g., more than two regions or two thresholds) were compared independently with each other. The FWE level was set at p = 0.05/N with N being the number of tests (e.g., regions, thresholds), i.e., Bonferroni correction.
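The Bonferroni rule stated above amounts to dividing the nominal level by the number of tests. A minimal sketch, with hypothetical function names and invented p-values:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level p = alpha / N."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that stays below alpha / N, with N = number of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values for three threshold comparisons (N = 3).
flags = significant_after_bonferroni([0.001, 0.02, 0.04])
print(bonferroni_threshold(0.05, 3), flags)
```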

Influence of Gd Dosage, Gd Time Delay on SNR and via SNR on SQ Grading, 2D- or 3D-Quantification
The influence of Gd dosage and time delay (from Gd injection to MR measurement) on the SNR and signal intensity (SI), as well as the influence of SNR, Gd dosage, and time delay on SQ grading and 2D- or 3D-quantification measures, was evaluated using ANCOVA modeling. The model included the interaction of the group with each individual variable as well as the interaction of the group, dosage, and time delay variables. Additionally, covariates of no interest, such as age and BMI, were included. In other words, we checked whether SNR, Gd dosage, and Gd time delay each had an influence on the ELH measure in question, as well as the interaction of Gd dosage and Gd time delay, allowing for the possibility that the relationship might be different for each group.

Interrelations Between SQ Gradings and 2D-or 3D Quantification Statistical Homogeneity
Statistical homogeneity between SQ grading and 2D- or 3D-quantification methods between groups was, in principle, calculated in the same way as the signal statistical homogeneity (cf. signal quality validation). First, the median of each group was removed and the interquartile range was scaled to the value of one. The two groups were then compared using minEn and MMD test statistics with 10,000 permutations between groups. Any instance of a random permutation with a higher test statistic than the unpermuted groups was considered a failure. The groups were deemed statistically homogeneous if at most one test failed; otherwise the groups were deemed inhomogeneous, as they could be distinguished based solely on their distribution shape (kurtosis and skewness or the extent and number of outliers). Note that no correction for multiple comparisons was performed here in order to be more sensitive to violations of SH, i.e., significant differences.

Rater Repeatability and Reliability
Repeatability and reliability of the three different raters for SQ grading, as well as of the three different thresholds (c6, c8, and c10) for 2D- or 3D-quantification, were measured using rank-based correlations and Kendall's W measure for concordance (55). This assessment shows whether the ordering of subjects between raters is similar and therefore can be assumed to be repeatable over the raters. Furthermore, we compared ratings by subtracting the SQ grading scores between raters to see if the extent of differences in rating values differed. Correction for multiple comparisons, i.e., multiple tests, was done over data types (SQ, 2D, and 3D); therefore p (FWE) = 0.05 was set to p = 0.05/N with N = 3 for the three data types.
Significant rank correlations indicated that the ordering of subjects was very similar or concordant across these measures. Rank-correlation was used so that linear as well as non-linear relationships could be examined and the gradings (ordinal measures) could be related to the quantitative measures. Correction for multiple comparisons p (FWE) = 0.05/N, i.e., Bonferroni correction, was done over all pairs of correlations in each correlation matrix, i.e., for SQ × 2D quantification and SQ × 3D quantification N = 12 × 12 = 144 and for the correlation of asymmetry indices N = 6 × 6 = 36.
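Kendall's W, used above as the concordance measure, can be sketched in a few lines. This version is an illustration only: it uses average ranks for tied scores but applies no tie correction to W itself, so for SQ grades with many ties it will tend to understate concordance; the ratings shown are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m raters (rows) scoring the same n subjects (columns)."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(rater) for rater in ratings]
    rank_sums = [sum(rater[i] for rater in ranked) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))  # no tie correction (assumption)


# Hypothetical scores of three raters for four subjects.
w_agree = kendalls_w([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]])
w_oppose = kendalls_w([[0, 1, 2, 3], [3, 2, 1, 0]])
print(w_agree, w_oppose)
```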

Influence of Thresholds on Quantitative Measures
The influence of VOLT thresholds (c6, c8, and c10) on group differences was assessed using general linear model (GLM) based two-sample t-tests (including age as a covariate of no interest). The resulting slopes for the effects of thresholds and their standard errors were used to calculate t-statistic values for each group comparison at each threshold. Furthermore, a slope difference test (56) was used to check whether group differences depended on the VOLT cutoff threshold; this test compares the difference between two slopes with their standard errors. Correction for multiple comparisons p (FWE) = 0.05/N, i.e., Bonferroni correction, was done for the three between-threshold comparisons resulting from the three thresholds (c6, c8, and c10), i.e., c6-vs-c8, c6-vs-c10, and c8-vs-c10, and therefore N = 3.
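A slope difference test of this kind is often computed as a z-statistic from two slopes and their standard errors. The sketch below assumes that common form, with independent slope estimates and a normal approximation; whether reference (56) uses exactly this expression is not stated in the text, and the numbers are invented.

```python
import math


def slope_difference_test(b1: float, se1: float, b2: float, se2: float):
    """Return (z, two-sided p) for the difference between two slopes."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Hypothetical group-difference slopes at two thresholds, e.g., c6 vs. c8.
z_value, p_value = slope_difference_test(b1=0.50, se1=0.10, b2=0.20, se2=0.10)
print(round(z_value, 3), round(p_value, 4))
```

With three pairwise threshold comparisons, the resulting p-values would then be judged against 0.05/3.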

Covariance of Clinical Measures and iMRI
Clinical (e.g., disease duration, number of attacks) and diagnostic measures (e.g., HIT, calorics, and VEMPs), as well as parameters derived from ELS measures in SQ gradings and 2D- and 3D-quantifications (ER, Diff-ER, and AI), were included in an analysis of covariance (ANCOVA). An overview of clinical symptoms and diagnostic measures can be viewed in Table 2. Furthermore, the analysis accounted for categorical variables, such as symptoms like headache, and continuous covariates, such as body mass index (BMI) and the age of the patients. To detect diverging trends between MD and HC, parameters derived from ELS measures were allowed to interact with the group, i.e., each group was allowed to have a different trend in the model. We used Bonferroni correction for the post-hoc assessment of the individual factors in the ANCOVA.

Influence of ELH Presence on SNR and SI
The influence of the presence and extent of ELH on SNR and SI was examined with two approaches. First, SNR and SI data were investigated using classifications derived from SQ grades and 3D-quantification measures. The SQ grades were used to distinguish between "no ELH" and "ELH present," while the 3D-quantification was used to distinguish between "low/small ELH" and "high/large ELH." For the classification using SQ grades, all grades equal to zero (SQ grade == 0) were allocated to "no ELH" and the rest to "ELH present." For the classification using 3D-quantification, data values below the median were in the "low ELH" class and data values above the median in the "high ELH" class. The SNR and SI data were analyzed for differences between these classes using two-sample t-tests and Wilcoxon rank-sum tests. The two tests (i.e., a parametric and a non-parametric test) were used to ensure that significant differences did not depend purely on the assumed distribution. Correction for multiple comparisons p (FWE) = 0.05/N, i.e., Bonferroni correction, was done for five tests between regions (split by ELH), i.e., N = 5 (see Figures 5A,B).
Then, the inverse question was asked. This time, SQ and 3D-quantification values were compared following SI or SNR value classification and then analyzed accordingly for differences with two-sample t-tests and Wilcoxon rank-sum tests. For both SI and SNR classification, the "low SI or SNR class" was defined by values below the respective median, and the "high SI or SNR class" by values above the respective median. Correction for multiple comparisons p (FWE) = 0.05/N, i.e., Bonferroni correction, was done separately for the test between SNR (split by ELH 3D-quantification) and 3D-quantification (split by SI). The number of tests for the SNR comparison was N = 2, and the number of tests for ELH 3D-quantification was N = 4 (see Figures 5C,D).

Frontiers in Neurology | www.frontiersin.org
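The two classification rules used for this analysis can be sketched as follows. The handling of values exactly equal to the median is not specified in the text; assigning them to the "high" class is an assumption made here, and the function names and example values are hypothetical.

```python
from statistics import median


def classify_by_sq_grade(grade: int) -> str:
    """SQ grade 0 -> "no ELH"; any higher grade -> "ELH present"."""
    return "no ELH" if grade == 0 else "ELH present"


def median_split(values):
    """Label each value "low" (below the median) or "high" (otherwise)."""
    med = median(values)
    # Values equal to the median go to "high" (assumption, see lead-in).
    return ["low" if v < med else "high" for v in values]


# Hypothetical 3D-quantification values (mm^3) and SQ grades, not study data.
volume_classes = median_split([4.0, 9.5, 6.0, 12.0])
grade_classes = [classify_by_sq_grade(g) for g in [0, 1, 3, 0]]
print(volume_classes, grade_classes)
```

The SNR or SI values of the resulting classes would then be compared with a two-sample t-test and a Wilcoxon rank-sum test, as described above.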

Descriptive Statistics
Seventy-five MD patients (35 females; aged 22–81 years, mean age 56.6 ± 14.9 years; 97% right-handed) and 33 HC participants (20 females; aged 20–84 years, mean age 42.1 ± 18.9 years; 94% right-handed) were included in the study. An overview of the most important clinical features in MD compared to HC can be seen in Table 2. An overview of the ELS grading for HC and MD can be viewed in Table 3.
• [...] (cp. Figure 2 for an overview of the minor influence of the iMRI acquisition parameters on SNR).
• The mean SNR was significantly different between the MD and HC group (p < 0.05). SNR asymmetry between left and right ear was not significantly related to Gd dosage, Gd time delay, or Gd dosage × Gd time delay.

Influences of Signal Quality on ELS
• SQ gradings and 2D-or 3D-quantifications were not significantly related to Gd dosage, Gd time delay, Gd dosage × Gd time delay interaction, SI or SNR (FWE corrected, p ≤ 0.05). There were some simple significant relationships (p < 0.05 uncorrected) for the iMRI variables with Gd dosage and SNR, but all these relationships were small in effect size (around 0.5-5% omega squared).

Interrelations Between ELS Quantification Methods (ii)
• Inter-rater SQ gradings (R1-3) were statistically homogeneous, as were 2D- and 3D-quantification values, including ipsi- and contralateral or cochlea and vestibulum.
• Inter-rater SQ gradings (R1-3) were highly matched in the vestibular (v) and cochlear (c) part of the inner ear for all subjects (HC, MD) and slightly less for MD only. The results can be viewed in Table 4 (column SQ) and suggest a high but imperfect reproducibility due to remaining variability.
• Inter-threshold (c6, c8, and c10) 2D- and 3D-quantification was highly concordant. These results can be viewed in Table 4 (columns 2D and 3D) and indicate an almost perfect agreement over VOLT thresholds with near-perfect reproducibility.
• SQ grades correlated strongly with 2D-quantification values (range of correlation from 0.3 to 0.7) and 3D-quantification values (range of correlation from 0.3 to 0.7). The correlations of 2D- and 3D-quantification values with SQ grades were mainly driven by the MD group, due to the higher variability within this group compared to the HC group, which did not vary much in grades or 2D- and 3D-quantification values (cp. Figure 3, plots on the left and in the middle).
• 2D- and 3D-quantification correlated substantially (range of correlation from 0.3 to 0.8) for the total inner ear, cochlea, and vestibulum on both the ipsilateral and contralateral side. However, there were no significant correlations of the ipsilateral with the contralateral sides (cp. Figure 3, plots on the right). AI SQ (asymmetry index of SQ quantification) correlated significantly (range of correlation from 0.3 to 0.7) with AI 2D and AI 3D (asymmetry indices of 2D- and 3D-quantification), except for the cochlear AI in the 2D- and 3D-quantifications at the c6 cutoff (cAI c6 2D and cAI c6 3D, cp. Figure 3).
• Inter-rater SQ grading differences were similar across the rater pairs (R1-3). Figure 4 shows the results in more detail. For the vestibular part, the percentage of ratings that agreed, i.e., showed zero differences, was 54.6% (R2-R1), 50% (R3-R1), and 67.6% (R3-R2), while the percentage of differences of maximally one grade apart was 85.2% (R2-R1), 90.7% (R3-R1), and 98.2% (R3-R2). For the cochlear part, the percentage of ratings that agreed was 53.7% (R2-R1), 53.7% (R3-R1), and 71.3% (R3-R2), while the percentage of differences of maximally one grade apart was 88.9% (R2-R1), 90.7% (R3-R1), and 97.2% (R3-R2).
• Inter-threshold 2D- and 3D-quantification measures were statistically homogeneous and showed group differences for each threshold.
• Clinical variables correlated with symmetry parameters derived from SQ grading and 2D- or 3D-quantification values, such as the asymmetry index (AI) or the plain ELH difference between ipsilateral and contralateral side for the inner ear, vestibulum, and cochlea. The AI worked for unnormalized data and its results were comparable to the normalized data, while the pure differences between ipsilateral and contralateral sides were only useful when the data were first normalized to the fraction ER [%] of the total fluid space (TFS, cf. legend on standard values). This indicates that the relative proportions of both ears and the relative size of ELH are most useful for predicting quantitative clinical data from iMRI measures. Fittingly, vestibular AI for the 3D-quantification data explained 35% of the variance of the number of attacks in the 3 months prior to the examination, and another 16% of variance could be explained by the AI for the 3D-quantification data of the whole inner ear (vestibular and cochlear parts combined). A more detailed clinical study and discussion can be found in another work (57).

FIGURE 2 | [...] However, the effect size was very small (4.2, 5.3, and 6.8% of explained variance, respectively). The SNR displayed here, "SNR mean," is the mean of the SNRs calculated for each region of interest, i.e., the two inside the basal cochlear turn (CBT), the apex cochleae (AC), the horizontal semicircular canal (hSCC), as well as the posterior SCC (pSCC). For each region, the SNR is the mean signal divided by the standard deviation of the region labeled "air." The left and the right ear are averaged for each region, before regions are averaged to form "SNR mean."
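The inter-rater agreement percentages reported in the results above (share of identical grades, and share of grades at most one step apart) can be computed along the following lines. This is a minimal sketch, not the authors' code; `agreement_rates` is a hypothetical helper and the grades are assumed to be integer SQ grades given in the same subject order for both raters.

```python
def agreement_rates(grades_a, grades_b):
    """Return (percent identical grades, percent grades at most one step apart).

    `grades_a` and `grades_b` are equally long sequences of integer SQ grades
    from two raters, listed in the same subject order.
    """
    if len(grades_a) != len(grades_b) or len(grades_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    diffs = [abs(a - b) for a, b in zip(grades_a, grades_b)]
    n = len(diffs)
    exact = 100.0 * sum(d == 0 for d in diffs) / n
    within_one = 100.0 * sum(d <= 1 for d in diffs) / n
    return exact, within_one
```

For example, `agreement_rates([0, 1, 2, 3], [0, 2, 0, 3])` yields 50% exact agreement and 75% agreement within one grade.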

Influences on Signal Quality (iii)
• There were significant differences in SI due to the presence of ELH in the following ROIs: cochlear basal turn [p (t-test) …]. Generally, higher ELH 3D-quantification values had higher SI values. However, due to a considerable spread, SI could not distinguish the presence of ELH from the absence of ELH (tested by splitting the SI values, with "absence of ELH" defined as an SQ grading equal to zero and "presence of ELH" as all grades higher than zero). Fittingly, the opposite approach (splitting ELH values by SI brightness) did not show significant differences. For an overview, please see Figure 5. SI was significantly different between the MD and HC group for both ROIs in the cochlear basal turn, but not for the cochlear apex, hSCC, or pSCC (p < 0.05 FWE). The group differences in iMRI variables between the MD and HC groups persisted after removing effects of Gd dosage, time delay, and SNR, indicating that iMRI assessment was not significantly affected by the differences in Gd dosage, time delay, and SNR in the present dataset.
• SNR was not influenced significantly by the presence or absence of an ELH. Selecting SNR values for all SQ grades = 0 ("absence of ELH") and comparing them with the remaining SNR values (where SQ grades > 0, "presence of ELH") led to two-sample t-test p = 0.99 and two-sample rank-sum test p = 0.94. Furthermore, comparing the SNR values for low ELH values (3D-quantification values below the median) with SNR values for high ELH values (3D-quantification values above the median) did not show any significant differences in SNR (two-sample t-test p = 0.31 and two-sample rank-sum test p = 0.45). Analogously, splitting ELH values by low vs. high SNR values did not result in significant differences [p (t-test) = 0.66 and p (rank-sum) = 0.47 on the ipsilateral side and p (t-test) = 0.2 and p (rank-sum) = 0.16 on the contralateral side]. For an overview, see Figure 5.

FIGURE 3 | Significant correlations between endolymphatic hydrops (ELH) quantification methods (belonging to question ii). The top row shows the correlations between semi-quantitative (SQ) gradings (x-axis) and 2D- or 3D-quantification values (y-axis) for vestibulum (v) or cochlea (c), and for the ipsilateral (ipsi) or contralateral side (contra). In addition, SQ gradings are rater-specific (R1-R3), and 2D- or 3D-quantification values are cutoff-specific (c6, c8, and c10). Significant correlations (p < 0.05 FWE-corrected) are colored increasingly yellow the higher the correlation (thresholded to 0-0.8). Overall, SQ gradings and 2D- or 3D-quantification values correlated with the respective other method, although to a higher extent on the ipsilateral side (on the contralateral side in c10) and in the vestibular part of the inner ear. The bottom row shows the corresponding correlations between methods (SQ gradings and 2D- or 3D-quantification values) for the respective asymmetry index (AI).
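The "SNR mean" used in these comparisons is defined in the figure legend as follows: per region, mean signal divided by the standard deviation of the "air" region, left and right ear averaged per region, then regions averaged. A minimal sketch of that calculation is given below. The function names, the dictionary layout, the use of a single shared air region, and the choice of the sample standard deviation are assumptions of this sketch, not details taken from the study.

```python
from statistics import mean, stdev


def region_snr(signal_voxels, air_voxels):
    """SNR of one region: mean signal divided by the SD of the air region.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return mean(signal_voxels) / stdev(air_voxels)


def snr_mean(left, right, air_voxels):
    """Average per-region SNR over both ears, then over regions.

    `left` and `right` map a region name (e.g. "CBT", "AC", "hSCC", "pSCC")
    to the voxel intensities of that region in the respective ear.
    """
    if set(left) != set(right):
        raise ValueError("both ears must provide the same regions")
    per_region = []
    for region in sorted(left):
        snr_left = region_snr(left[region], air_voxels)
        snr_right = region_snr(right[region], air_voxels)
        per_region.append((snr_left + snr_right) / 2)  # average the two ears first
    return mean(per_region)                            # then average the regions
```

Averaging the ears before the regions matches the order stated in the legend. With equal weights and identical regions in both ears the order does not change the result, but it would matter if a region were missing on one side.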

Standard Values
• Areas and volumes were normalized according to their TFS (total fluid space/surface) and can be viewed in Figure 6.
• Our calculations showed that the chosen threshold did not change the group differences between MD and HC. The grading-specific 2D- and 3D-quantification values, the TFS values, and the resulting ratios can be seen in Figures 6, 7. Furthermore, we show the relationship of 2D- and 3D-quantification for the vestibular and cochlear part broken down by SQ grades in Figure 8 and Table 5. As grades increased, 2D- as well as 3D-quantification values increased.
• ELH 3D-quantification values (see also Figure 6): The medians (ipsilateral, contralateral) of the vestibular data were (15 mm³, 11 mm³) for the MD group and (12 mm³, 12 mm³) for the vestibular healthy control (HC) group. The medians of the cochlear data were (5.4 mm³, 4.6 mm³) for the MD group and (4.6 mm³, 4.7 mm³) for the HC group.

FIGURE 4 | Inter-rater and -threshold between ELH quantification methods (belonging to question ii). Shown are differences between the three raters. The differences between raters are shown as percentages of the total number of subjects rated. Most grades between raters agree (no difference; in blue), and the next largest difference was by 1 grade (in green); the remaining differences between raters were mostly 2 grades (in yellow) and rarely 3 grades (in red) apart.

FIGURE 5 | [...] Here, "ELH absence" is defined as the average semi-quantitative (SQ) rating being zero and "ELH presence" is defined as the average rating being non-zero. Significant differences are indicated by black lines and p-values for the group differences evaluated with two-sample t-test or rank-sum test (**p < 0.001 and *p < 0.05). The third row (C) shows the mean SNR split by 3D-quantification values (below or above median; indicated by a downward-arrow or an upward-arrow), as well as the mean SNR split by ELH being absent ["-"] or present ["+"], where presence is defined as the ELH rating being non-zero and absence as the ELH rating being zero. The group differences were evaluated with the two-sample t-test and the rank-sum test; neither of these was significant. The fourth row (D) shows the 3D-quantification values split by mean SI values (below or above median; indicated by a downward-arrow or an upward-arrow), as well as the 3D-quantification values split by SNR values (below or above median; indicated by a downward-arrow or an upward-arrow). The group differences were evaluated with the two-sample t-test and the rank-sum test; neither of these was significant.

(i) Within the range of 0.1-0.2 mmol/kg Gd dosage and a 4 h ± 30 min time delay, SQ grading and 2D- or 3D-quantifications were independent of signal intensity (SI) and signal-to-noise ratio (SNR), and they were not significantly related to Gd dosage or time delay. (ii) The ELS quantification methods used were highly reproducible across raters (SQ gradings) or thresholds (2D- and 3D-quantification), although 3D-quantifications showed the least variability in comparison to 2D-quantifications and SQ gradings. The relative proportions of both ears and the relative size of ELH proved to be most useful for predicting quantitative clinical data from iMRI measures. (iii) ELH size significantly influenced SI but not SNR. In contrast, SI could not predict ELH size. In the following, results (i-iii) will be discussed.

Within a Specific Dosage and Time Delay Range ELS Quantification Methods Remain Independent of Signal Intensity (i)
The 3D fluid-attenuated inversion recovery (3D-FLAIR) imaging used has high sensitivity to low concentrations of Gd-based contrast agents (GBCA) in fluid compared with conventional T1-weighted imaging (58). In particular, the heavily T2-weighted 3D-FLAIR imaging with a long effective echo time is very sensitive to subtle T1 shortening and can detect low concentrations of GBCAs in the perilymphatic space after intravenous administration of a single dose of GBCA (18,59,60). In the tested Gd dosage and Gd time delay range (see above), at most weak influences on SNR and no influence on ELS quantification methods were found. It can therefore be assumed that, although Gd dosage and Gd time delay certainly have an influence on iMRI quality parameters, the sweet spot for ELH quantification by iMRI is within the range of the tested parameters. These results tie in well with earlier studies that showed the strongest enhancement in 3D-FLAIR sequences between 3 and 6 h. Another factor contributing to the good performance within the chosen ranges may be the homogeneous distribution of the contrast agent in the entire volume of the inner ear (66,67).
Further improvement of SNR and visualization in terms of rapid, morphological enhancement for analysis of the temporal and spatial distribution in the PLS of the inner ear can be achieved through careful selection of MR sequences (59,68), combination (69,70), and post-processing (14) of MR sequences, MR Gd complex (71), MR coil, and MR field strength (72).

Is There a Hierarchy Within ELS Quantification Methods? (ii)
In line with the only comparative methodological study of ELS quantification methods published to date, in 11 participants (9 patients and 2 healthy controls) (26), SQ grading and 2D- or 3D-quantification methods were found to be reliable and useful for the diagnosis of endolymphatic hydrops. However, the degree of reliability based on comparisons between raters or thresholds increased from SQ grading to 2D- and again to 3D-quantification methods. The increase in repeatability corresponds to the decrease in dependency on human decisions (visually > specific slice in 2D > whole volume in 3D) and the increase in automation and data points (semi-quantitative < area < volume).
Another aspect that makes relying solely on SQ grading difficult, besides inter-rater disparities, is the comparability of methods between different research groups. SQ grading conventions (cf. Table 1) vary in grading resolution from three [in cochlea (19-21,23) and vestibulum (19,24,25)] to four steps [in cochlea (22,24,25) and vestibulum (21-23)]. Accordingly, ELH grades in cochlea or vestibulum only partly correspond to each other because different conventions are used [for example, grade 1 in (19,20,24)], or do not correspond at all [for example (73)]. Based on either manually drawn (28,74) regions of interest (ROIs) or a convolutional neural network (CNN) segmentation (32), 2D-quantification methods already offer an increased comparability and variability of information. However, the comparability of the results remains limited by the slice selection for the calculation of the ratio and the differing slices emerging from slice planning or MRI setup (sequence type, slice thickness, slice resolution). Concerning these issues, 3D-quantification can be a solution (no slice selection, independent of slice planning) or at least an improvement (sequence type, slice thickness, slice resolution). In addition, more information (data points) enables better fitting of diagnostic and clinical parameters (75). Yet here, too, methodological variations affect reproducibility and availability of results. The critical points are the segmentation of the inner ear from the background [manually (29), via atlas (76,77), or CNN (31); (Ahmadi et al., under review)] and of the ELS and PLS from the TFS [manually (26), semi-automatic (29), automatic (31)], as well as the availability of the software solutions [commercial (26,28,29,78) vs. open source (31)]. In most cases, the less human-dependent and the more automated a method is, the more reproducible it is. Therefore, the usefulness of the available quantification methods depends on their intended application. While visual SQ grading is highly useful in a clinical setting, automated 3D-quantification seems most suitable for research.

ELH Extent Influences Signal Intensity in the Basal Cochlear Turn (iii)
Zhang et al. (90) investigated 19 MD patients following double-dose iMRI and found that the signal intensity ratio of the cochlear basal turns in the affected ear was significantly higher than in the unaffected ear and that there was a positive correlation between the signal intensity ratio of the cochlear basal turn and the grades of cochlear and vestibular hydrops in the affected ear. The SNR was assessed and calculated manually according to (91) using the signal in perilymph of both cochlear basal turns and noise in coplanar circular 50 mm² ROIs in the cerebellum. The interpretation of these findings was that increased permeability of the blood-labyrinth barrier (higher SNR) may play a role in the process of endolymphatic hydrops in MD.
The results of the current study suggest, however, a general pathophysiological effect tied to the extent of the ELH and not MD as a pathology, since higher ELH 3D-quantification values had higher signal intensity (SI) values in the cochlear basal turn, apex cochlea, and hSCC ROIs. Within MD, SI only in the cochlear basal turn was significantly higher on the ipsilateral side when compared to the contralateral side. The SI was generally different between the MD and HC groups, indicating an effect of ELH also on signal presentation. SNR differed between the MD and HC groups; however, the effect was small and the group differences in ELH were not significantly affected by SNR, indicating that the group differences are a persistent effect of the underlying condition and not related to the imaging settings that were used in the current study.

Normalization and Standardization of ELS Values
Clinical variables correlated better with relative values (AI) or with values normalized to the fraction ER [%] of the total fluid space. This indicates that the relative proportions of both ears and the relative size of ELH are most useful for predicting quantitative clinical data from iMRI measures (57).
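The two kinds of relative measures named here can be sketched as follows. The normalization expresses the ELS as a percentage of the total fluid space. The asymmetry index formula is not given in this section, so the common form (ipsilateral minus contralateral, divided by their sum) is assumed purely for illustration. Both function names are hypothetical.

```python
def endolymph_ratio(els, tfs):
    """ELS as a percentage of the total fluid space (TFS): ER [%] = 100 * ELS / TFS."""
    if tfs <= 0:
        raise ValueError("TFS must be positive")
    return 100.0 * els / tfs


def asymmetry_index(ipsi, contra):
    """Asymmetry index in percent.

    Assumed form (not specified in this section):
    AI = 100 * (ipsi - contra) / (ipsi + contra).
    Returns 0.0 when both sides are zero.
    """
    total = ipsi + contra
    if total == 0:
        return 0.0
    return 100.0 * (ipsi - contra) / total
```

As a purely arithmetic illustration, inserting the median vestibular volumes of the MD group reported above (15 mm³ ipsilateral, 11 mm³ contralateral) into the assumed formula gives an AI of about 15.4%, whereas the symmetric HC medians (12 mm³, 12 mm³) give 0%. The index is scale-free, which fits the observation that the AI worked on unnormalized data.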

Recommendations for Future iMRI Studies
The following methodological recommendations for future studies can be derived from the present work and the currently available literature:
• MR setup: Improved hybrid of reversed image of positive endolymph signal and native image of positive perilymph signal (iHYDROPS-Mi2) (15) or 3D-real inversion recovery (3D-real IR) (28,96), highest possible MRI field strength (72), smallest possible isotropic voxel size, deep learning reconstruction denoising (14) [...] and are most promising in symmetry parameters, such as asymmetry indices for un-normalized data and relative ELS size for normalized data.

Methodological Limitations and Outlook
There are methodological limitations in the current study that need to be considered in the interpretation of the data. First, despite the comparatively wide range of contrast agent dosage and delay time within this study, the results should be (to some degree) considered specific to the study's MR settings (MR sequence, MR contrast agent, intravenous application). Second, despite the extensive analyses within this study, it was not possible to try all methods, but only representatives of them [SQ following Figure 1 and (22), 2D- and 3D-quantification using VOLT (31)]. Third, the study lacks histological confirmation of endolymphatic hydrops. However, the in-vivo acquisition of histological specimens in Menière's disease is currently not possible. Fourth, the size of the control group (n = 33) was small in comparison to the MD group (n = 75). However, due to findings of signal intensity in the dentate nucleus and globus pallidus on unenhanced T1-weighted MR images (97-99) that are still under investigation, measurements were restricted to inpatients of the Department of Neurology who underwent MRI with a contrast agent as part of their diagnostic workup and agreed to undergo iMRI sequences after 4 h.

CONCLUSION
The current comparative methodological study has shown that: (1) A Gd dosage of 0.1-0.2 mmol/kg after 4 h ± 30 min Gd time delay will provide sufficient SNR when using recommended MR sequences and contrast agents. (2) An agreed-upon clinical SQ grading classification, including a standardized level of evaluation reconstructed to anatomical fixpoints, is needed to provide unambiguous comparability between labs. (3) 3D-quantification methods of the ELS using algorithm-based segmentation of the TFS and ELS seem to be best suited for research purposes. Correlations with clinical variables should include both ears, and ELS values should be reported relative or normalized to size. (4) The presence of ELH weakly increases signal intensity in the basal cochlear turn, but signal intensity cannot predict the presence of ELH.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.