Quantitative Validation of a Visual Rating Scale for Defining High-Iron Putamen in Patients With Multiple System Atrophy

Objectives: To validate a visual rating scale reflecting sub-regional patterns of putaminal hypointensity in susceptibility-weighted imaging of patients with multiple system atrophy (MSA). Methods: Using a visual rating scale (from G0 to G3), 2 examiners independently rated putaminal hypointensities of 37 MSA patients and 21 control subjects. To investigate the correlation with the scales, R2* values and the volume of the entire putamen were measured. Results: MSA patients with parkinsonian variant had significantly higher scores than those with cerebellar variant. Visual rating scores in MSA were correlated with R2* values [generalized estimating equation (GEE), Wald chi-square = 25.89, corrected p < 0.001] and volume (Wald chi-square = 75.44, corrected p < 0.001). They also correlated with UPDRS motor scores. Binary logistic regression analyses revealed that the visual rating scale was a significant predictor for discriminating MSA patients from controls [multivariate model adjusted for age and sex, odds ratio 52.722 (corrected p = 0.009)]. Pairwise comparison between areas under the curve (AUCs) revealed that the visual rating scale demonstrated higher accuracy than R2* values [difference between AUCs; univariate model = 0.247 (corrected p < 0.001); multivariate model = 0.186 (corrected p = 0.003)]. There were no significant differences in clinical characteristics between the high-iron group, defined as putamen with visual rating scale ≥ G2 and R2* values ≥ third quartile, and the remaining patients. Conclusion: The visual rating scale, which reflects quantitative iron content and atrophy of the putamen as well as motor severities, could be useful for the discrimination and evaluation of patients with MSA.

Among iron-sensitive sequences, susceptibility-weighted imaging (SWI) appears to be the most sensitive for depicting spatial information regarding iron deposition. Previous studies have demonstrated that sub-regional analysis yields higher diagnostic value than the measurement of total iron in the entire putamen. The posterior and inner putamen was the most valuable subregion in differentiating MSA-P from PD (5,6). On the other hand, visual grading without considering the pattern of spatial distribution failed to differentiate MSA-P from PD (6,7).
We previously developed a visual rating scale to assess subregional patterns of hypointensity in the putamen using SWI (SWI-PUT) by modifying the scales developed by Lee and Baik (2), Harder et al. (8), and Wang et al. (9). This scale was found to be valuable for differentiating MSA-P from PD in daily clinical practice (2). However, no study has attempted to validate the scale using quantitative iron measurement in the entire putamen.
In the current study, we validated a SWI-PUT visual rating scale using quantitative measurement of R2* values. To investigate the clinical correlates of high-iron putamen, we used both the visual rating scale for SWI-PUT and quantitative measurement of R2* values to distinguish MSA patients with high-iron putamen.

Patients
Thirty-seven patients with probable MSA [24 parkinsonian variant (MSA-P), 13 cerebellar variant (MSA-C)] according to international consensus criteria (10), and 21 control subjects from Pusan National University Yangsan Hospital (Yangsan, South Korea) were included. Subjects with vascular lesions or motion artifacts on MRI were excluded. None of the control subjects had a history of head trauma, stroke, or any neurological or psychiatric illnesses. Individuals with spinocerebellar ataxia (SCA) 1, 2, 3, 6, 7, 17, or dentatorubral-pallidoluysian atrophy (DRPLA) were excluded. Disease severity of MSA patients was assessed using the Hoehn and Yahr (H&Y) stage, the motor part of the Unified Parkinson's Disease Rating Scale (UPDRS-III), and the Unified Multiple System Atrophy Rating Scale part II (UMSARS-II). To investigate associations between visual rating scales and contralateral parkinsonian features, we computed UPDRS hemi-scores (sum of unilateral rest tremor, action tremor, bradykinesia, and rigidity in UPDRS-III), rigidity hemi-scores (sum of upper and lower extremity scores), and bradykinesia hemi-scores (sum of finger tapping, hand movement, rapid alternating movement, and leg agility). This study was approved by the Institutional Review Board of Pusan National University Yangsan Hospital, in accordance with the guidelines of the Helsinki Declaration. Written informed consent was obtained from all subjects.
R2* maps were calculated using the regression of log signals of the eight multi-echo volumes using customized MATLAB tools (MathWorks Inc., Natick, MA, USA). R2* maps were linearly registered to their T1-weighted images using FLIRT (FMRIB's linear image registration tool). Automatic segmentation of the putamen was performed based on T1-weighted images using FreeSurfer version 5.3 (http://surfer.nmr.mgh.harvard.edu/).
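The log-signal regression mentioned above can be illustrated for a single voxel. The following is a minimal Python sketch, not the customized MATLAB tools used in the study. It assumes a mono-exponential decay model, S(TE) = S0 · exp(−R2* · TE), so that the slope of ln S against TE equals −R2*. The echo times and signal values are hypothetical, since the acquisition parameters are not given in this section.

```python
import math


def fit_r2star(echo_times_s, signals):
    """Estimate R2* (in 1/s) for one voxel by least squares on log signal.

    Assumes mono-exponential decay: ln S = ln S0 - R2* * TE.
    """
    if len(echo_times_s) != len(signals) or len(signals) < 2:
        raise ValueError("need at least two matching echo times and signals")
    logs = [math.log(s) for s in signals]  # signals must be positive
    n = len(logs)
    mean_te = sum(echo_times_s) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_te) ** 2 for t in echo_times_s)
    sxy = sum((t - mean_te) * (y - mean_log)
              for t, y in zip(echo_times_s, logs))
    slope = sxy / sxx
    return -slope  # R2* is the negative slope of the log-signal line


# Hypothetical example: eight echoes spaced 5 ms apart (assumed values).
echo_times = [0.005 * (i + 1) for i in range(8)]
true_r2star = 30.0  # 1/s, chosen only for illustration
signals = [1000.0 * math.exp(-true_r2star * te) for te in echo_times]
r2star = fit_r2star(echo_times, signals)
```

Applying the same fit to every voxel would yield an R2* map; noise handling and masking of low-signal voxels are omitted here.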
The signal intensity and distribution were assessed and scored, as shown in Figure 1 (2) and Supplementary Figure 1. For further clarification, the two consecutive axial slices with the largest area of the putamen containing the anterior commissure, the septum pellucidum, and the pulvinar of the thalamus were included. If the grades of the two slices differed, the higher score was selected. In the absence of the lateral-to-medial gradient, linear hypointensity along the lateral border relative to background signal intensity was scored G1.
The visual rating scale for SWI-PUT (2) was evaluated separately by two raters (L.J.H and L.M.J). The inter-rater reliability for the visual rating scale was good (Cohen's kappa = 0.782, approximate SE = 0.046, approximate significance < 0.001). In cases of discrepancy between grades assigned by the two raters, the final grade used in analysis was decided by consensus between the two raters.
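As an illustration of the agreement statistic reported above, the sketch below computes an unweighted Cohen's kappa for two raters in Python. Whether the study used an unweighted or weighted kappa is not stated here, so the unweighted form is an assumption, and the grades in the example are invented for demonstration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical grades.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[g] * counts_b[g]
                     for g in set(counts_a) | set(counts_b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical grade throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades (G0 to G3) for eight putamina.
grades_rater_1 = [0, 1, 2, 3, 2, 1, 0, 3]
grades_rater_2 = [0, 1, 2, 3, 1, 1, 0, 2]
kappa = cohen_kappa(grades_rater_1, grades_rater_2)
```

In this toy example the raters agree on 6 of 8 items (0.75) against a chance agreement of 0.25, giving a kappa of about 0.67.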

Statistical Analysis
Continuous variables with a Gaussian distribution were compared using the independent t-test. The Mann-Whitney U-test was used for variables with a non-Gaussian distribution. Categorical variables were compared using the chi-square test. Because putaminal parameters were measured repeatedly within each subject (right and left sides), we employed generalized estimating equations (GEE) to investigate associations between putaminal rating scales, volume, and R2* values. Spearman's correlation test was used to test associations between UPDRS subscores and contralateral visual rating scales.
Binary logistic regression analyses were performed to test whether MRI results could distinguish MSA patients from control subjects. For each subject, we chose the putamen showing the worse values (higher visual rating scales or R2* values, and lower volume) as the predictors in the logistic regression analyses. In addition, the areas under the receiver operating characteristic curve (AUCs) obtained from each binary logistic regression model were compared using a pairwise comparison. Statistical significance was defined as p < 0.05. Corrections for multiple testing were performed using the Bonferroni procedure. Statistical analyses were performed using SPSS version 18.0 (IBM Corporation, Armonk, NY, USA) and MedCalc version 18.6 (MedCalc Software, Ostend, Belgium).
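Two steps in this paragraph, selecting the worse-side value per subject and applying the Bonferroni procedure, can be sketched in a few lines of Python. This is an illustrative sketch rather than the SPSS or MedCalc workflow used in the study. The field names, the example measurements, and the number of tests in the correction family are all hypothetical.

```python
def worse_side(left, right):
    """Pick the worse value of each putaminal measure across the two sides.

    Worse means a higher visual grade, a higher R2* value, or a lower volume.
    The dictionary keys are hypothetical names chosen for this sketch.
    """
    return {
        "grade": max(left["grade"], right["grade"]),
        "r2star": max(left["r2star"], right["r2star"]),
        "volume": min(left["volume"], right["volume"]),
    }


def bonferroni(p_values):
    """Bonferroni-corrected p-values: multiply by the test count, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical subject with asymmetric putaminal findings.
left_putamen = {"grade": 1, "r2star": 28.4, "volume": 4300.0}
right_putamen = {"grade": 3, "r2star": 25.1, "volume": 3600.0}
predictors = worse_side(left_putamen, right_putamen)

# Hypothetical family of three uncorrected p-values.
corrected = bonferroni([0.003, 0.02, 0.5])
```

Note that the worse value of each measure may come from different sides of the same subject, as in the example, where the higher grade is on the right and the higher R2* value is on the left.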

RESULTS
Demographic information and imaging characteristics of the enrolled subjects are summarized in Table 1. The MSA-P group had a higher proportion of women (male:female = 7:17) than the control group (12:9); however, the difference did not reach statistical significance (Pearson's chi-square = 3.593, uncorrected p = 0.075). UPDRS and UMSARS scores were significantly higher in patients with MSA-P than in those with MSA-C. Compared with control subjects, MSA patients had higher visual rating scores on both the right and left sides (Table 1). When visual rating scores were dichotomized into low (G0 or G1) and high (G2 or G3) grades, 22 MSA-P subjects had a high grade either unilaterally or bilaterally (n = 12, unilateral; n = 10, bilateral), whereas 2 MSA-C subjects had a unilateral high-grade putamen. The frequencies of high and low visual grades differed significantly between the MSA-P and MSA-C groups, regardless of image side (Table 1). However, there were no significant differences in the frequencies of low and high visual grades on either side between MSA-C and control subjects (Table 1).
In the GEE analyses using age, sex, and image side as covariates, the total MSA group had higher R2* values in the putamen than controls. However, there was no significant difference in R2* values between MSA-C patients and control subjects (Table 1). In addition, putaminal volume showed significant negative correlations with contralateral UPDRS subscores, whereas R2* values did not (Supplementary Table 1).
Notably, there was considerable overlap of R2* values between visual rating grades (Figure 2). GEE tests adjusted for age, sex, and image side did not reveal significant differences in putaminal R2* values between G0 and G1, or between G1 and G2. In contrast, putaminal volume showed significant differences between visual rating grades (Supplementary Table 2).
In binary logistic regression analyses for discriminating MSA patients from control subjects, visual rating scores, as well as R2* values and volumes, were predictors of MSA, regardless of whether covariates were included in the logistic regression models. However, the statistical significance of R2* values did not remain after Bonferroni correction for multiple testing (Table 2). The AUCs of visual rating scores were significantly larger than those of R2* values, whereas putaminal volume had AUCs comparable with those of the visual rating scale (Table 2).
Finally, we identified MSA patients with high-iron putamen, defined as visual rating scale ≥ grade 2 and R2* values ≥ third quartile. The high-iron group exhibited significantly smaller putamen volumes (3492.63 ± 718.16) than the remaining patients (4398.16 ± 1014.82; GEE using age, sex, total intracranial volume, and image side as covariates, Wald chi-square = 16.389, β = −738.590, corrected p < 0.001). However, there were no significant differences in age, sex ratio, disease duration, UMSARS, UPDRS, or H&Y stage between the high-iron group and the remaining patients (Supplementary Table 3).
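The two-criterion definition of high-iron putamen can be expressed as a simple rule. The Python sketch below assumes that the third quartile is computed over the pooled R2* values of the putamina being classified, using the inclusive quantile method of the standard library. Neither the reference sample nor the quantile method is specified in this section, and the example values are invented.

```python
from statistics import quantiles


def flag_high_iron(grades, r2star_values, min_grade=2):
    """Flag putamina meeting both criteria: grade >= G2 and R2* >= Q3.

    Q3 is taken over the supplied R2* values (an assumption of this sketch).
    Returns the Q3 threshold and one boolean flag per putamen.
    """
    if len(grades) != len(r2star_values):
        raise ValueError("grades and R2* values must have the same length")
    q3 = quantiles(r2star_values, n=4, method="inclusive")[2]
    flags = [g >= min_grade and r >= q3
             for g, r in zip(grades, r2star_values)]
    return q3, flags


# Hypothetical grades (G0 to G3) and R2* values (1/s) for eight putamina.
example_grades = [0, 1, 1, 2, 2, 3, 2, 3]
example_r2star = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0]
q3_threshold, high_iron = flag_high_iron(example_grades, example_r2star)
```

In this example the sixth putamen is graded G3 but falls just below the R2* threshold, so it is not flagged. This mirrors the mismatch between visual grade and R2* values described above.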

DISCUSSION
In this study, the SWI-PUT visual rating scale was validated using quantitative MRI data. This simple method reliably reflected quantitative iron content in the entire putamen and the severity of parkinsonian motor deficits in the contralateral body part. Moreover, the different scores in the rating scale corresponded to different quantitative degrees of atrophy. The SWI-PUT visual rating scale enabled us to easily identify putaminal degeneration specific to MSA. However, we found a substantial mismatch between visual rating scores and quantitative R2* values. There was a wide range of R2* values within the same SWI-PUT grade. Even at high grades, R2* values overlapped with the range of control subjects. These findings can be attributed to the focal and uneven deposition of iron (2,11,12). Conversely, significant signal hypointensity throughout the putamen might also be found in the absence of the specific visual pattern. In the present study, the SWI-PUT grade demonstrated better accuracy than R2* values in discriminating MSA patients from control subjects (Table 2). As mentioned above, consideration of the distributional pattern of putaminal hypointensity would be helpful for identifying and classifying MSA patients.
High-iron putamen, defined using two complementary methods, indicates the MSA-P subtype (13,14). It has been consistently reported that iron-related degeneration in the putamen is more specific to MSA-P (1,3). In the present study, only volume atrophy was associated with high-iron deposition in the putamen. Similarly, iron accumulation in the putamen increased in parallel with the extent of atrophy in a previous longitudinal study (3). Other clinical variables, including disease duration, were not correlated with high-iron putamen. Due to the unknown period of asymptomatic disease evolution (15), the actual duration of disease may be better reflected by morphological markers of tissue destruction than by symptom duration.
In the present study, putaminal rating scores and volume showed significant correlations with the severity of contralateral parkinsonian features. Previous postmortem studies have demonstrated that parkinsonian motor deficits result from selective neuronal loss and gliosis predominantly affecting the striatum and substantia nigra (16,17). Similarly, neuroimaging studies using diffusion tensor imaging have revealed associations between putaminal mean diffusivity and clinical rating scales such as UPDRS and UMSARS (18-20). Our results suggest that visual rating scales may be useful for monitoring disease progression.
We acknowledge the limitations of the present study. First, we included a relatively small sample of MSA patients and control subjects. Moreover, the control subjects were not age- and sex-matched with the MSA patients. Second, only two raters participated in scoring the visual rating scales. Our results require cross-validation in further studies with larger sample sizes. Finally, we did not enroll patients with degenerative parkinsonism other than MSA. At this time, we are unable to conclude whether our SWI-PUT scales would distinguish MSA from other types of degenerative parkinsonism. However, a previous MRI study using a visual analog scale suggested that putaminal hypointensity and atrophy are also useful for discriminating MSA from PD or progressive supranuclear palsy (21).
In conclusion, a simple SWI-PUT visual rating scale can reflect quantitative iron content and atrophy in the entire putamen. We believe that this scale will be useful for discrimination and evaluation of MSA patients. Recently, iron chelation has been proposed to modify the disease course of MSA (21,22) and, as such, high-iron putamen may be a potential therapeutic target. Our complementary analysis of pattern recognition and quantitative measurement may be useful for patient selection in this regard.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the manuscript/Supplementary Files.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Pusan National University Yangsan Hospital. The patients/participants provided their written informed consent to participate in this study.