LesionQuant for Assessment of MRI in Multiple Sclerosis—A Promising Supplement to the Visual Scan Inspection

Background and Goals: Multiple sclerosis (MS) is a central nervous system inflammatory disease where magnetic resonance imaging (MRI) is an important tool for diagnosis and disease monitoring. Quantitative measurements of lesion volume, lesion count, distribution of lesions, and brain atrophy are potentially of significant value for evaluating disease progression. We hypothesize that software designed for evaluating MRI data in MS provides more accurate and detailed analyses than the visual neuro-radiological evaluation.
Methods: A group of 56 MS patients (mean age 35 years, 70% females and 96% relapsing-remitting MS) was examined with brain MRI 1 and 5 years after diagnosis. The T1 and FLAIR brain MRI sequences for all patients were analyzed using the LesionQuant (LQ) software. These data were compared with data from structured visual evaluations of the MRI scans performed by neuro-radiologists, including assessments of atrophy and lesion count. The LQ data were also compared with data from other validated research methods for brain segmentation, including assessments of whole brain volume and lesion volume. Correlations with clinical tests such as the timed 25-foot walk test (T25FT) were performed to explore the additional value of LQ analyses.
Results: Lesion count assessments by LQ and by the neuro-radiologist were significantly correlated 1 year (cor = 0.92, p = 2.2 × 10⁻¹⁶) and 5 years (cor = 0.84, p = 2.7 × 10⁻¹⁶) after diagnosis. Analyses of the intra- and interrater variability also correlated significantly (cor = 0.96, p < 0.001 and cor = 0.97, p < 0.001, respectively). A significant positive correlation was found between lesion volume measured by LQ and by the software Cascade (cor = 0.7, p < 0.001). LQ detected a reduction in whole brain percentile >10 in 10 patients across the time-points, whereas the neuro-radiologist assessment identified six of these. The neuro-radiologist additionally identified five patients with increased atrophy in the follow-up period, all of whom displayed decreasing, low whole brain percentiles (median 11, range 8–28) in the LQ analysis. A significant positive correlation was identified between lesion volume measured by LQ and test performance on the T25FT both at 1 and 5 years after diagnosis.
Conclusion: For the number of MS lesions at both time-points, we demonstrated strong correlations between the assessments done by LQ and the neuro-radiologist. Lesion volume evaluated with LQ correlated with T25FT performance. LQ analyses classified more patients as having brain atrophy than the visual neuro-radiological evaluation did. In conclusion, LQ seems to be a promising supplement to the evaluation performed by neuro-radiologists, providing an automated tool for evaluating lesions in MS patients and detecting early signs of atrophy in both a longitudinal and cross-sectional setting.


INTRODUCTION
Multiple sclerosis (MS) is a chronic inflammatory, demyelinating disorder of the central nervous system. MS is most frequently characterized by a relapsing-remitting disease course (RRMS) that, over the years, often converts to a secondary progressive disease course (SPMS). MS leads to variable degrees of physical and cognitive impairment. About 10 percent of patients experience a progressive disease course from disease onset [primary progressive MS (PPMS)] (1, 2). A key challenge in MS care is to identify and develop prognostic biomarkers for the disease course (3). Magnetic resonance imaging (MRI) is still the most important tool for the diagnosis and monitoring of MS (4-6). MRI has a high sensitivity for the evaluation of inflammatory and neurodegenerative processes in the brain and spinal cord, and it is the most commonly used method in the follow-up of MS patients (7).
Visual inspection of the MRI scans of people with MS is time consuming for the neuro-radiologist. Subjective measurements based on a radiologist's visual inspection may result in low reproducibility (8). In addition, the degrees of brain atrophy and MS-related pathology in the gray and white matter may be difficult to estimate, especially in early adulthood (9,10).
Advanced tools have been shown to be necessary to detect early brain atrophy in MS (11). New MRI post-processing tools that automatically analyze complex brain volumetric information and lesion load have recently become commercially available. A study comparing two different software packages for assessment of longitudinal whole brain atrophy in MS patients found a strong level of statistical agreement and consistency between the two programs in a real-world MS population (12). The authors conclude that automated measurements of atrophy show promise as biomarkers of neurodegeneration in clinical practice and will enable more rapid clinical translation. If these types of programs are to be introduced, then in addition to demonstrating their performance against competing programs or established research tools, we need to evaluate their use compared with current clinical practice: do they perform as well as, or better than, current practice, and in what way can they be useful and valuable?
Although a neuro-radiological evaluation of structural brain MRI in MS patients can easily estimate the pathologic burden of abnormalities such as T2 hyperintense lesions, limited correlation exists between these measures and the clinical phenotype (13). This has been termed the "clinico-radiological paradox" and is well-described for both physical and cognitive impairments. Some explanations for this paradox have been suggested, including inappropriate clinical rating and underestimation of damage to the normal-appearing brain tissue (14). In a large meta-analysis including 2,891 patients, Mollison et al. found a modest correlation (r = −0.30) between MRI measures of total brain white matter lesions and cognitive function in people with MS (13). LesionQuant (LQ) by CorTechs Labs is a software that automatically segments and measures volumes of brain structures and compares these volumes to norms, based on the more established NeuroQuant (NQ) software (15). LQ was specifically designed for the evaluation of lesions and atrophy in MS patients. LQ also provides volumes and counts of new and enlarging brain lesions, and it automatically labels, visualizes, and obtains volumetric quantification of lesions based on brain MRI. LQ can therefore be used in the longitudinal follow-up of patients with MS. A recent study compared NQ to another software tool, Functional Magnetic Resonance Imaging of the Brain's (FMRIB's) Integrated Registration Segmentation Tool (FIRST), for estimating overall and regional brain volume in patients with clinically isolated syndrome (16). To our knowledge, no data have been published comparing the LQ software in MS with visual evaluations performed by neuro-radiologists.
By using data from our prospective longitudinal study of newly diagnosed MS patients, results from longitudinal LQ analyses were compared to visual evaluations performed by neuro-radiologists in our hospital. We hypothesize that quantitative measurements of brain lesions and atrophy, using an unbiased automatic tool, may improve the correlation between clinical phenotype and MRI results.
Our aims were to evaluate (a) the performance of LQ in detecting brain lesions compared to a neuro-radiologist; (b) brain atrophy as measured by LQ in comparison to visual inspection by the neuro-radiologist; (c) the correlations of results from both visual assessment and LQ with clinically relevant variables; and (d) the correlation between the segmented brain volumes and lesion volumes acquired from LQ and the volumes from two brain segmentation tools, FreeSurfer and Cascade.

MATERIALS AND METHODS

Participants
All analyses were based upon carefully phenotyped MS patients from an ongoing prospective longitudinal MS study in Oslo (17,18). A total of 56 MS patients, who had been examined on average 1 year after diagnosis, were included in this study. The inclusion criteria were a diagnosis of RRMS in the period 2009-2012 and age 18-50 years. The exclusion criteria were a history of psychiatric disease or neurological disease other than MS, drug abuse, previous adverse Gadolinium reaction, pregnancy or breastfeeding at inclusion, or non-fluency in Norwegian. Data from two time-points after the diagnosis of MS were included in this study; data from time-point 1 (TP1) were collected 13 months after diagnosis (±9, n = 56) and data from time-point 2 (TP2) after 66 months (±12, n = 56). At both time-points an expanded disability status scale (EDSS) score was calculated based on a standard neurological examination by trained clinicians (19,20). For assessment of walking ability and upper extremity function we also performed the timed 25-foot walk test (T25FT) and the 9-Hole Peg Test (9HPT). A brain MRI scan, for both clinical and research purposes, was performed at both time-points.

MRI Acquisition
MS patients were scanned at both time-points with the same MRI scanning protocol on the same 1.5 T scanner (Avanto, Siemens Medical Solutions; Erlangen, Germany). Two MRI sequences were required for the LQ analyses in this study: a 3D T1-weighted sequence and a FLAIR sequence. The neuro-radiologist could in addition use all other available sequences in the study protocol, as mentioned in previous publications (17,21).

LesionQuant Analyses of Lesion Count, Brain Volume, and Lesion Volume
The MRI data from the 56 patients were analyzed using the LQ tool (version 2.3.0), comparing data at TP1 and TP2. For each patient, the T1-weighted and FLAIR sequences were uploaded to the LQ server from the PACS system in the hospital, without the need for pre-processing. Finalized LQ reports were received after ∼10 min. The reports provided volumes and counts of all lesions, including new and enlarging lesions at TP2. A lesion was defined by LQ as being ≥4 mm³. Volumes of brain structures in the MS patients were compared with those of age- and sex-matched healthy controls from the LQ reference database. To establish the normative LQ database, CorTechs Labs combined data from several thousand scans, including publicly available studies, studies by collaboration partners, and other proprietary data sources (age range 3-100 years, acquired using Siemens, GE, and Philips MRI scanners at both 1.5 T and 3 T field strengths).
The LQ reports provided information about the volumes of different brain structures, including the whole brain, thalamus, cerebral white matter, white matter lesions, and cortical gray matter. The results for each patient are illustrated using both percentiles and absolute values (Figure 1). A cut-off for atrophy was defined as a drop of more than 10 percentiles in brain volume between TP1 and TP2 (a 5-year interval).
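This percentile-based cut-off reduces to a one-line check. The sketch below is our own illustration (the function name and the strict "more than 10" comparison are assumptions based on the >10-percentile reduction reported in the Results), not part of the LQ software:

```python
def atrophy_flag(percentile_tp1: float, percentile_tp2: float, cutoff: float = 10.0) -> bool:
    """Flag atrophy when the normative whole-brain percentile drops by more
    than `cutoff` points between the two time-points (here ~5 years apart)."""
    return (percentile_tp1 - percentile_tp2) > cutoff

# A patient falling from the 45th to the 30th percentile is flagged:
print(atrophy_flag(45, 30))  # True
```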

MRI Evaluation of Lesion Count and Brain Volume Presented by the Neuro-Radiologist
The brain MRIs of the included MS patients were systematically evaluated by a neuro-radiologist, who carefully counted all lesions on the MRI scans at TP1 and TP2. A lesion was defined as a high T2/FLAIR signal ≥3 mm in diameter. To evaluate the intrarater variability, the neuro-radiologist read the data twice in 10 of the patients. A second neuro-radiologist also counted the lesions in the same 10 patients to provide information about the interrater variability. In addition, another method for evaluating the number of lesions was applied: two neuro-radiologists estimated a lesion number score at TP1, where 0-9 lesions gave a score of 1, 10-19 lesions a score of 2, and more than 20 lesions a score of 3.
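The lesion number score is a simple three-level binning of the count. A minimal sketch (our own helper; note that the source leaves a count of exactly 20 unspecified, and we assume it falls in the highest category):

```python
def lesion_number_score(n_lesions: int) -> int:
    """Map a lesion count to the three-level score estimated at TP1.
    0-9 lesions -> 1, 10-19 -> 2, >=20 -> 3 (exactly 20 is an assumption)."""
    if n_lesions <= 9:
        return 1
    if n_lesions <= 19:
        return 2
    return 3

print([lesion_number_score(n) for n in (0, 9, 10, 19, 25)])  # [1, 1, 2, 2, 3]
```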
A neuro-radiologist also assessed whether increased whole brain atrophy was visually evident between the two time-points or not (Figure 2). Visual evaluation of atrophy was done using the 3D T1 series, where increased CSF in the sulci or on the surface of the brain or volume loss of the gyri between TP1 and TP2, was regarded as atrophy.
The exact lesion number, lesion number score, and the evaluation of atrophy were then compared to the output from LQ. The assessment by the neuro-radiologist was used as the gold standard to compare the LQ data with.

MRI Evaluation of Brain Volume and Lesion Volume Using Freesurfer and Cascade
To compare LQ with other research methods for brain segmentation, we compared LQ with the software packages FreeSurfer and Cascade (22). To compare brain volumes between the different packages, we used the measure of whole brain volume from both LQ (including the brain stem) and FreeSurfer (excluding the brain stem) (23). To compare lesion volumes, we used the measure of total lesion volume from both LQ and Cascade.

Statistical Analysis
We used R (R Core Team, Vienna, 2018, version 3.6.1) for statistical analyses. To assess the reliability of the whole brain volumes from LQ and FreeSurfer we computed the intraclass correlation coefficient (ICC) using the R package "irr" (24). Figures were made using "ggplot2" (25) and "cowplot" (26) in R. To evaluate the associations between the assessments provided by LQ and the neuro-radiologist, the analyses using the FreeSurfer and Cascade software packages, and the clinical data, we used the "stats" package in R, calculated Pearson's correlation coefficient, and applied Student's t-test (27).
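The Pearson correlations and the accompanying t statistics reported in the Results are linked by the standard relation t = r√(n−2)/√(1−r²). This NumPy sketch (our own helper, not the authors' R code) shows the computation:

```python
import numpy as np

def pearson_with_t(x, y):
    """Pearson correlation coefficient and its t statistic,
    t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return r, t

# Small worked example with textbook data (not study data):
r, t = pearson_with_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4))  # 0.7746 2.1213
```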
To adjust for multiple comparisons, we calculated the degree of independence between the four clinical variables available, making a 4 × 4 correlation matrix based on the Pearson correlations between all pair-wise combinations of clinical data. Utilizing the ratio of the observed eigenvalue variance to its theoretical maximum, the estimated equivalent number of independent traits in our analyses was 3.0 (28). To control for multiple testing, our significance threshold was adjusted accordingly from 0.05 to 0.017 (28).
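The eigenvalue-based adjustment can be sketched as follows. This is a hypothetical implementation of a Nyholt-style estimator, M_eff = 1 + (M−1)(1 − Var(λ)/M), which the cited eigenvalue-variance-ratio method resembles; variable names are ours:

```python
import numpy as np

def effective_num_tests(corr):
    """Estimate the effective number of independent traits from an M x M
    correlation matrix via the ratio of the observed eigenvalue variance
    to its theoretical maximum: M_eff = 1 + (M - 1) * (1 - Var(lambda) / M)."""
    lam = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    m = len(lam)
    return 1.0 + (m - 1) * (1.0 - np.var(lam, ddof=1) / m)

# Fully independent traits keep M_eff = M; fully correlated traits collapse to 1.
print(effective_num_tests(np.eye(4)))        # ~4.0 (independent)
print(effective_num_tests(np.ones((4, 4))))  # ~1.0 (fully correlated)
```

With M_eff ≈ 3, the Bonferroni-style threshold becomes 0.05/3 ≈ 0.017, matching the adjusted threshold above.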

RESULTS

Participant Demographics and Characteristics
At TP1, the mean age of the study participants was 36 years (range 21-49 years), 70% were females, and 96% were classified as having RRMS. EDSS was stable between TP1 and TP2, with a median EDSS of 2.0 (range 0-6). Time since MS diagnosis was on average 12.9 months (SD = 9.3) at TP1 and 66.0 months (SD = 11.7) at TP2. At TP1, 25% did not receive any disease-modifying therapy (DMT) for MS, 63% received a group 1 DMT (moderately effective treatment), and 12% a group 2 DMT (highly effective treatment). At TP2, 34% did not receive any DMT, 36% received a group 1 DMT, and 30% a group 2 DMT (Table 1).

Cross-Sectional Comparison of Lesion Count Between the LesionQuant Reports and the Neuro-Radiological Evaluations
The lesion count assessments by LQ and the neuro-radiologist were significantly correlated at TP1 (cor = 0.92, p = 2.2 × 10⁻¹⁶) and TP2 (cor = 0.84, p = 2.7 × 10⁻¹⁶) (Supplementary Figure 2). The lesion counts were identical in only two and three patients at TP1 and TP2, respectively; lesion counts were higher by the neuro-radiologist in 39 and 40 patients, and higher by LQ in 15 and 13 patients, at TP1 and TP2, respectively. In general, the differences between the numbers of lesions evaluated by LQ and by the neuro-radiologist increased with age. For patients with a higher number of lesions the neuro-radiologist tended to count more lesions than LQ, and the opposite held for lower numbers of lesions (see Supplementary Figure 1). To evaluate the intra- and interrater variability, the neuro-radiologist recounted the lesions in 10 of the patients, and a second neuro-radiologist counted lesions in the same 10 patients. Both the intra- and the interrater variability were significantly correlated (cor = 0.96, p < 0.001 and cor = 0.97, p < 0.001, respectively). This was also the case for the lesion number scores estimated by LQ and the two neuro-radiologists.
The lesion volume assessments by LQ and the Cascade software were significantly correlated (cor = 0.7, p < 0.001).

Longitudinal Evaluations of Atrophy and Lesions
We also compared the LQ software with the assessment by the neuro-radiologist in identifying whole brain atrophy at TP2. The neuro-radiologist classified 12 subjects as having brain atrophy. These 12 subjects also had significantly lower whole brain volume percentiles by LQ (mean percentile 37.0, range 10-80) compared with the subjects who were not classified as having brain atrophy (mean percentile 48.7, range 2-99).
At TP2, LQ and the neuro-radiologist agreed in classifying 33% of the subjects with atrophy (four out of 12 subjects). In addition, the neuro-radiologist identified eight more subjects with brain atrophy (mean LQ whole brain percentile 31.3).
LQ detected a reduction in whole brain percentile >10 in 10 patients between TP1 and TP2, while the neuro-radiological evaluation identified six of these. The evaluation by the neuro-radiologist identified an additional six patients with increased atrophy between TP1 and TP2, all of whom displayed low whole brain percentiles at TP2 (median 11, range 8-28) and decreasing percentiles between the time-points.
At TP2, LQ showed a reduced whole brain volume in 51 patients compared with TP1, with a mean reduction of 20.5 ml/1.59% (range 0.4-109.4 ml/0.03-8.08%) of the whole brain volume. In the remaining five patients we found an increased volume, with a mean increase of 6.8 ml/0.56% (range 0.2-17.4 ml/0.02-1.44%).
To evaluate the sensitivity of LQ in detecting new lesions compared to the neuro-radiologist, the difference in the number of lesions assessed at the two time-points was analyzed in a 2 × 2 table (Table 2). The sensitivity of the LQ analysis for correctly classifying the patients according to the gold standard neuro-radiological evaluation was 53% (17/32 patients). The specificity of the LQ analysis for correctly evaluating the MRI follow-up as stable according to the neuro-radiological evaluation was 75% (18/24). In total, 43% of the patients were evaluated as having no new lesions on MRI at TP2 by the neuro-radiologist. The remaining 57% (32 patients) had new lesions according to the neuro-radiologist, and only 17 of these had new lesions according to the LQ reports (Table 2 and Figure 3).
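The reported sensitivity and specificity follow directly from the 2 × 2 counts given in the text (the cell labels below are ours; the numbers are as reported):

```python
# Agreement between LQ and the neuro-radiologist (gold standard) on new
# lesions between TP1 and TP2, using the counts reported in the text.
true_pos = 17            # new lesions per radiologist, also detected by LQ
false_neg = 32 - 17      # new lesions per radiologist, missed by LQ
true_neg = 18            # stable per radiologist, also stable per LQ
false_pos = 24 - 18      # stable per radiologist, flagged as new by LQ

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
# sensitivity = 53%, specificity = 75%
```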

Correlations Between MRI Features and Clinical Variables
We found significant positive correlations between the T25FT and the lesion volume measured by LQ at both TP1 (t = 3.08) and TP2 (Table 3). These results indicate slower T25FT performance in patients with a higher burden of lesion volume. In addition, we found a significant positive correlation between the 9HPT using the left hand and lesion volume at TP2 (t = 5.34, p = 2.09 × 10⁻⁶), indicating slower test performance with increased lesion volume. We also found a significant negative correlation between EDSS and whole brain volume at TP1 (t = −2.68, p = 9.8 × 10⁻³), indicating higher EDSS scores with lower brain volumes. We found no significant correlations between the number of lesions reported by the neuro-radiologist and the clinical variables. All significant correlations reported were adjusted for multiple testing.

Reliability of LesionQuant Volumes
To validate the LQ data against the established FreeSurfer output for brain segmentation, we compared the measure of whole brain volume from both LQ (including the brain stem) and FreeSurfer (excluding the brain stem) (Supplementary Figure 2). At both TP1 (t = 51.6, cor = 0.99) and TP2 (t = 45.2, cor = 0.99), the Pearson correlations were highly significant. We also validated the regional volumes for the thalamus, using both raw FreeSurfer data and data processed through the longitudinal stream, against the LQ data, yielding weaker correlations (t = 2.4-2.8, cor = 0.32-0.35, p = 0.008-0.02).

LesionQuant Reports and Neuro-Radiological Evaluation
All longitudinal LQ data showed excellent concurrence. To evaluate the consistency and agreement of the longitudinal LQ reports, we measured the intraclass correlation coefficient (ICC) between TP1 and TP2 for brain volume (ICC = 0.97, Table 1). We found significant correlations between lesion volume and the number of lesions at both TP1 (t = 6.32, p = 5.05 × 10⁻⁸) and TP2 (t = 4.21, p = 9.77 × 10⁻⁵). We found no significant changes in the parameters between TP1 and TP2 (Supplementary Table 1). As a sanity check, the ICC for lesion counts reported by the neuro-radiologist was very high (ICC = 0.99, p = 2.6 × 10⁻⁵⁰), as expected.

DISCUSSION
Magnetic resonance imaging is an important para-clinical tool for the diagnosis and monitoring of MS. Quantitative measurements of lesion volume and the distribution of lesions have significant value for evaluating disease progression in a clinical setting, and brain atrophy is a possible new measurement for future evaluation of MS patients. In this study we explored the use of the LQ software for evaluating cerebral MS lesions as well as brain atrophy in a clinical setting, and investigated whether automatic analysis of MRI scans using such software shows promise for use in the clinical follow-up of MS patients. We found a high correlation between the lesions counted by the neuro-radiologist at TP1 and TP2 and the lesion count output from LQ. In the evaluation of atrophy between TP1 and TP2, the neuro-radiologist and LQ agreed in only 50% of the patients (six out of 12). Differences in whole brain percentiles between TP1 and TP2 were detected with LQ in the majority of patients, ranging between 0.03 and 8.08%. Lesion volume from the LQ analysis correlated with the outcomes of clinical tests of walking speed and upper extremity function: a significant positive correlation was identified between lesion volume measured by LQ and test performance on the T25FT at both 1 and 5 years after diagnosis. There was a significant correlation between the results of LQ and the segmented volumes from FreeSurfer, showing high reliability of the LQ output for whole brain volume. The correlation between lesion volume estimated by LQ and by the Cascade software was also highly significant.
In order to evaluate treatment effect, it is important to see whether new or enlarging lesions appear on a follow-up MRI scan. The lesion counts of the LQ software and the neuro-radiologist were highly correlated at both time-points. However, visual assessment revealed somewhat higher lesion counts than the LQ assessment, more so in patients with a high number of lesions.
This difference in lesion number could be explained by differences in the definition of a lesion. As mentioned in the Materials and Methods section, a lesion was defined as a T2/FLAIR signal ≥3 mm in diameter in the visual evaluation, whereas the lesion size threshold set by LQ was ≥4 mm³. Given the high correlation of the lesion counts overall, the LQ tool should be valuable for detecting lesions in routine follow-up MRI in MS. The resulting LQ report could then be checked by a neuro-radiologist.
Regarding the detection of lesions, we used the assessment by the neuro-radiologist as the "gold standard." However, it is well-established that the detection of cortical lesions may be challenging using conventional brain MRI, and these may be missed by radiologists (29). This was shown in a study comparing the number of MS lesions counted by radiologists with those analyzed by MSmetrix (today known as IcoBrain MS), a software comparable to LQ (30). That study showed a higher recount difference when recounting was performed by radiologists than by MSmetrix (31). Therefore, the gold standard as we defined it in this paper may be more variable than the automated software tool.
Reliable evaluation of atrophy is difficult with visual inspection alone, although it is not a very time-consuming task. Studies comparing visual ratings of atrophy using the GCA scale have shown interrater reliability of >0.6 and intrarater reliability of >0.7, which is considered moderate agreement (32). When the neuro-radiologist evaluated the MRI, the images from TP2 were compared with the MRI scan at TP1 for each patient. In a clinical routine setting, neuro-radiologists often compare with the previous MRI, which may have been taken months or up to a year before. The changes in atrophy from year to year are rather small (−0.2 to −0.3% per year in our patients' age range) (33), and it is not possible for the neuro-radiologist to detect such small reductions in brain volume. A better approach may be to always compare the last scan with the first MRI in order to increase the sensitivity of visual atrophy assessment. But even with several years between the MRI scans, it can be difficult to estimate the reduction in brain volume if the patients evaluated are young and stable.
The discrepancy between the 12 patients found to have atrophy on visual inspection and the 51 patients showing reduced brain volumes over the 5-year follow-up may indicate that LQ would be helpful in clinical practice. Nevertheless, we should keep in mind the risk of LQ detecting reductions in whole brain volume that are not clinically relevant. Our MS population is young (mean age 36 years) and relatively stable (median EDSS 2.0); detecting small reductions in brain volume in such a population would not necessarily improve their general health condition. Overall, however, our results indicate that the automated LQ method performs better than the visual evaluation method for atrophy assessment, as discussed above.
Most of the MS patients were treated with moderately or highly efficacious disease-modifying therapies at TP1 and TP2. In total, 10 MS patients changed to a more efficacious MS treatment during the follow-up. We found no significant differences in brain volumes, or in the change in brain volumes, between the patients who switched to more efficacious treatment during the follow-up and those who either used the same treatment or reduced the efficacy of their MS treatment during the follow-up period. As a confounding factor, we have to consider that switching to more efficacious treatments could affect brain volumes through possible pseudoatrophy during the first 6 months (34), although we did not find any significant differences during our observational period. Other short-term confounding factors affecting brain volume measurements are known, such as fluid restriction, the time of day of the MRI measurement, corticosteroids, antipsychotic treatment, and short-term effects of pathological processes that decrease global brain volume (35). LQ compared differences in brain volume over approximately a 5-year period (2012/2013 and 2016/2017). During this period the patients were scanned on several occasions that were not part of the study.
One of the main benefits of using automated methods for image analysis in MS patients is the possibility of performing quicker and more reliable evaluations of brain atrophy. As shown by Pareto et al. when comparing two different tools for automated volume analysis of different brain regions, the size of the brain region of interest seems to be important (16). We found an excellent correlation between the FreeSurfer and LQ software packages in the assessment of whole brain atrophy (cor = 0.99). However, for the thalamus we only found modest correlations between the volumes estimated by LQ and FreeSurfer, for both raw and processed volumes (cor = 0.32-0.35, p = 0.008-0.02), confirming the results of Pareto et al. In a recent paper, Storelli et al. also studied reproducibility and repeatability using different software packages for atrophy measurements (10). They concluded that improved reproducibility between scanners is required for clinical application.
In our study, the LQ software unexpectedly estimated an increased whole brain volume in six patients between TP1 and TP2. This could be due to variability in the MRI scanner or other technical reasons. Alternatively, changes in the patient's lesion load over time may affect the atrophy measurements (10).
We hypothesized that improved measurements of brain lesions and atrophy, using an unbiased automatic tool, may improve the correlation between clinical phenotype and MRI results. We found that only the automated LQ software was able to show significant correlations with the standard clinical tests (T25FT, 9HPT, and EDSS). We consider this a robust and expected finding, as only LQ, and not the neuro-radiologist, could provide volumetric information. In line with this, the 9HPT was positively correlated with lesion volume at TP2; although only significant for the left hand, the same trend was seen for the right hand. The EDSS scale, which is the most widely used method to grade disability in MS, was only associated with whole brain volume at TP1 (t = −2.68, p = 9.8 × 10⁻³). There were no correlations between the lesion counts, from either LQ or the neuro-radiologist, and the EDSS, T25FT, or 9HPT, which also shows the value of having the volumes of the lesions and the whole brain available.
In general, we found very high intraclass correlation coefficients (0.88-0.97), showing consistency and agreement among the longitudinal LQ reports. A strength of this paper is the longitudinal design, in which the MS patients were examined clinically and with brain MRI both 1 and 5 years after diagnosis. The patient cohort is well-characterized by trained clinicians, who performed the clinical and MRI assessments. The same MRI scanner and protocol were used for all patients at the two time-points of evaluation, and all patients were scanned in the afternoon or early evening. The neuro-radiological evaluation of the 56 patients at TP1 and TP2 was performed by the same neuro-radiologist, and in addition both inter- and intrarater evaluations were performed. Thus, the quality of the data included in this study is suitable for addressing the research question. A weakness of the study is that the visual evaluation of atrophy was not performed by two independent raters. Also, there was no control group.
The structured LQ report is produced by fully automated MRI post-processing software, which requires only minimal effort and reduces the rater bias that is present with visual inspection of images. Another advantage is the very short processing time of LQ compared with similar software used for research: only about 10 min pass from the uploading of images until the final report is received. In comparison, software like FreeSurfer needs hours to process the data, cannot readily be interpreted for individual patients, and is not feasible for clinical practice.

CONCLUSION
In conclusion, automatic analyses of MRI scans of MS patients may provide faster assessments than the traditional evaluation performed by the neuro-radiologist. LQ seems like a promising supplement to the evaluation by the neuro-radiologist, providing an automated tool for assessment of MS lesions and brain volume in MS patients.

DATA AVAILABILITY STATEMENT
The current dataset cannot be made publicly available for ethical reasons, as public availability would compromise patient confidentiality and participant privacy. The study was conducted in humans, and the dataset includes sensitive personal information on individuals. A portion of the data can be made available upon request to interested, qualified researchers, provided that an agreement is set up. The minimal data set will enable replication of the reported study findings.
Requests to access the datasets should be directed to Hanne F. Harbo, h.f.harbo@medisin.uio.no.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The South Eastern Regional Committee for Medical and Health Research Ethics. The patients/participants provided their written informed consent to participate in this study.