Radiomics and machine learning applied to STIR sequence for prediction of quantitative parameters in facioscapulohumeral disease

Purpose Quantitative Muscle MRI (qMRI) is a valuable and non-invasive tool to assess disease involvement and progression in neuromuscular disorders being able to detect even subtle changes in muscle pathology. The aim of this study is to evaluate the feasibility of using a conventional short-tau inversion recovery (STIR) sequence to predict fat fraction (FF) and water T2 (wT2) in skeletal muscle introducing a radiomic workflow with standardized feature extraction combined with machine learning algorithms. Methods Twenty-five patients with facioscapulohumeral muscular dystrophy (FSHD) were scanned at calf level using conventional STIR sequence and qMRI techniques. We applied and compared three different radiomics workflows (WF1, WF2, WF3), combined with seven Machine Learning regression algorithms (linear, ridge and lasso regression, tree, random forest, k-nearest neighbor and support vector machine), on conventional STIR images to predict FF and wT2 for six calf muscles. Results The combination of WF3 and K-nearest neighbor resulted to be the best predictor model of qMRI parameters with a mean absolute error about ± 5 pp for FF and ± 1.8 ms for wT2. Conclusion This pilot study demonstrated the possibility to predict qMRI parameters in a cohort of FSHD subjects starting from conventional STIR sequence.


. Introduction
Muscle Magnetic Resonance Imaging (mMRI) has been increasingly used over the last years as a powerful diagnostic tool to evaluate disease involvement and progression in several neuromuscular disorders (1)(2)(3). mMRI is able to demonstrate selective patterns of damage distribution both in terms of fat replacement and muscular edema (4,5). Facioscapulohumeral muscular dystrophy (FSHD) is a genetic muscle disorders that causes a slowly progressive and asymmetric weakness of the facioscapulohumeral, abdominal, paraspinal, and lower leg muscles (6)(7)(8)(9) both in pediatric and adult patients. mMRI of FSHD has relied on acquisition of conventional sequences such as T1weighted (T1w) and short-tau inversion recovery (STIR) sequences that are able to foster the qualitative detection of anatomical changes in muscles size or shape, particularly related to fat replacement and muscle edema (or edema -like) (10,11), revealing a widespread involvement both in upper girdle and lower limbs (12,13). The use of mMRI enabled to propose a peculiar model for FSHD disease evolution, highlighting how patients undergo a muscle-selective involvement with an early hyperintense signal on STIR sequence related to edema/inflammation, followed by fatty replacement of single muscles, particularly evident on T1w images (14). Recently, the use of STIR signal intensity as a longitudinal marker of inflammation suppression in FSHD has been questioned because an incremental STIR signal has been reported in FSHD patients during the immunosuppressive treatment period (15). As per other neuromuscular diseases, semi-quantitative visual scales have been applied to support and improve the evaluation of morphological changes in muscles, e.g., Mercuri and Fischer scales (16,17). The recent development and implementation of quantitative MRI (qMRI) in the field of neuromuscular diseases allowed to go beyond the conventional and semi-quantitative approaches, being able to assess quantitative parameters (e.g., the percentage of fat replacement in the muscle, the so called fat fraction, FF), that have been correlated both with transcriptome signatures (DUX4 and PAX7 signatures) and with clinical tests (e.g., Ricci clinical severity score) (18). Therefore the development of qMRI techniques improved the non-invasive applicability of muscle imaging in the diagnostic process and follow-up of muscle disorders (19). Neither the clinical outcomes nor the conventional muscle MRI techniques, in fact, are deemed to be sensitive enough to track muscle changes in slowly progressing diseases (3). qMRI is considered a valuable tool to monitor even fine changes in neuromuscular disease evaluation and longitudinal progression over time because it delivers quantitative information such as muscles FF and the muscle water T2 (wT2) relaxation time which is an unspecific marker for disease activity because it is sensitive to the presence of leaky membranes, muscle fiber necrosis, edema, inflammation, or denervation (20). Dixon imaging and Multi-Echo T2 spin-echo sequences are the most commonly used qMRI methods to compute FF and wT2 (3). Up-to-date qMRI methods require custom-tailored sequences provided by vendors on the MRI scanner resulting in high-cost implementations. Recently, Image Biomarker Standardization Initiative (IBSI, https://ibsi. readthedocs.io/en/latest/) radiomics proved to be a powerful tool to extract quantitative information from MRI images, becoming a new asset in the diagnostic field (21). It can identify the main patterns of a disease through the mathematical extraction of pixels intensity and spatial interrelationships distributions. Radiomics quantifies textural information that, once dimensionally reduced (22,23), can be combined with machine learning (ML) algorithms to predict neuromuscular quantitative biomarkers such as FF and wT2 with a good predictive power (24). Standardized features extraction can also help to overcome possible limitations due to the presence of fat in the evaluation of wT2 biomarkers through exponential fitting. However, it is still unclear whether and how radiomics could be applied on conventional STIR images and combined with ML algorithms to predict FF and wT2. Moreover, it remains unexplored whether the predictive power of ML algorithms on conventional STIR images could be improved through the definition of new radiomic features as an alternative to the ones provided by commercial radiomic features extraction software (25).
STIR sequence is most likely available in all MRI centers and it has a very competitive acquisition time compared to qMRI sequences. In this study, we aim to investigate whether different radiomics and machine learning algorithms may be applied to conventional STIR sequence to predict quantitative parameters in skeletal muscle.
A single slice from the medial calf level of each FSHD patient was selected from the first echo images of MEGE because of the higher SNR than the other echoes. Each selected slice was automatically segmented (27) into six regions of interest (ROIs) for each calf muscle, i.e. Soleus (S), Medial and Lateral Gastrocnemius (MG, LG), Anterior Tibialis (TA), Extensor Digitorum Longus (ELD), Peroneus Longus (Pe). The ROIs were co-registered to the medial calf slice of MESE and STIR using the linear registration command 'flirt' of FSL software (28). A single trained operator with 3 years of experience manually corrected each ROIs after the automatic segmentation of MEGE images and after the coregistration on MESE and STIR images ( Figure 2).
For each subject and each muscle, radiomic features extraction and ML prediction were performed on the mid-calf slice of STIR image because it gives a representation of all calf muscles with a cross sectional area (CSA) wide enough to ensure the extraction of a robust pixel intensity distribution (29). Fifty six radiomics features were extracted averaging left and right side per each muscle. In particular, we extracted 25 first-order statistical-based features concerning voxels intensity distributions, e.g., CONVENTIONAL_mean, CONVENTIONAL_std, CONVENTIONAL_max, CONVENTIONAL_Q1, 26 secondorder statistical-based features highlighting voxels spatial relationship such as the gray level co-occurrence matrix (GLCM) features (e.g., GLCM_Correlation, GLCM_Entropy_log10) and the gray level zone length matrix (GLZLM) features (e.g. GLZLM_LZE, GLZLM_LGZE, GLZLM_HGZE), 5 shape related features concerning size and geometric properties (e.g. SHAPE_Volume(mL), SHAPE_Volume(vx)) (25). Finally, ground truth FF and wT2 values, which the ML predictions have been compared to, were calculated by Fatty Riot algorithm (30) and by EPG signal simulation (two-component model, both for water and fat) (31, 32) from mid-calf MEGE and MESE slice, respectively.

. . Dataset, dimensionality reduction, and machine learning algorithms
We compare the performance in predicting calf muscle FF and wT2 values introducing three different workflows. In particular, .
/fneur. . inspired by Felisaz et al. (24) work, the first workflow predicts FF and wT2 combining radiomics with LIFEx software (25), principal component analysis (PCA) (33) and ML regression models. The second method uses the same features extraction and ML models of the previous method but explores the use of a new dimensionality reduction technique (23) as an alternative to PCA to verify a possible improvement in the prediction of neuromuscular quantitative parameters. The third method relies neither on LIFEx features nor on any dimensionality reduction technique. In particular, two STIR-based features are defined as markers of muscle fat percentage and muscle inflammation. These two features are used as predictors in ML models to test whether there is an improvement in the predictive performance of FF and wT2.

. . Workflow
Features extraction was performed using the IBSI standardcompliant LIFEx software v.7.1.0 with the aim to extract shape related features, taking into account for size and geometric properties, first-order statistical-based features, concerning voxels intensity distributions and second-order statistical-based features highlighting voxels spatial relationship. In particular, a 2D extraction was performed on each ROI corresponding to the six calf muscles (left and right side were averaged). Therefore, we obtained six datasets associated with each calf muscle. On each dataset principal component analysis (PCA) (33) dimensional reduction was performed in order to obtain lower-dimensional data while preserving as much of the data variation as possible. Six principal components, which in our case retain about 90% of the explained variance, were identified and consequently each data point was projected onto them. For each muscle dataset we implemented the parametric linear (34), ridge (35) and Lasso (36) regression and the non-parametric KNN (37), SVM (38), tree (39), and RF (40) algorithms. A k-fold cross validation resampling approach with k = 5 was used on the associated PCA dimensionally reduced dataset. This procedure guarantees a more realistic performance evaluation of each machine learning model by fitting the same statistical model several times on randomly obtained subsets of approximately equal size.

. . Workflow
The starting point was the 2D extraction of texture features from the pre-processed STIR image as described in WF1.
To reduce the dimensionality of the dataset we have used the concept of information imbalance described in Glielmo et al. (23). More precisely, performing feature selection or dimensionality reduction in our case is the same task of finding the most suitable measure between data points since explicit features are available. This is because a particular choice of features naturally gives rise to a different distance function computed through the Euclidean norm (23). Therefore, we designed a feature selection algorithm by selecting the subset of features, which minimizes the information imbalance with respect to the two targets, the values of the neuromuscular biomarkers FF and wT2, separately. The definition of information imbalance used was its estimation on a dataset with N points (23): where A is the space consisting in the radiomic feature space and B is the space associated to FF or wT2 biomarkers, r B and r A represent the rank of each pair points in the space B and A, respectively, calculated according to the distance d B and d A , an euclidean norm defined in the relative space. Thus, information imbalance quantifies the relative information content of a distance measure with respect to another using the widespread idea of local neighborhoods. A low value of (A → B) means that the combination of certain features can predict a specific neuromuscular biomarker. Figure 3 shows for Soleum the minimum information imbalance (A → B) achievable with a specific subset of radiomics features for the two biomarkers wT2 and FF. For each muscle, we optimized the information imbalance with respect to target FF and wT2 separately and selected the subspace of radiomics features corresponding to the associated minimum . The obtained datasets for each muscle and each biomarker were used as input for machine learning algorithms. As in WF1 parametric and non-parametric algorithms were implemented using the resampling k-folds cross validation.

. . Workflow
We defined two STIR-based radiomic features to be used as an alternative to the conventional textural features of WF1 and WF2. We use these new features as the only covariates in the implementation of ML algorithms to test whether the prediction performance of ML models could be improved over those obtained by the previously described workflows. Firstly, we applied the same segmentation method of FSHD patients on the pre-processed STIR images of each healthy control (HC). In particular, six contiguous HCs slices of mid-calf region were segmented in order to ensure a robust pixel statistics of the grayscale intensity distributions. Then, two reference limits, Upper Limit (UL) and Lower Limit (LL), were defined as follows. Inspired by Dahlqvist et al. (41), UL was defined for each calf muscle through the extraction of a pixel-wise histogram of signal intensity distribution from all slices.
The six muscle-wise UL were set at the mean µ of the associated pixels-intensity distribution added to 2 standard deviation (S.D.) σ : with i indexing the six calf muscles. Due to non-uniform fat suppression of STIR sequence, LL was calculated as a representative value of fat signal intensity. Therefore, subcutaneous fat (average thickness at medial level of HCs was about 10.5 mm) was manually drawn in HCs slices to ensure the extraction of LL feature. In particular, from subcutaneous fat ROI of all slices the pixel-wise histogram of signal intensity distribution was extracted. Subsequently, the LL was set as the mode of the distribution. In this way, we could calculate a more realistic fat intensity representative value, limiting the contribution of blood vessels present in the subcutaneous fat, which tend to shift the mean value of the associated distribution toward greater value due to the hyperintesity STIR signal of the blood.
Moreover, the obtained LL and muscle-wise UL coefficients were set as the reference limits to quantify, for every FSHD patient, fat infiltration grade (FFG) and muscle edema grade (MEG) by expressing the number of pixels below LL and above UL as a percentage of the total pixels in each calf muscle. FFG and MEG were then used as covariates in ML models to predict FF and wT2, respectively. Particularly, muscle-wise FFG and MEG values were separately collected into datasets according to calf muscles and neuromuscular biomarker and used as input for machine learning algorithms.      As described in WF1, we implemented both parametric and non-parametric models using the k-folds cross validation as a resampling approach. WF3 brought the advantage of testing the prediction accuracy of neuromuscular biomarkers with two features that were easy to compute by means of a standalone Python routine, without going through commercial texture software and any dimensionality reduction techniques.

. . ML models performance evaluation
According to the aforementioned workflows, models performance estimation was performed calculating for each muscle and for each ML algorithm the mean absolute error (MAE): where N is the number of observations, y i is the target value,ȳ i the predicted value, index j is related to the different calf muscles and index i runs over the observations associated with each muscle. Furthermore, mean MAE (MAE) was defined as: where the index k runs over the k = 5 folds.
To measure the variability of volume and ground truths distribution we also calculated the coefficients of variation (CVs) defined as: where the index i runs over the muscles, σ i and µ i are the associated S.D. and mean of the distributions, respectively. Thus, CVs for volume and ground truth muscle-wise FF and wT2 quantify the variability range of ground truth values on which the ML models were tested. Moreover, we explored whether MAE prediction shows linear or monotonic dependency on CV values of muscle volume and ground truth parameters using Pearson (ρ P ) and Spearman (ρ S ) correlation coefficients.

. Results
In Tables 1-3 the FF MAE was reported for the three used workflows (WF1, WF2, and WF3) calculated for each muscle and from each ML algorithm. Similarly, in Tables 4-6 the MAE was reported for wT2. Boxplots in Figure 4 show the FF and wT2 MAE distribution per each muscle and workflow (WF 1, 2, and  3). The discrepancy between the ground truth values and ML predicted values are expressed in percentage points (pp) for FF and in milliseconds (ms) for wT2, respectively.
As inferred from boxplots in Figure 4, each workflow resulted in a mean FF and wT2 prediction performance of ± 20 pp and ± 6 ms (averaged values) for the anterior compartment muscles and of ± 15 pp and ± 6 ms for the posterior compartment, respectively.   Figure 6 reports the CV i for FF and wT2 for each calf muscle. Similarly, muscle volume CVs account for inter-subject muscle shape variability. Volume CVs are reported in Figure 7. The ground truth CVs range from 0.45 to 0.99 for FF and from 0.04 to 0.22 for wT2 whereas volume CVs range from 0.30 to 0.42 (Figures 6, 7).
. /fneur. .  Table 7 shows no significant correlation between KNN MAE and both CVs of ground truth and volume values. Thus, KNN prediction seemed to be independent from inter-subject muscle shape, i.e., CVs volume, and ground truth variability ranges, i.e., CVs of FF and WT2. Furthermore, the presence of linear and monotonic correlations was tested even between KNN MAE and the mean volume of muscles to examine KNN prediction dependency on different calf muscle size. For our cohort, the following mean volume values for calf muscles were: S ≈ 1743.

. . Discussion
In this study, we explored the possibility to predict fat fraction and water T2 of calf muscles in FSHD subjects starting from a conventional STIR sequence and applying three different workflows, which combine radiomics, dimensionality reduction methods and ML models. To the authors' knowledge, this is the first attempt to predict qMRI parameters from STIR imaging, whereas MRI radiomics features extraction from STIR images have already been exploited to classify disease groups or autoantibodies in patients with idiopathic inflammatory myopathies (IIMs) with ML (42). The three applied workflows resulted in a comparable mean prediction performance about ± 20 pp for FF and about ± 6 ms for wT2 with the exception of LR and KNN models. KNN, according to the obtained results, turned out to be the best predictor model both for FF and wT2. More specifically, the . /fneur. . algorithm-wise performance highlights the best prediction for the combination of KNN and WF3 for both FF (±5 pp) and wT2 (± 1.8 ms). The muscle-wise analysis of the prediction performance also demonstrates a KNN mean prediction performance with almost no dependency either on the dimension of the muscles and on inter-subject muscle shape. We investigate these hypotheses by calculating for each muscle the muscle mean volume and the volume CVs. Despite the difference both in mean muscle-wise volume values and in volume CVs, no significant Pearson and Spearman correlation were found with KNN MAE that was able to predict wT2 and FF with a mean error of approximately ± 1.8 ms and ± 5 pp, respectively. Furthermore, the combination of a small sample size and high CV of ground truth distributions may have negatively affected the ML training step and consequently compromised prediction performance. However, KNN parameters prediction seemed to have no dependency on CV of ground truth values used for training ML algorithms. In contrast to the good predictive power of KNN, we found the least performative model being LR combined with WF2. We surmise that LR + WF2 might be unable to detect the complex relationship between predictors and target variable as suggested by the wider error bars. The main limit of the current study is related to STIR sequence artifacts such as the lowsignal-intensity banding artifacts and high-signal-intensity areas without proper fat suppression (43) that eventually may affect the FF prediction. Nevertheless, we used this non-uniform fat signal component to identify image fat pixels, which were used to extract conventional radiomics features (WF1, WF2), and to define FFG feature (WF3). Conversely STIR imaging is particularly suitable for muscle edema pattern detection (41) which may be easily detected by radiomic features. Furthermore, this study focused on the prediction by all WFs of the mean value of FF and wT2. FSHD is an asymmetric muscular dystrophy, therefore a more in-depth predictivity analysis that also takes into account the laterality of ROIs could be a useful tool for an ever-improving prediction model. Moreover, to expand the applicability of the current results, we aim to conduct further studies enrolling larger cohorts of subjects with different muscular dystrophies and also exploring other skeletal muscle districts (e.g., paravertebral muscles).
In conclusion, our study showed that conventional STIR imaging can potentially be used to predict quantitative muscle MRI parameters by applying radiomics combined with ML models. In particular, the KNN algorithm combined with WF3 was the best predictor for both FF and wT2. The proposed radiomic workflows could contribute to a wider application of a relatively common imaging technique as STIR to rapidly estimate quantitative parameters of skeletal muscle, without the need to acquire long and complex advanced qMRI sequences.

Data availability statement
The data analyzed in this study is subject to the following licenses/restrictions. Requests to access these datasets should be directed to Direzione Scientifica, dirsci@mondino.it.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of Pavia and the Ethics Committee of Fondazione Policlinico Universitario A. Gemelli. The patients/participants provided their written informed consent to participate in this study.

Author contributions
GC and LB: conceptualization, methodology, software, and writing-review and editing. MP: data curation, supervision, and writing-review and editing. MM, ER, and GT: resources and writing-review and editing. NB: software and review. GM: data curation. XD and FS: resources, writing-review and editing, and supervision. AM and SF: validation, writing-review and editing, and supervision. AP: supervision, project administration, and writing-review and editing. All authors contributed to the article and approved the submitted version.