A Computed Tomography Radiomics-Based Prediction Model on Interstitial Lung Disease in Anti-MDA5-Positive Dermatomyositis

Objectives: Anti-melanoma differentiation-associated gene 5-positive dermatomyositis-associated interstitial lung disease (MDA5+ DM-ILD) is a life-threatening disease. The current study aimed to quantitatively assess the pulmonary high-resolution computed tomography (HRCT) images of MDA5+ DM-ILD by applying the radiomics approach and establish a multidimensional risk prediction model for the 6-month mortality. Methods: This retrospective study was conducted in 228 patients from two centers, namely, a derivation cohort and a longitudinal internal validation cohort in Renji Hospital, as well as an external validation cohort in Guangzhou. The derivation cohort was randomly divided into training and testing sets. The primary outcome was 6-month all-cause mortality since the time of admission. Baseline pulmonary HRCT images were quantitatively analyzed by radiomics approach, and a radiomic score (Rad-score) was generated. Clinical predictors selected by univariable Cox regression were further incorporated with the Rad-score, to enhance the prediction performance of the final model (Rad-score plus model). In parallel, an idiopathic pulmonary fibrosis (IPF)-based visual CT score and ILD-GAP score were calculated as comparators. Results: The Rad-score was significantly associated with the 6-month mortality, outperformed the traditional visual score and ILD-GAP score. The Rad-score plus model was successfully developed to predict the 6-month mortality, with C-index values of 0.88 [95% confidence interval (CI), 0.79–0.96] in the training set (n = 121), 0.88 (95%CI, 0.71–1.0) in the testing set (n = 31), 0.83 (95%CI, 0.68–0.98) in the internal validation cohort (n = 44), and 0.84 (95%CI, 0.64–1.0) in the external validation cohort (n = 32). Conclusions: The radiomic feature was an independent and reliable prognostic predictor for MDA5+ DM-ILD.


INTRODUCTION
Anti-melanoma differentiation-associated gene 5-positive dermatomyositis (MDA5 + DM) is a distinct subtype of dermatomyositis (DM) predominantly reported in East Asia, characterized by pathognomonic rashes, none or mild myositis, and, notably, rapidly progressive interstitial lung disease (ILD). Recent study had reported three subgroups of MDA5 + DM with different prognoses. Patients with rapidly progressive ILD had a very high mortality rate, compared to those with frequent signs of skin vasculopathy and with frequent arthralgia or arthritis (1). The overall prognosis of MDA5 + DM-ILD is poor with most fatalities occurring within the first half year since disease onset. A gravely 6-month mortality rate of around 50% was reported despite the intensive immunosuppressive therapy (2)(3)(4)(5).
As a mainstream imaging tool for identifying ILD, pulmonary high-resolution computed tomography (HRCT) is also crucial in the prognostic prediction. Moreover, semi-quantitative HRCT score has been applied as a prognostic prediction measurement in MDA5 + DM-ILD (6,7). However, currently available HRCT scoring methods are observer dependent, with considerable inter-observer bias and initially derived for idiopathic pulmonary fibrosis (IPF) (8)(9)(10). When referring to MDA5 + DM-ILD with a much more aggressive clinical course, the applicability of the scoring system is controversial (11). Therefore, a measurable and quantitative method to assess severity and predict prognosis of MDA5 + DM-ILD is urgently needed. The technological advance of "radiomics" has provided a promising solution for HRCT evaluation in a more comprehensive and objective perspective. The technological advance of "radiomics" is a conversion of digital medical images into mineable highdimensional data. It may provide a promising solution for HRCT evaluation in a more comprehensive and objective perspective. Since the radiographic images contain information that reflects underlying pathophysiology, Radiomics could improve predictive or prognostic accuracy potentially. In the area of oncology, radiomics has enabled the non-invasive profiling of tumor heterogeneity and direct estimation of prognosis by high-throughput extraction of quantitative descriptors from radiographic images (12,13). To the best of our knowledge, there was no radiomics study focusing on MDA5 + DM-ILD. Therefore, the current study aimed to quantitatively assess MDA5 + DM-ILD by applying the radiomics approach and further establish an applicable multidimensional risk prediction model by incorporating clinical factors for 6-month mortality to improve the prognosis of this disease.

Patients
The study population consisted of hospitalized patients from two centers: A derivation cohort from April 2014 to October 2019 and a longitudinal internal validation cohort from November 2019 to May 2020 were recruited in Renji Hospital (Shanghai). However, an external validation cohort from Guangzhou Institute of Respiratory Health was constructed during the same time period. Ethical approval was obtained for this retrospective analysis, and the need to obtain informed consent was waived for both institutions. Eligible patients for this study initially fulfilled Bohan and Peter's criteria for DM or Sontheimer's criteria for clinically amyopathic DM on admission (14,15) and were retrospectively confirmed by the 239th European NeuroMuscular Center (ENMC) classification criteria for DM (16). All patients presented with imaging-confirmed ILD and positive anti-MDA5 antibody. The anti-MDA5 antibody was detected by immunoblotting assay (Euroimmun, Germany) and confirmed by enzyme-linked immunosorbent assay (Supplementary Material). ILD course was defined as time from the first abnormal pulmonary CT, which revealed ILD changes to admission. Patients with ILD course > 3 months or coexisting malignancy (within 3 years) or preexisting chronic obstructive pulmonary disease were excluded. The primary outcome was the 6-month all-cause mortality since the time of admission.
The derivation cohort was randomly divided into the training (n = 121) and testing (n = 31) sets. The internal and external validation cohorts comprised 44 and 32 patients, respectively (Figure 1).

Radiomics Analysis of HRCT Images
Patients underwent non-contrast pulmonary HRCT within a week of admission (median, 2 days; range, 1-6 days) using a multidetector CT scanner. CT slice thickness was 1.0-1.5 mm at 10-mm interval with standard reconstruction. Radiomics analysis was performed using a dedicated software (Radiomics, syngo.via Frontier 1.2.2, Siemens Healthineers, Forchheim, Germany). This platform integrated an opensource library named "PyRadiomics" (version 2.0.1). First, the Digital Imaging and Communications in Medicine files of CT images were imported; segmentation of the volumetric lung was performed automatically with deep learning algorithm and adjusted manually by ZC and YZ (both with more than 5 years' experience in chest CT imaging). The medical image series were resampled to 1 1 1 mm voxel size with B-spline interpolator before subsequent feature extraction steps. Bin width was set at 25 to create histograms for discretization of the image gray levels. The extracted features are as follows: (i) 18 first-order features, which describe the distribution of voxel intensities in region of interest (ROI); (ii) 18 shape and size features, which include descriptors of 2D and 3D size and shape of ROI; (iii) 75 texture features, describing patterns or the spatial distribution of voxel intensities. In addition to the original images, wavelettransformed images were generated using eight different high-to low-frequency band combinations according to three directions.
Thus, a total of 855 features, namely, 111 from the original image and 744 from the wavelet-transformed images, were outputted. Then, the intra-and inter-observer reliabilities were evaluated using Spearman's correlation coefficients in derivation cohort; features with rho>0.8 were included for further analysis.

Radiomics Model Construction and
Radiomic Score (Rad-Score) Generation The radiomics model was derived from the training set and tuned in the testing set. The least absolute shrinkage and selection operator (LASSO) and Cox regression model was used to select features, with penalty parameter tuning conducted by ten-fold cross-validation (17). Then, optimal features for radiomics model were determined by backward stepdown analysis with Akaike's information criterion (AIC) applied as the stopping rule.
Radiomic score for each patient was calculated from a linear combination of selected radiomic features weighted by respective coefficients.

Visual HRCT Scoring and ILD-GAP Model
All CT images were reviewed by ZC and YZ who were blinded to patients' outcome. Inter-observer variability was evaluated by intraclass correlation coefficient (ICC). The results were agreed upon by consensus between the two observers. HRCT findings were graded on a scale of 1-6 based on the classification system that corresponded to consecutive pathologic phases as previously reported (8): 1, normal attenuation; 2, ground-glass attenuation (GGA); 3, consolidation; 4, GGA associated with traction bronchiolectasis or bronchiectasis; 5, consolidation associated with traction bronchiolectasis or bronchiectasis; and 6, honeycombing.
The lungs were divided into six zones (upper, middle, and lower on both sides); each zone was evaluated separately. The score was based on the percentage of the abnormal lung parenchyma and was estimated to the nearest 5% of parenchymal involvement. The overall CT visual score was calculated by adding the average score of six zones (6).
In addition, an ILD-GAP score without lung imaging information was also calculated in our analysis as a comparator, which was modified from the IPF-derived GAP staging system and had been validated in some non-IPF ILD cohorts for prognosis prediction (18,19). Finally, a subgroup analysis was performed to show in which cases the radiomics model outperformed the visual scoring model.

Statistical Analysis
The differences of clinical parameters between the derivation and internal validation and external validation cohorts were assessed by the Mann-Whitney U-test, chi-square test, or Fisher exact test, as appropriate. Post hoc comparisons were made with Bonferroni's correction for adjustments of multiple comparisons. The optimal cutoff values for significant continuous variables were identified by X-tile (20). The association between Rad-score and outcome was assessed by Kaplan-Meier survival plot and log-rank test.
An improved multidimensional prediction model based upon Rad-score (Rad-score plus model) was constructed by integrating clinically significant predictors from univariable analysis into the multivariable Cox proportional hazards model (backward stepdown selection with AIC).
Correlation analysis between visual score, Rad-score, and other variables were applied with Spearman's coefficients. Model discrimination was quantified by the Harrell concordance index (C-index) with 95%CI. The change in overall log-likelihood ratio was used to assess the increase in predictive power. Additionally, the model calibration was assessed using the calibration curve, Greenwood-Nam-D'Agostino (GND) test, and Brier score (13,21,22). A decision curve analysis was built to determine and compare the clinical usefulness of each model (23). Significance was defined as p-value < 0.05. Statistical analyses were performed on SPSS software version 25 (IBM Corp., Armonk, NY, United States) and R software version 3.6.1. All R codes used were available at GitHub (https://github.com/tomato08217/MDA-ILD5).

RESULTS
The all-cause 6-month mortality was 62/152 (40.8%), 17/44 (38.6%), and 14/32 (43.8%) for derivation, internal validation, and external validation cohorts, respectively. Table 1 presents the comparable clinical characteristics of the three cohorts in detail. Comparisons between the training and testing sets of the derivation cohort are also listed in Supplementary Table 1. A total of 17.8% patients lacked the data of the forced vital capacity percentage of predicted (FVC%), since they were in severe condition unable to complete either routine or bedside spirometry. For the remaining patients, the optimal cutoff values calculated for FVC% were 49.7% (rounding to 50%). Therefore, we classified FVC% into three categories: FVC% ≥ 50%, FVC% < 50%, and unable to perform pulmonary function tests.
Notably, the treatment regimens differed between the two centers, with significantly higher intensity of glucocorticoid and immunosuppressant exposure in the external validation cohort ( Table 1). And, in the derivation cohort, the deceased received significantly higher dose of glucocorticoid than survivors, while the intensity of immunosuppressant use was comparable between the two groups ( Table 2).
The optimal cutoff value for Rad-score was determined as 1, dividing patients into the high-risk group (Rad-score > 1) and low-risk group (Rad-score ≤ 1) with distinct 6-month mortality (Figure 2A). The representative patients' CT images of the two groups are presented in Figures 2B,C, respectively, with corresponding Rad-scores.

Rad-Score Plus Model
The univariable Cox regression results of baseline clinical characteristics and treatment between the survivors and deceased in the derivation cohort are presented in Table 2. Then, 10 candidate predictors, i.e., with significant differences in univariable analyses, were incorporated into a multivariable Cox proportional hazards model with Rad-score, indicated as follows: age on admission, course of DM, arthralgia, threecategory FVC% (FVC% ≥ 50%, FVC% < 50%, and unable to perform pulmonary function tests), arterial oxygen/fraction of inspiration oxygen (PaO 2 /FiO 2 ), lactate dehydrogenase (LDH), serum ferritin (ng/mL), C-reactive protein, lymphocyte, and maximum dosage of methylprednisolone ( Table 3).
The optimal combination obtained by the backward stepwise algorithm with minimum AIC was as follows: Rad-score, FVC% (three categories), and age (on admission). The respective hazard ratios and β-regression coefficients are presented in Table 4

Visual Score and ILD-GAP Model
The agreement of visual CT score between two readers was good with ICC of 0.821 (95%CI, 0.711-0.898). The visual score model yielded the C-index values of 0.75 (95%CI, 0.66-0.84) for the training set, 0.81 (95%CI, 0.65-0.98) for the testing set, and 0.79 (95%CI, 0.65-0.94) for the internal validation cohort, when predicting the 6-month mortality of MDA5 + DM-ILD. However, the discriminative power of visual score dramatically declined in the external validation cohort with the C-index of 0.66 (95%CI, 0.49-0.82) and poor calibration ability. However, the C-index values of ILD-GAP model were relatively even in each dataset (see Table 5 for details).

Comparison of the Predictive Performance of Each Model
The prediction accuracy and robustness of the ILD-GAP score, visual score, radiomics model, and Rad-score plus model are compared in Table 5, with respective calibration curves (Figure 3). The final Rad-score plus model had the best performance surpassing the radiomics model, visual score, and ILD-GAP score (p < 0.001). Finally, the decision curve analysis demonstrated that, in terms of clinical applicability, the Radscore plus model also had a higher overall net benefit than other models across the majority of the ranges of threshold probabilities (Figure 4).
In the correlation analysis, the visual score, Rad-score, and Rad-score plus mentioned above all showed strong correlation with FVC and PaO 2 /FiO 2 rather than other clinical predictors (for details, please see Supplementary Table 3).

Subgroup Analysis
Across all datasets, radiomics performed better than visual score in 35 cases, which were defined as Rad-score better group. This subgroup presented a significantly higher mortality, a shorter disease course, a worse pulmonary function, and a higher dosage requirement of steroid (Supplementary Table 2), possibly implying that the radiomics model has a better performance in the patients with more progressive disease course.

DISCUSSION
In the current study, we first applied radiomics analysis in MDA5 + DM-ILD based on a large multicenter cohort. This radiomics approach could not only quantitatively assess the ILD on HRCT, but also excellently predict the prognosis of this tricky disease, surpassing the traditional visual score. Furthermore, the final Rad-score plus model, integrating Rad-score and clinical parameters, showed a superior discriminative performance.
MDA5 + DM-ILD remains to be a great challenge for us despite early diagnosis and recent treatment advances (4,24). Immunosuppressive therapy has been widely accepted as the cornerstone of treatment for this tricky disease. However, the therapeutic effect of a specific regimen has not been demonstrated depending on risk or severity stratification. Accumulative prognostic factors of the disease have been reported involving aspects of respiratory physiology, laboratory markers, and radiology (2,5,7,25). To the best of our knowledge, the current study is the first attempt to quantitatively assess MDA5 + DM-ILD by applying the computer-based radiomics approach, which enriches the panorama of prognostic prediction for this disease.
Our data illustrated that the Rad-score derived from multiple radiomic features could provide a reliable predictor for the 6month mortality of MDA5 + DM-ILD, and outperform the visual CT score. It implied that radiomics enhanced the quantification of chest CT images as it provided more comprehensive features to describe the distribution, heterogeneity of CT value, and texture information behind the visual image. Shape and texture features were the predominant components of the Rad-score. The selected shape feature of flatness, which indicated the change in the lung shape during the progression of the disease, had a negative impact on prognosis. The texture features of MCC, IDMN, and Large AreaHighGrayLevelEmphasis, describing the complexity of texture or the distribution of gray level    value, implied that the survival outcome was worse with the increasing heterogeneity of the texture. Such heterogeneity might impute to the lesions of GGA, consolidation, and reticulation pattern, which were reported by Hozumi et al. as the most distinctive HRCT patterns for MDA5 + DM-ILD (26). Skewness measured the asymmetry of CT value distribution, which was also discovered to indicate the disease progression of IPF in previous studies (27). Of note, in the population of MDA5 + DM-ILD, the organizing pneumonia pattern was found to be dominant in HRCT, with lower zone consolidation correlating with rapidly progressive ILD (28). Moreover, another study with pathological evidences discovered that a perilobular opacity transforming to consolidation corresponded to diffuse alveolar damage, which was presented as the main finding in MDA5 + DM associated with rapidly progressive ILD (29). Thus, these manifestations might result in the changes of CT values and texture, making these radiomic features important for diagnosis, risk stratification, and prognosis. In addition, the Rad-score performed even better than visual score in a more aggressive subgroup with worse prognosis, supported by our subgroup analysis. Consistent with our previous study of prognosis on this disease, the clinically relevant parameters were age and FVC% levels in the final Rad-score plus model, and coincidentally, they were also the main components in GAP staging system (18,30). Moreover, HRCT imaging in combination with pulmonary function test was acknowledged as the gold standard for a pivotal and non-invasive assessment of ILD (31). Thus, it was rational to combine FVC% with Rad-score to further enhance its predictive performance.
The limitations of our study first lie in the retrospective design of data collection and relatively limited sample size of external validation. Notably, the treatment regimens were relatively dissimilar between two centers, with significantly higher intensity of glucocorticoid and immunosuppressant exposure in the external validation cohort. However, the overall mortality and other baseline parameters were comparable between them. The verification of our findings would be better in a larger-scale prospective multicenter validation cohort, by which the biases of different machine conditions, patient selection, and treatment protocols could be minimized and controlled. However, even in tertiary centers, the rarity of this disease is well recognized. Second, since the computer-based radiomic features are known as agnostic features (12) and beyond visual conception, it is difficult to interpret the Rad-score pathophysiologically. Possibly, further work in radiogenomics may help to establish an indepth link between radiomic changes and pathogenesis. Lastly, the feature extraction was based on the whole lung rather than the visible extent of ILD. Although this could include all information on the lung, it may also dilute the texture features of the affected regions. Further research focused on the affected regions is required.

CONCLUSIONS
A quantitative computer-derived radiomic signature was a superior and reliable prognostic predictor for the 6-month mortality in MDA5 + DM-ILD. This instrument may grant a step forward to elegant clinical trial design and precision management of this challenging disease. is fundamental to decision curves (measured in the y-axis) and referred to classification accuracy of a model. Suppose high risk is defined as risk above some risk threshold R (x-axis), an intervention is recommended in such high-risk patients. The NB of using the risk model was calculated by the true-positive rate (TPRR), the proportion of cases with risk above risk threshold R, and the false-positive rate (FPRR), the proportion of controls with risk above risk threshold R. The horizontal dotted line at NB = 0 indicated a simple policy of no intervention to all patients (treat none); the gray curve in the plot depicted the NB of another simple policy: the intervention was recommended to everyone regardless of risk. In our result, the Rad-score plus model had the highest net benefit compared to others, almost across the full range of threshold probabilities.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
WX, WW, QH, YZho, and SY contributed to the project design. WX, WW, ZC, and DZ collected the clinical data of derivation cohort. BG, QL, and QH collected the clinical data of validation cohort. JZ and KW were responsible for laboratory data. YZhe contributed to the assessment of pulmonary function tests. ZC and YZhe conducted visual HRCT evaluation. WX, YZhe, and XT operated the Radiomics software. WX, WW, and XT contributed to analysis of all data. WX, WW, and SY wrote and revised the manuscript. All authors have read and agreed to the final version of the manuscript.