Non-contrast computed tomography-based radiomics for staging of connective tissue disease-associated interstitial lung disease

Rationale and introduction: Assessing disease severity and predicting mortality are important in patients with connective tissue disease-associated interstitial lung disease (CTD-ILD). In this two-center retrospective study, we developed and validated a radiomics nomogram for clinical management, using the ILD-GAP (gender, age, and pulmonary physiology) index system as the staging reference.
Materials and methods: Patients with CTD-ILD were staged using the ILD-GAP index system. A clinical factor model was built from demographics and CT features, and a radiomics signature was developed from radiomics features extracted from CT images. A radiomics nomogram combining the radiomics signature with independent clinical factors was constructed and evaluated by the area under the curve (AUC) from receiver operating characteristic (ROC) analyses. The models were externally validated in dataset 2 to assess their generalizability using ROC analysis.
Results: A total of 245 patients from two clinical centers (dataset 1, n = 202; dataset 2, n = 43) were screened. Pack-years of smoking, traction bronchiectasis, and nine radiomics features were used to build the radiomics nomogram, which showed favorable calibration and discrimination in the training cohort {AUC, 0.887 [95% confidence interval (CI): 0.827–0.940]}, the internal validation cohort [AUC, 0.885 (95% CI: 0.816–0.922)], and the external validation cohort [AUC, 0.85 (95% CI: 0.720–0.919)]. Decision curve analysis demonstrated that the nomogram outperformed the clinical factor model and the radiomics signature in terms of clinical usefulness.
Conclusion: The CT-based radiomics nomogram showed favorable efficacy in predicting individual ILD-GAP stages.


Introduction
Interstitial lung diseases (ILDs) are diffuse parenchymal lung disorders frequently associated with connective tissue disease (CTD) (1). All patients with CTD are at risk of ILD, which may occur at any point during the course of CTD and may even be its first clinically apparent manifestation (2). ILDs are mostly seen in systemic sclerosis (SSc), rheumatoid arthritis (RA), Sjögren's syndrome (SjS), systemic lupus erythematosus (SLE), idiopathic inflammatory myositis [including polymyositis (PM)/dermatomyositis (DM) and anti-synthetase syndrome], and mixed connective tissue disease (MCTD) (3).
Because randomized controlled trials and recommendations are lacking, choosing a treatment for CTD-ILD remains a dilemma for clinicians (3)(4)(5). Although ILD has been reported to be associated with early mortality and to account for up to 35% of CTD-related deaths in some cohorts (6)(7)(8)(9)(10), rushing into medical intervention may expose stable patients to unnecessary drug toxicity and an increased risk of infection (11,12). Thus, staging approaches across CTD-ILD need to be developed to support individualized treatment and limit impairment (3,9).
The GAP (gender, age, and pulmonary physiology) index and staging system were proposed by Ley et al. (13) in 2012 for predicting the mortality risk of patients with idiopathic pulmonary fibrosis (IPF) and were subsequently modified and validated for non-IPF ILDs by Ryerson et al. (14). The ILD-GAP index uses gender, age, percent predicted forced vital capacity (FVC), and diffusion capacity for carbon monoxide (DLCO) to estimate severity and predict mortality in patients with chronic ILD. It has been validated as accurate in various kinds of CTDs (15)(16)(17)(18)(19).
Computed tomography (CT) remains the main method for ILD diagnosis because it is a noninvasive, sensitive technique for detecting lung involvement in CTD patients (20)(21)(22)(23). CT imaging together with pulmonary function tests (PFTs) is currently the gold standard for noninvasively assessing and staging the severity of ILD (24). However, visual analysis of ILD on CT images provides limited prognostic information because different stages of ILD share overlapping imaging features, which makes it difficult to diagnose ILD and assess its severity with conventional imaging. Radiomics can extract a large number of high-dimensional features from CT images, which could compensate for the shortcomings of visual assessment. Radiomics has been investigated for diagnosis and prognosis in many diseases, although mostly in tumors (25,26). Radiomics has been able to predict mortality and response to treatment in patients with CTD-ILD, uncovering prognostic information in CT images that is difficult to obtain by visual assessment (27,28). Correlations between radiomics features and GAP stages have been reported, indicating the potential of radiomics for staging patients with CTD-ILD (29). In the present study, we aimed to establish a CT-based radiomics nomogram to differentiate and stage CTD-ILD.

Patients
Authorization of the institutional review board was granted, and informed consent was waived.
Patients who were clinically diagnosed with CTD (SSc, RA, SjS, PM/DM, SLE, and MCTD) from June 2015 to June 2021 at Shandong Provincial Hospital Affiliated to Shandong First Medical University (dataset 1) and Qilu Hospital of Shandong University (dataset 2) were screened consecutively. Patients were included when they satisfied all of the following conditions: 1) diagnosed with CTD fulfilling the American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) or other acknowledged classification criteria (30-35); 2) underwent a CT scan showing signs of ILD within 3 months after clinical diagnosis; and 3) underwent pulmonary function tests (PFTs) and laboratory examination within 30 days before or after the CT scan. Patients were excluded when they met any of the following conditions: 1) diagnosed with tumors in the lung; 2) diagnosed with idiopathic interstitial pneumonia, sarcoidosis, or any disease other than CTD that may lead to ILD; 3) any history of thoracic surgery; or 4) incomplete demographic or clinical data. The PFT indices included the percentage predicted values (% predicted) of forced expiratory volume in 1 s (FEV1), FVC, total lung capacity (TLC), and diffusion capacity for carbon monoxide (DLCO). The ILD-GAP index was calculated according to Ryerson et al. (14). The patients were divided into two groups: Group I included patients with an ILD-GAP index ≤1, and Group II included patients with an ILD-GAP index >1. All patients were followed up until October 2022, with all-cause mortality as the endpoint. The predictive performance of the ILD-GAP index was evaluated using univariate Cox regression and Harrell's C index. Patients in dataset 1 were then randomly split into training and internal validation cohorts at a ratio of 7:3. The external validation cohort was composed of the patients in dataset 2.
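The grouping rule above can be illustrated with a short sketch. The point values below are an assumption, written from the commonly cited ILD-GAP scoring table (ILD subtype, gender, age, FVC, DLCO) and should be checked against Ryerson et al. (14) before any use; the function names `ild_gap_index` and `ild_gap_group` are hypothetical and are not taken from the study.

```python
from typing import Optional

# ASSUMPTION: the ILD subtype term contributes -2 points for CTD-ILD.
# Verify every point value in this sketch against Ryerson et al. (14).
CTD_ILD_SUBTYPE_POINTS = -2


def ild_gap_index(is_male: bool, age_years: float, fvc_pct_pred: float,
                  dlco_pct_pred: Optional[float]) -> int:
    """Return the ILD-GAP index for a CTD-ILD patient (assumed point table)."""
    points = CTD_ILD_SUBTYPE_POINTS
    # Gender: female 0, male 1
    points += 1 if is_male else 0
    # Age: <=60 -> 0, 61-65 -> 1, >65 -> 2
    if age_years > 65:
        points += 2
    elif age_years > 60:
        points += 1
    # FVC (% predicted): >75 -> 0, 50-75 -> 1, <50 -> 2
    if fvc_pct_pred < 50:
        points += 2
    elif fvc_pct_pred <= 75:
        points += 1
    # DLCO (% predicted): >55 -> 0, 36-55 -> 1, <=35 -> 2,
    # cannot perform (passed as None) -> 3
    if dlco_pct_pred is None:
        points += 3
    elif dlco_pct_pred <= 35:
        points += 2
    elif dlco_pct_pred <= 55:
        points += 1
    return points


def ild_gap_group(index: int) -> str:
    """Dichotomize as in this study: index <= 1 is Group I, otherwise Group II."""
    return "Group I" if index <= 1 else "Group II"
```

For example, under these assumed point values a 62-year-old man with an FVC of 60% predicted and a DLCO of 50% predicted scores -2 + 1 + 1 + 1 + 1 = 2 and falls into Group II.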

CT image acquisition and evaluation
All CT examinations were performed in the supine position at maximum inspiration. The detailed scanning parameters are shown in Supplementary Table S1.
The CT images were reviewed by two radiologists (Qin S.N. with 5 years and Wang X.M. with 20 years of thoracic imaging experience) who were blinded to all other patient characteristics, and disagreements were resolved by consensus. The presence of visual characteristics of ILD (yes/no), including subpleural lines, reticular changes, honeycombing, pulmonary emphysema, and traction bronchiectasis, was evaluated case by case. All CT characteristics were defined according to the Fleischner Society criteria proposed in 2008 (36). The proportion (%) of parenchymal involvement in the total lung volume was calculated using the pneumonia diagnosis module of the Dr. Turing® artificial intelligence-assisted diagnosis system (Huiying Medical Technology Co., Ltd.).

Three-dimensional lung segmentation and extraction of texture features
All CT images were preprocessed by resampling to 1.0-mm-thick slices and normalizing intensities to the range [-1, 1]. The region of interest (ROI) within the borders of the right lung was manually delineated (window width = 1,500; window level = -750) using 3D Slicer (version 4.11, www.slicer.org). The ROI contour avoided the hilar vessels. The left lung was not segmented because the presence of the heart may complicate segmentation and potentially alter the results.
Radiomics features were extracted using the Radcloud platform (www.huiyihuiying.com, Huiying Medical Technology Co., Ltd.). In compliance with the definitions of the Imaging Biomarker Standardization Initiative (37), a total of 1,409 radiomics features were extracted from each ROI; details are provided in the Supplementary Results.
Interobserver and intraobserver agreement was assessed with intraclass correlation coefficients (ICCs) as follows: 20 cases, comprising 10 Group I patients and 10 Group II patients, were randomly selected for ROI segmentation by both readers. Reader 1 repeated the segmentation a month later. Interobserver reliability and intraobserver reproducibility were considered good when the ICC was greater than 0.75. Reader 1 then completed the remaining segmentations.
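As an illustration of the agreement check described above, the following sketch computes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), for one feature measured on the same cases by several readers or sessions. The text does not state which ICC form was used, so this choice is an assumption, and the function name `icc_2_1` is hypothetical.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1) from an n_subjects x k_raters table of one feature's values.

    ASSUMPTION: two-way random effects, absolute agreement, single rater.
    The table must contain some variation, otherwise the ratio is undefined.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Keep a feature only if it passes the study's threshold of 0.75.
feature_values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # made-up values
keep_feature = icc_2_1(feature_values) > 0.75
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC even when the ranking of cases is identical, which is a conservative choice for features that will be fed into a model unchanged.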

Construction of the clinical model
The clinical factor model comprised variables that differed significantly between the two groups (p < 0.05) on univariate logistic regression analysis, drawn from clinical data, laboratory examinations, and visual CT characteristics. Gender, age, and PFT parameters were excluded to prevent data leakage, because they are used to compute the ILD-GAP index. The model was then built using multivariable logistic regression analysis. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for significantly associated variables.

Construction of the radiomics model and the combined model
To prevent overfitting, the dimensionality of the radiomics features was reduced before the signature was constructed. In the training cohort, features used to construct the radiomics model had to satisfy the following conditions: interobserver and intraobserver ICCs exceeding 0.75; a significant difference between the two groups on analysis of variance; and selection as a major contributor to prediction by the least absolute shrinkage and selection operator (LASSO) regression model. Finally, the radiomics model was constructed from the selected features using a support vector machine (SVM). A radiomics score (Rad-score), a weighted combination of the selected features, was calculated for each patient.
A radiomics nomogram incorporating the significant clinical factors and the radiomics signature was constructed using multivariable logistic regression analysis. Variance inflation factors (VIFs) of the predictors were calculated to check for multicollinearity. A calibration curve was drawn to assess the calibration of the combined model, and its goodness of fit was evaluated using the Hosmer-Lemeshow test.
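To make the goodness-of-fit step concrete, the sketch below computes the Hosmer-Lemeshow statistic from observed outcomes and predicted probabilities. It is a minimal illustration and not the study's R implementation; the function name `hosmer_lemeshow_statistic` and the equal-size grouping rule are assumptions. The statistic is conventionally compared with a chi-square distribution with (number of groups - 2) degrees of freedom, and a p value above 0.05 is read as no evidence of poor calibration; the p value itself is not computed here.

```python
from typing import Sequence


def hosmer_lemeshow_statistic(y_true: Sequence[int], y_prob: Sequence[float],
                              n_groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square statistic over groups of sorted predictions."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    # Sort patients by predicted probability (stable sort keeps tie order).
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        # Split the sorted patients into n_groups nearly equal-sized bins.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)   # observed events in the bin
        expected = sum(p for p, _ in group)   # expected events in the bin
        mean_prob = expected / size
        variance = size * mean_prob * (1.0 - mean_prob)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Made-up example with two bins of two patients each.
hl = hosmer_lemeshow_statistic([0, 0, 1, 1], [0.1, 0.3, 0.7, 0.9], n_groups=2)
```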

Evaluation of model capabilities
The performance of the clinical factor model, radiomics model, and combined model in differentiating Group II from Group I CTD-ILD was quantified by the area under the curve (AUC) from receiver operating characteristic (ROC) curves. The three models were compared using the likelihood ratio test (LRT). Decision curve analysis (DCA) was applied to calculate the net benefit over a range of threshold probabilities and thereby measure the clinical benefit of the combined model. All three models were externally validated in dataset 2 to evaluate their generalizability using ROC analysis.
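The AUC used above has a simple rank interpretation: it is the probability that a randomly chosen Group II patient receives a higher model score than a randomly chosen Group I patient, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` is hypothetical, and the study itself used R packages for this step.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for Group II (positive class), 0 for Group I.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients such as those described here.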

Statistical analysis
SPSS (version 26.0) and R software (version 3.5.1) were used to perform statistical tests and analyses. Differences in clinical characteristics were tested using the chi-square test, Fisher exact test, or Mann-Whitney U test, where appropriate. The DeLong test was used to test whether the AUCs of models evaluated on the same cohort differed significantly. Categorical and continuous variables are presented as frequency (percentage), mean ± standard deviation, or median (interquartile range), where appropriate. Regression analysis, nomogram development, calibration plots, ROC analysis, and DCA were performed with the R packages "rms," "glmnet," "pROC," and "dcurves." A two-tailed p value of <0.05 was considered statistically significant.

Results
… S2. Table 1 lists the baseline patient characteristics in dataset 1. On univariate Cox regression, the ILD-GAP index showed increasing mortality in patients with higher stages (hazard ratio, 5.364; 95% CI, 1.994-14.424; p = 0.01) and acceptable performance in predicting mortality (C-index, 0.703) in a subset of the patients in dataset 1 (n = 74). More detailed follow-up information is provided in the Supplementary Material. Table 2 shows the results of the univariable and multivariable logistic regression analyses, which indicated that pack-years of smoking and traction bronchiectasis remained independent predictors. Patients with more pack-years of smoking (OR, 1.036; 95% CI, 1.010-1.063) or traction bronchiectasis on CT (OR, 3.705; 95% CI, 1.222-11.239) tended to have a higher ILD-GAP stage and thus a higher predicted mortality risk. We examined the two predictors in dataset 2, which showed similar results (Supplementary Table S4).

FIGURE 1 Flowchart of the study patients.

Development of the radiomics model
A total of 1,409 radiomics features were obtained from the CT images, of which 1,367 showed good interobserver and intraobserver agreement (intraclass correlation coefficient >0.75). The 70 features that differed significantly between the groups (p < 0.05) were entered into LASSO logistic regression to select the most relevant features (Figure 2). Eventually, nine features were used to construct the radiomics model. Supplementary Table S5 lists detailed information on these features. The Rad-score was calculated according to the following equation:

… wavelet-LLL_glszm_SmallAreaEmphasis
The Rad-score differed significantly between the two groups (p < 0.05; Supplementary Table S6) and is presented in Figure 3.

Development of the combined model
A combined model comprising pack-years of smoking, traction bronchiectasis, and the Rad-score was built in the training cohort (Figure 4A). The VIFs of the predictors ranged from 1.04 to 1.08, indicating no multicollinearity. The calibration curves of the radiomics nomogram are presented in Figures 4B-D and showed acceptable calibration in the training cohort (p = 0.089), the internal validation cohort (p = 0.107), and the external validation cohort.

The validation of the capabilities of the models
The diagnostic performance of each model is presented in Table 3. The ROC curves of the clinical factor model and combined model are presented in Figure 5.
The DCA showed that the combined model performed better than the clinical model and the radiomics model in distinguishing between different stages of CTD-ILD across most of the range of reasonable threshold probabilities (Figure 6).
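For readers unfamiliar with decision curve analysis, the quantity plotted on a decision curve is the net benefit at each threshold probability, that is, the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold. The sketch below shows that calculation together with the "treat all" reference strategy; the function names are hypothetical and the data are made up, and the study itself used the R package "dcurves".

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit of classifying patients with prob >= threshold as Group II."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: classify every patient as Group II."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up example: 4 patients, threshold probability 0.5.
example_labels = [1, 1, 0, 0]
example_probs = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(example_labels, example_probs, 0.5)
nb_all = net_benefit_treat_all(example_labels, 0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of the "treat none" strategy.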

Discussion
The present study showed that the combined model, which incorporated the CT-based Rad-score and clinical variables, had favorable efficacy in distinguishing patients at different ILD-GAP stages, with AUCs of 0.887, 0.885, and 0.851 in the training, internal validation, and external validation cohorts, respectively.
In the present study, clinical variables and visual characteristics on CT images were included. Multivariable logistic regression analysis revealed that more pack-years of smoking and traction bronchiectasis on CT were independent predictors. Only 30 patients (14.85%) had ever smoked in our dataset, which we believe reflects the small number of male patients (n = 55, 27.23%). This reflects not only the actual gender distribution of CTDs but also the significant influence of smoking on the mortality of CTD-ILD patients. A clinical factor model to classify ILD-GAP stages was then developed, incorporating pack-years of smoking and traction bronchiectasis on CT, and achieved AUCs of 0.803, 0.763, and 0.817 in the training, internal validation, and external validation cohorts, respectively. Honeycombing was not associated with GAP stage in our study, although honeycombing and traction bronchiectasis were both independent risk factors for mortality in some studies (38,39). This was probably due to bias caused by the imbalance between the groups. Goh et al. (40) established a staging system for predicting mortality using disease extent with a 20% threshold. However, parenchymal extent was not an independent predictor of GAP stage on multivariable logistic regression analysis (p = 0.053) in our study. This was probably because Goh's model, which was built for SSc-ILD patients, might not be applicable to all kinds of CTD-ILD. Another reason might be that we did not find an optimal cutoff for parenchymal extent.

FIGURE 2 Feature selection and dimensionality reduction workflow. (A) Selection of the tuning parameter (λ) in the least absolute shrinkage and selection operator model. An optimal λ value of 0.015 (vertical dashed line) was selected. (B) The feature coefficients varied according to log(λ).

FIGURE 3 The radiomics scores for each patient in the training (A), internal validation (B), and external validation (C) cohorts.

CT-based radiomics is an objective technique that provides a more reliable and comprehensive quantitative assessment of images and is not hindered by inter-reader variability. Of the 1,409 radiomics features obtained from the CT images, nine were selected to build the radiomics model, eight of which were higher-order texture features extracted from wavelet-transformed images; the model achieved AUCs of 0.813, 0.787, and 0.718 in the training, internal validation, and external validation cohorts, respectively. Texture features can quantify information that is difficult to perceive visually, such as texture patterns or tissue distribution (41). The wavelet transform extends this by decomposing the original images into different frequency domains, yielding multifrequency and multiscale image information (42,43). For diseases that are difficult to describe with simple visual features, high-dimensional abstract features extracted from wavelet-transformed images can often capture hidden information that is not easily observed by visual assessment.
Radiomics features have been shown to have potential for estimating the severity of CTD-ILD and guiding treatment decisions (29). In recent years, the rapid development of radiomics has provided large numbers of features, enabling full-scale characterization of images beyond visual analysis. The clinical factor model, which included visual assessment, performed significantly worse than the radiomics nomogram in predicting GAP stage, indicating that information gathered from clinical and radiologic practice might be insufficient and that radiomics can capture subtle features of ILD on CT images that are imperceptible to the radiologist but may carry prognostic information. At present, few studies have applied radiomics to CTD-ILD. Martini et al. (29) applied radiomics methods to develop a multivariable model that differentiated GAP stages in 60 patients with SSc, with an AUC of 0.96. Instead of focusing on a single type of CTD, we expanded our sample to 245 patients with different subtypes of CTD, which improved the generalizability of our radiomics nomogram. Most studies have focused on predicting mortality in CTD-ILD (27,44); instead, we aimed to stage patients using baseline data and reduce potentially unnecessary examinations. The promising results underline the great potential of radiomics in ILD. In the future, radiomics could be applied to support treatment decisions.
Previous studies have also shown that quantitative analysis can be applied to patients with ILD. Kaya et al. (45) established a quantitative model with an AUC of 0.80 to predict GAP stages in 40 patients with idiopathic pulmonary fibrosis, which showed the potential to outperform subjective visual inspection. Jacob et al. (46) showed that the volume of pulmonary blood vessels and surrounding fibrosis in the lungs independently predicted outcome in patients with RA-ILD. Radiomics methods provide much more information from CT images than can be obtained by conventional quantitative methods. In the present study, eight of the nine features were high-order features, which may cover and exceed the quantitative features extracted in previous studies.
Our study had certain limitations. First, the two groups were not balanced; this reflects the prevalence of different GAP stages in our clinical population but may have affected our results. Second, it remains unclear whether, and which, radiomics features correlate with pathological manifestations of ILD. Thus, a multidisciplinary approach combining clinical, radiological, and pathophysiological information may be needed to guide individualized treatment and improve prognosis. Third, technical limitations mean that radiomics cannot currently be applied at all medical centers. The retrospective nature of this study may also limit its reproducibility and generalizability. Therefore, well-designed prospective radiomics trials, as well as one-stop services that automatically segment images, extract features, and calculate the Rad-score, need to be developed. Moreover, the results of this cross-sectional study may be less precise because the validated ILD-GAP index system was used as the reference rather than the actual mortality of the patients. The exact mortality risk and follow-up results will be investigated in our future research.
In conclusion, a CT-based radiomics nomogram was developed in our study. It showed better efficacy than visual assessment in staging the severity of CTD-ILD on CT images, which implies that this noninvasive, quantitative method may inform clinical decision-making.

Figure 1
Figure 1 shows the process of patient enrollment and model construction. Eventually, a total of 245 patients (dataset 1, n = 202;

FIGURE 4 The radiomics nomogram (A) constructed combining pack-years of smoking, traction bronchiectasis, and Rad-score and the calibration curves of the radiomics nomogram in the training (B), internal validation (C), and external validation (D) cohorts.

FIGURE 5 Comparison of the ROC curves for the radiomics model, clinical model, and combined model in the training cohort (A), the internal validation cohort (B), and the external validation cohort (C).

TABLE 1
Comparison of baseline clinical factors between Group I (GAP stage I) and Group II (GAP stage II/III) patients in dataset 1.

TABLE 2
Risk factors for Group II CTD-ILD in the training cohort.

TABLE 3
Diagnostic performance of the clinical factor model, the radiomics signature, and the radiomics nomogram.

TABLE 4
Comparison among the three models.
* Comparison of the performance of the clinical model and the combined model.
# Comparison of the performance of the radiomics model and the combined model.