Development and validation of a CT-based radiomics nomogram for prediction of synchronous distant metastasis in clear cell renal cell carcinoma

Background Early identification of synchronous distant metastasis (SDM) in patients with clear cell renal cell carcinoma (ccRCC) can help guide the choice of appropriate diagnostic examinations. Methods This retrospective study recruited 463 ccRCC patients, who were divided into training and internal validation cohorts at a 7:3 ratio. In addition, 115 patients from another hospital were assigned to the external validation cohort. A radiomics signature was developed from features selected by the least absolute shrinkage and selection operator method. Demographics, laboratory variables, and CT findings were combined to develop a clinical factors model. A radiomics nomogram was developed by integrating the radiomics signature and the clinical factors model. Results Ten features were used to build the radiomics signature, which yielded an area under the curve (AUC) of 0.882 in the external validation cohort. By incorporating the independent clinical predictors, the clinical factors model was developed, with an AUC of 0.920 in the external validation cohort. The radiomics nomogram (external validation AUC, 0.925) performed better than the clinical factors model or the radiomics signature. Decision curve analysis demonstrated the superiority of the radiomics nomogram in terms of clinical usefulness. Conclusions The CT-based nomogram could help predict SDM status in patients with ccRCC, which might assist clinicians in selecting diagnostic examinations.


Introduction
Renal cell carcinoma (RCC) is the seventh most prevalent malignant tumor, leading to around 140,000 deaths every year (1). Clear cell RCC (ccRCC) is the major histological subtype, accounting for about 80% of all cases (2). Owing to the widespread use of advanced radiologic diagnostic techniques and the popularization of regular checkups, most incidentally detected renal lesions are small, low-grade tumors. Nevertheless, 20%-30% of ccRCC patients have distant metastases at the time of diagnosis (synchronous distant metastasis, SDM) (3). Surgery is no longer suitable for metastatic ccRCC because of the widespread metastatic disease; thus, systemic therapy is applicable in this setting (4,5). ccRCC with SDM has a poor prognosis, with a median survival of 16 months and a five-year survival rate of 3.6% (3). Early identification of SDM can help ensure that reasonable, personalized, and efficient treatment strategies are delivered in a timely manner and may ultimately improve patient survival (6). Hence, it is of great value to estimate the probability of concurrent distant metastasis, so that individualized examination and treatment plans can be made.
Several clinicopathological parameters have been used to establish nomograms for predicting SDM in ccRCC patients (7,8): T stage, pathological differentiation grade, lymph node status, tumor size, and invasion beyond the capsule. One of the most meaningful risk factors is the size of the primary tumor. However, even small ccRCCs have the potential to present with SDM (9,10), which means that relying too heavily on tumor size can lead to underestimation of the true incidence of SDM. Many advanced imaging modalities can contribute to the detection of SDM. However, using multiple imaging methods to check all potential metastatic sites in every ccRCC patient would add economic and physical burden. In addition, some metastatic lesions may be small or share overlapping imaging characteristics with other tumors, which can lead to missed diagnosis or misdiagnosis even when imaging examinations are performed (11-14).
Radiomics is a promising technique that uses computerized quantitative image analysis to extract a large number of image-related features, such as intensity, geometry, and texture, from medical images (15,16). Radiomics features extracted from computed tomography (CT) and magnetic resonance imaging (MRI) have been successfully applied to predict SDM in ccRCC patients (17,18). However, these models were developed with limited samples or without external validation, which limits their clinical usefulness. Moreover, clinical risk factors, which could improve predictive accuracy, have been overlooked.
In this multicenter study, we aim to develop and validate a CT-based radiomics nomogram, incorporating radiomics signature and clinical risk factors, for preoperative prediction of SDM in patients with ccRCC, based on a large collection of patient data from two different institutions.

Patients
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This retrospective study was approved by the Institutional Review Board of Shandong Provincial Hospital Affiliated to Shandong First Medical University and individual consent for this retrospective analysis was waived. The study population flowchart is illustrated in Figure 1.
Data for surgically and pathologically confirmed ccRCC cases were acquired by searching the institutional database and medical record system. The inclusion and exclusion criteria are presented in Supplementary S1. Four hundred sixty-three patients from Shandong Provincial Hospital Affiliated to Shandong First Medical University diagnosed between January 2012 and December 2020, including 127 ccRCC patients with SDM and 336 without SDM, were randomly assigned to either the training cohort or the internal validation cohort in a 7:3 ratio, using a stratified random split at the patient level. The external validation cohort consisted of 115 patients from Shandong Medical Imaging Research Institute between January 2015 and December 2019, including 29 ccRCC patients with SDM and 86 without SDM. A total of 58 SDM cases were confirmed by pathology, and the remaining SDM cases were diagnosed by radiologic features, that is, an increase in the volume or number of suspected metastases during follow-up. SDM was defined as a distant metastatic lesion present at the time of initial diagnosis, before nephrectomy.
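A stratified 7:3 split at the patient level can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study: the function name `stratified_split`, the seed, and the rounding rule are assumptions, so the resulting cohort sizes may differ by one patient from the reported 325 and 138.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split patient indices into two sets while preserving the SDM rate."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    train, valid = [], []
    for cls in sorted(set(labels)):
        # Shuffle the patients of one class, then cut at the training fraction.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)

# Class sizes from the text: 127 patients with SDM (1) and 336 without (0).
labels = [1] * 127 + [0] * 336
train_idx, valid_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(train_rate, 3), round(valid_rate, 3))
```

Splitting within each class separately is what keeps the proportion of SDM cases nearly identical in the two cohorts.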

CT image acquisition and radiologic evaluation
The details of image acquisition parameters are shown in Supplementary S2. Each CT study was analyzed by a radiology resident (Reader 1, BK) and a radiologist (Reader 2, XMW) with 5 and 20 years of experience in abdominal imaging, respectively. Aware of the diagnosis of ccRCC but blinded to the radiological reports and pathologic details, the two readers assessed the following CT features by consensus: the maximum diameter of the tumor on the axial CT image; tumor polarity (superior/middle/inferior); tumor side (left/right); tumor margin (well defined/poorly defined); tumor shape (round/lobulated/irregular); enhancement degree (lower than cortex/higher than or similar to cortex); and necrosis (absence/presence). The maximum diameter of the tumor was measured by the two readers, and the average value was used in the evaluation. For the qualitative parameters, in the event of disagreement, the two readers jointly reviewed the findings to reach a consensus for further analysis.

Development of clinical factor model
Univariate regression analysis was applied to the clinical factors, including clinical data (age, gender, weight, coronary heart disease, diabetes, hypertension, history of smoking), laboratory variables (hemoglobin, PLR, NLR, RLR, AFR, calcium, and creatinine), and CT features, to identify the factors significantly associated with SDM. A multiple logistic regression analysis with stepwise backward elimination was then applied to build the clinical factors model in the training cohort. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated as estimates of relative risk for each risk factor.
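For reference, an odds ratio and its 95% CI can be obtained from a logistic regression coefficient and its standard error as sketched below. The Wald interval is an assumption here (the text does not state which interval type was used), and the coefficient and standard error are hypothetical values chosen only for illustration.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) for a log-odds coefficient using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, not taken from the study.
or_, lower, upper = odds_ratio_with_ci(beta=0.5, se=0.2)
print(f"OR = {or_:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the 5% level under the same Wald assumption.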

Segmentation of tumor images and radiomics feature extraction
To reduce potential differences among CT images acquired on different scanners, all original CT images were normalized using the gray-scale discretization method before the radiomics features were extracted.
Corticomedullary phase and nephrographic phase images at 5.0-mm thickness were retrieved for radiomics feature extraction. The three-dimensional regions of interest (ROIs) were manually segmented along the tumor contour on each transverse section, avoiding the paratumoral renal parenchyma and perinephric fat, using RadCloud (Huiying platform Medical Technology Co., Ltd.), a platform that has been used in previous studies. Finally, 1409 radiomics features were extracted per phase, as detailed in Supplementary S3.
Inter- and intra-observer intraclass correlation coefficients (ICCs) were calculated to estimate the inter-observer reliability and intra-observer reproducibility of feature extraction. Fifty cases of CT images, comprising 17 ccRCCs with SDM and 33 without SDM, were randomly chosen; ROI segmentation was performed independently by one radiology resident (Reader 1, BK) and one radiologist (Reader 2, XMW); both were aware of the diagnosis of ccRCC but were blinded to the SDM status. Reader 1 then repeated the contouring procedure 8 weeks after the initial analysis to assess the agreement of feature extraction. The remaining image segmentation was performed by Reader 1.
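The agreement check for a single feature can be sketched as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption, and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` holds one row per tumor and one column per reader (or per
    segmentation session) for a single radiomics feature.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: tumors (rows), readers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical values of one feature measured by two readers on five tumors.
feature_by_reader = [[10.1, 10.4], [12.3, 12.0], [8.7, 9.1], [15.2, 15.0], [11.0, 11.3]]
icc = icc_2_1(feature_by_reader)
keep_feature = icc > 0.75  # stability threshold stated in the text
```

In practice the same calculation would be repeated for every feature, once for the Reader 1 versus Reader 2 segmentations and once for the two sessions of Reader 1.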

Development of radiomics signature and radiomics nomogram
Only radiomics features with inter- and intra-observer ICCs greater than 0.75 were retained. The minimum redundancy maximum relevancy method was then applied to eliminate redundant and irrelevant features, keeping 30 features. The remaining features were entered into the least absolute shrinkage and selection operator (LASSO) regression model to choose the optimal subset of features from the training cohort for the final model. A radiomics model was created by summing the selected feature values weighted by their respective coefficients, and the corresponding radiomics score was calculated for each patient.
To provide a more individualized predictive model, a nomogram combining the final radiomics model and the clinical factors model was built in the training cohort. The calibration of the nomogram was evaluated with a calibration curve. The Hosmer-Lemeshow test was conducted to assess the goodness-of-fit of the nomogram. A radiomics nomogram score for each patient was obtained in the internal validation and external validation cohorts.
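The weighted sum that defines the radiomics score can be written as a short function. This is a sketch only: the feature names, coefficients, intercept, and feature values below are hypothetical placeholders, not the values fitted in the study.

```python
def rad_score(features, coefficients, intercept=0.0):
    """Sum the selected feature values weighted by their LASSO coefficients."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

# Hypothetical coefficients for two selected features (the study kept ten).
coefficients = {"shape_Maximum3DDiameter": 0.8, "wavelet_feature_a": -0.3}
# Hypothetical standardized feature values for one patient.
patient = {"shape_Maximum3DDiameter": 1.5, "wavelet_feature_a": 0.5}
score = rad_score(patient, coefficients, intercept=-1.0)
print(round(score, 3))
```

Features whose coefficients are shrunk to zero by LASSO drop out of the sum, which is why only the selected subset needs to be listed.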

Assessment of the performance of different models
The predictive accuracy of the clinical factors model, radiomics model, and radiomics nomogram for predicting SDM was quantified by the area under the receiver operating characteristic (ROC) curve (AUC). Decision curve analysis (DCA) was used to calculate the net benefits for a range of threshold probabilities in the whole cohort to assess the clinical usefulness of the nomogram.
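The AUC can be read as the probability that a randomly chosen patient with SDM receives a higher score than a randomly chosen patient without SDM. The sketch below computes it directly from that definition; the scores and labels are hypothetical, and this pairwise form is meant as an illustration rather than a substitute for the "pROC" package used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and SDM labels (1 = SDM, 0 = no SDM).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)
print(round(example_auc, 3))
```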

Statistical analysis
Statistical analyses were performed using R statistical software (version 3.6.3, https://www.r-project.org). Group differences in the clinical factors were assessed with the chi-square test or Fisher exact test for categorical variables and the Mann-Whitney U test for continuous variables, where appropriate. The clinical factors model was constructed using backward stepwise multivariate logistic regression with the Akaike information criterion (AIC) as the criterion. The LASSO logistic regression was performed using the "glmnet" package; the ROC curves were plotted using the "pROC" package; the nomogram and calibration curves were generated using the "rms" package; and the DCA was performed using the "rmda" package.

Clinical factors of the patients and construction of the clinical factor model
The patients' baseline demographic characteristics are summarized in Table 1. There were 325 ccRCC patients in the training cohort (216 men and 109 women; mean age, 55.3 ± 11.1 years), 138 patients in the internal validation cohort (95 men and 43 women; mean age, 55.4 ± 10.4 years), and 115 patients in the external validation cohort (84 men and 31 women; mean age, 53.9 ± 10.9 years). The rates of SDM ccRCC were 27.4% (89 of 325), 27.5% (38 of 138), and 25.2% (29 of 115) in the training, internal validation, and external validation cohorts, respectively, with no statistically significant difference among the three cohorts (P = 0.891). The confirmation approaches of SDM and the sites of metastases are shown in Table 2.
The results of multiple logistic regression analysis are listed in Table 3. According to the backward stepwise multivariate logistic regression, age, sex, maximum diameter, shape, margin, calcium, hemoglobin, and AFR were incorporated into the clinical factor model. The clinical score (Cli-score) was calculated with the following formula: [...]

Radiomics feature extraction, selection, and radiomics signature establishment
Among 2818 radiomics features extracted from corticomedullary phase and nephrographic phase CT images, 1704 features showed high stability and were then reduced to 30 features by minimum redundancy maximum relevancy. In the final feature selection with the LASSO method (Figures 3A, B), the 10 most valuable features were kept; they are displayed in Figure 3C. Violin plots showed the differences in the 10 radiomics features between the SDM ccRCC and without SDM ccRCC groups (Supplementary Figure S1). The radiomics score (Rad-score) was obtained with the following formula: [...]
Rad-score [median (interquartile range)] differed significantly between the SDM ccRCC and without SDM ccRCC groups in the training cohort [0.1 (-0.9, 1.5) vs. [...]

The radiomics nomogram establishment and assessment of the performance of different models
By incorporating the Cli-score and Rad-score, a radiomics nomogram was developed in the training cohort (Figure 4A [...] training, internal validation, and external validation cohorts, respectively. ROC curves of the radiomics nomogram are displayed in Figure 2. The AUC, sensitivity, specificity, and accuracy of the radiomics nomogram were, respectively, 0.929 (95% CI: 0.896, 0.961), 82.0%, 91.9%, and 89.2% in the training cohort; 0.916 (95% CI: 0.857, 0.975), 84.2%, 88.0%, and 87.0% in the internal validation cohort; and 0.925 (95% CI: 0.855, 0.994), 86.2%, 94.2%, and 92.2% in the external validation cohort.
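Sensitivity, specificity, and accuracy follow from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below uses illustrative counts for the training cohort (89 with SDM, 236 without). These counts are not reported in the text; they are back-calculated assumptions chosen to be consistent with the reported percentages.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Assumed counts (not from the source) that sum to 89 SDM and 236 non-SDM patients.
sens, spec, acc = diagnostic_metrics(tp=73, fp=19, tn=217, fn=16)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints 82.0% 91.9% 89.2%
```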
The distribution of Nomo-score with regard to SDM status in the training, internal validation and external validation cohorts is presented in Figure 5.
The diagnostic performance of each model is shown in Table 4. After integration of the Cli-score, the radiomics nomogram had a slightly higher AUC than the radiomics signature both in the internal validation cohort (0.916 vs. 0.869) and in the external validation cohort (0.925 vs. 0.882). Nevertheless, incorporating the Cli-score into the radiomics nomogram did not significantly improve prediction efficiency (P = 0.181 and 0.133, respectively).
The DCA results for the three models are presented in Figure 6. They showed that the radiomics nomogram and the clinical factor model had a higher overall net benefit in differentiating SDM ccRCC from without SDM ccRCC than the radiomics signature across the full range of reasonable threshold probabilities.
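The net benefit plotted in a decision curve can be computed at a single threshold probability as sketched below, using the standard definition (true-positive rate minus false-positive rate weighted by the odds of the threshold). The predicted probabilities and labels are hypothetical, and the function names are illustrative rather than part of the "rmda" package.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of acting on patients whose predicted probability reaches the threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as if SDM were present."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted probabilities and SDM labels (1 = SDM, 0 = no SDM).
probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
nb_model = net_benefit(probabilities, labels, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.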

Discussion
It is necessary to identify SDM status preoperatively and in a timely manner to support reasonable, personalized, and efficient treatment decisions. In this retrospective study, we developed and validated a radiomics nomogram that incorporates the radiomics signature and clinical factors for individualized prediction of SDM in ccRCC patients before treatment. The proposed radiomics nomogram demonstrated favorable discrimination in both the internal validation cohort (AUC, 0.916) and the external validation cohort (AUC, 0.925), outperforming the radiomics signature (internal validation, 0.869; external validation, 0.882) and the clinical factor model (internal validation, 0.896; external validation, 0.920).
To our knowledge, only a few studies in the literature have used radiomics-based methods to predict SDM in ccRCC. Bai et al. (17) developed an MRI-based radiomics nomogram combining patient age, regional lymph node status, pseudocapsule, and Rad-score, and demonstrated that the nomogram can be useful for differentiating SDM ccRCC from without SDM ccRCC, with an AUC of 0.854 (95% CI, 0.736-0.971) in the internal validation cohort and 0.816 (95% CI, 0.661-0.971) in the external validation cohort. Compared with MRI, CT is more widely used for the detection, identification, and staging of ccRCC because of its high diagnostic accuracy. A study by [...] (19), indicating that the wavelet features may further explore the spatial heterogeneity at multiple scales within tumor regions. Some previous studies have reported that wavelet features might better reveal tumor biology and heterogeneity (20,21). Liang et al. found that wavelet features were of great importance in predicting early recurrence of intrahepatic cholangiocarcinoma after partial hepatectomy (20). The shape feature Maximum 3D Diameter, defined as the largest pairwise Euclidean distance between tumor surface mesh vertices, had the highest weight in the radiomics model. The Maximum 3D Diameter was positively correlated with SDM ccRCC, suggesting that larger tumors may be seen more commonly in SDM ccRCC, which is consistent with previous studies (22,23). Small Dependence Low Gray Level Emphasis (SDLGLE) measures the joint distribution of small dependence with lower gray-level values; a greater value indicates less homogeneous textures and a greater concentration of low gray-level values in the image (24). We assumed that the greater value of SDLGLE in SDM ccRCC might be related to a larger extent of necrotic components with lower gray values (25,26).
Our study took many clinical factors into account. In line with previous studies, AFR was selected as an independent predictor of the absence of SDM, which suggests that ccRCC patients with decreased AFR are more likely to have SDM (27-29). Numerous experimental studies have convincingly supported the concept that inflammation is an essential component of tumor progression (30-32). Serum albumin has protective nutritional and anti-inflammatory effects, whereas fibrinogen can promote the invasion and metastasis of tumor cells through epithelial-mesenchymal transition and induce tumor blood vessel formation, thereby participating in tumor progression (33,34). Therefore, decreased serum albumin and elevated fibrinogen are signs of elevated systemic inflammation, and decreased AFR might be connected with a worse prognosis (27). According to the equation for the Cli-score developed in our study, ccRCC with decreased AFR tended to be accompanied by SDM, which was consistent with previous studies. It should be noted that the other clinical features associated with inflammation, including PLR, NLR, and RLR, differed significantly between SDM ccRCC and without SDM ccRCC in the training cohort. However, these clinical features were not independent factors after multivariate analysis and were excluded from the final model. We presume that differences in the endpoint event and the imbalance between the two groups might explain the discrepancy between study results.
There are several limitations to our study. First, the retrospective nature might have inevitably introduced bias in population selection. The two groups in our study population were unbalanced, which might indicate a spectrum bias and might have influenced the diagnostic performance. In addition, there was an imbalance between the training and internal validation cohorts, due to the relatively small sample size. Prospective multicenter studies with considerably larger datasets are needed to further validate the robustness and reproducibility of our model. Second, owing to the small number of SDM ccRCC cases, there were not enough data to differentiate the various sites of SDM for a stratified analysis [...] radiomics features. Although only features with ICCs greater than 0.75 were kept for radiomics signature construction in our study, automated and accurate tumor segmentation must be developed to improve the efficiency of the radiomics process.
In addition, it would be more interesting to develop a model to predict metachronous disease, which could be helpful in planning follow-up schedules.
In conclusion, our study presented a CT-based radiomics nomogram that showed satisfactory performance in predicting SDM among ccRCC patients, which can help physicians make more informed decisions about diagnostic examinations and treatment.

FIGURE 2 Diagnostic performance of the clinical factors model, radiomics signature, and radiomics nomogram was assessed and compared through ROC curves in the training (A), internal validation (B) and external validation (C) cohorts. ROC = receiver operating characteristic; AUC = area under the receiver operating characteristic curve.

FIGURE 5 The distribution of Nomo-score with regard to SDM status in the training (A), internal validation (B) and external validation (C) cohorts.

FIGURE 6 Decision curve analysis for the three models in the training (A), internal validation (B) and external validation (C) cohorts. The y-axis shows the net benefit; the x-axis shows the threshold probability. The red, orange, and green lines represent the net benefit of the radiomics nomogram, clinical factors model, and radiomics signature, respectively.

TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively. *Numbers in parentheses were used to calculate percentages.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
Study conception and design: XY, LG, SZ, BK, and XW. Administrative support: XW. Provision of study materials or patients: XY, LG, CS, and BK. Collection and assembly of data: XY, LG, and SZ. Data analysis and interpretation: XY, LG, JZ, and BK. All authors contributed to the article and approved the submitted version.