A Metabolism-Related Radiomics Signature for Predicting the Prognosis of Colorectal Cancer

Background: Radiomics refers to the extraction of large amounts of quantitative information from medical images, which can provide decision support for clinicians. In this study, we developed and validated a radiomics-based nomogram to predict the prognosis of colorectal cancer (CRC). Methods: A total of 381 patients with colorectal cancer (primary cohort: n = 242; validation cohort: n = 139) were enrolled, and radiomic features were extracted from the venous phase of preoperative computed tomography (CT). The radiomics score was generated using the least absolute shrinkage and selection operator (LASSO) algorithm. A nomogram was constructed by combining the radiomics score with clinicopathological risk factors to predict the prognosis of CRC patients. The performance of the nomogram was evaluated with calibration curves, receiver operating characteristic (ROC) curves, and the C-index. Functional analysis and correlation analysis were used to explore the association between the radiomic features and gene-expression patterns. Results: Five radiomic features were selected by the LASSO regression model to calculate the radiomics score. Kaplan-Meier analysis showed that the radiomics score was significantly associated with disease-free survival (DFS) [primary cohort: hazard ratio (HR): 5.65, 95% CI: 2.26–14.13, P < 0.001; validation cohort: HR: 8.49, 95% CI: 2.05–35.17, P < 0.001]. Multivariable analysis confirmed the independent prognostic value of the radiomics score (primary cohort: HR: 5.35, 95% CI: 2.14–13.39, P < 0.001; validation cohort: HR: 5.19, 95% CI: 1.22–22.00, P = 0.026). We combined the radiomics signature with the TNM stage to build a nomogram, which performed better than the TNM stage alone. The C-index of the nomogram was 0.74 (0.69–0.80) in the primary cohort and 0.82 (0.77–0.87) in the validation cohort. Functional analysis and correlation analysis showed that the radiomic signature was mainly associated with metabolism-related pathways. Conclusions: The radiomics score derived from preoperative CT images was an independent prognostic factor and could complement the current staging strategies for colorectal cancer.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common cancers and ranks as the third leading cause of cancer-related mortality worldwide (Siegel et al., 2020). Even with recent progress in cancer treatment, the 5-year overall survival of CRC remains <60% (Moghimi-Dehkordi and Safaee, 2012). Traditionally, the treatment regimen for colorectal cancer has been determined mainly by clinicopathological factors, such as TNM stage, tumor size, and differentiation grade, which do not fully account for tumor heterogeneity. The emergence of gene expression-based molecular biomarkers has brought hope for the precision treatment of colorectal cancer in the past decade, but their high cost and long turnaround time have limited their clinical application. In recent years, medical images, which are routinely acquired in clinical practice, have emerged as promising biomarkers for cancer treatment and management.
Radiomics is a multidisciplinary approach to the quantification of medical images, such as CT and magnetic resonance imaging (MRI). By transforming medical images into high-dimensional quantitative feature data, radiomics has been successfully applied in medical research, including tumor genetic analysis, qualitative characterization of lesions, evaluation of treatment response, and prognosis prediction (Kumar et al., 2012; Lambin et al., 2017; Limkin et al., 2017). Typical radiomic features describe tissue or lesion characteristics, such as tumor shape and texture, and can provide abundant information for tumor assessment. Compared with traditional clinical diagnostic methods, radiomics has the advantages of being inexpensive, non-invasive, and quantifiable.
Several studies have demonstrated that radiomics analysis combined with clinicopathological information can help guide treatment decisions. Huang Y. Q. et al. (2016) developed a radiomic nomogram incorporating clinical risk factors for preoperative prediction of lymph node metastasis in patients with colorectal cancer. Similarly, a CT-based radiomics signature in colorectal cancer showed considerable discriminative potential in preoperative staging. Kim et al. (2015) reported that some distinct features extracted from CT images can significantly discriminate between differentiation grades of colorectal adenocarcinoma. In terms of prognostic evaluation, radiomic features have been regarded as independent biomarkers for assessing disease-free survival in patients with early-stage non-small cell lung cancer (NSCLC). Combined with traditional staging systems and other clinicopathological risk factors, the radiomics signature achieved better performance. Farhidzadeh et al. (2016) found that radiomic features extracted from MRI images of patients with nasopharyngeal carcinoma (NPC) reflect the heterogeneity of the tumor and predicted recurrence well in two groups of patients. In addition, it has been reported that radiomic features based on CT images may correlate with genomic data underlying clinical outcomes (Segal et al., 2007; Aerts et al., 2014). Therefore, analyzing radiomic features and uncovering the biological information hidden in them has become a promising direction in imaging biomarker research.
In this study, we aimed to develop and validate a radiomics-based nomogram to predict the postoperative outcome of colorectal cancer patients. RNA-seq data from the colorectal cancer subproject (COCC, Clinical Omics Study of Colorectal Cancer in China) of the ICGC-ARGO project (The International Cancer Genome Consortium-Accelerating Research in Genomic Oncology) were used to explore the biological interpretation of the radiomic signature.

Data Collection
A total of 381 patients with colorectal cancer from the Sixth Affiliated Hospital of Sun Yat-sen University were enrolled in this study. Our study was approved by the Medical Ethics Committee of the Sixth Affiliated Hospital of Sun Yat-sen University. Patients admitted during 2007-2011 were assigned to the primary cohort (n = 242), while patients admitted during 2012-2015 were assigned to the validation cohort (n = 139). Fifty-three of the 381 patients were also enrolled in the COCC project and therefore had paired image data and RNA sequencing data. All CT images were in DICOM (Digital Imaging and Communications in Medicine) format and were retrieved from the image archiving and storage system of the Sixth Affiliated Hospital of Sun Yat-sen University. Baseline clinicopathological information, including age, gender, differentiation grade, lymph node metastasis, and carcinoembryonic antigen (CEA, normal < 5 ng/ml, abnormal > 5 ng/ml), was also obtained from the hospital archives. The region of interest (ROI) was manually delineated along the tumor outline by experienced doctors using ITK-SNAP (Version 3.2). A total of 107 radiomic features were generated using the pyradiomics package (van Griethuysen et al., 2017) in Python 2.7.

Model Construction
Z-score normalization was applied to the radiomic features to make them comparable. Only features with high average signal intensity were retained for the subsequent analyses. The least absolute shrinkage and selection operator (LASSO) with Cox regression was used to construct the radiomic signature and calculate the radiomics score (Rad-score). A nomogram was constructed by combining the radiomics score with clinicopathological risk factors. The performance of the nomogram was evaluated with calibration curves, receiver operating characteristic (ROC) curves, and the C-index.
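As an illustration of the z-score step, the following minimal Python sketch standardizes each feature column to zero mean and unit standard deviation. The feature matrices are synthetic, and reusing the primary cohort's mean and standard deviation for the validation cohort is an assumption made here for illustration; the text does not state whether the two cohorts were scaled jointly or separately. The function names are hypothetical.

```python
import numpy as np

def zscore_fit(features):
    """Return the per-feature mean and sample standard deviation (columns = features)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against division by zero for constant features
    return mean, std

def zscore_apply(features, mean, std):
    """Standardize features with previously computed statistics."""
    return (features - mean) / std

# Synthetic stand-ins for the radiomic feature matrices (rows = patients).
rng = np.random.default_rng(0)
primary = rng.normal(loc=50.0, scale=10.0, size=(242, 5))
validation = rng.normal(loc=50.0, scale=10.0, size=(139, 5))

# Assumption: statistics are estimated on the primary cohort only.
mean, std = zscore_fit(primary)
primary_z = zscore_apply(primary, mean, std)
validation_z = zscore_apply(validation, mean, std)
```

Estimating the scaling statistics on the primary cohort alone keeps the validation cohort untouched during model building, which is one common way to avoid information leakage.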

Correlating the Radiomic Features With Gene Expression Data
To explore the association between the radiomic features and the underlying biological mechanisms, we conducted a correlation analysis between radiomic features and cancer-related hallmarks. DeepCC was used to calculate the enrichment score of each cancer hallmark for each patient. Pearson's correlation coefficient was calculated between each hallmark and each radiomic feature. Hallmarks that correlated significantly with at least one radiomic feature were displayed in the heatmap.
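The hallmark-by-feature correlation described above can be sketched in Python as follows. The data are synthetic, the function names are hypothetical, and the enrichment scores here are random numbers rather than DeepCC output. The t statistic is the standard transformation of Pearson's r; a P value would be obtained from Student's t distribution with n − 2 degrees of freedom.

```python
import numpy as np

def pearson_matrix(hallmarks, features):
    """Pearson r between every hallmark column and every feature column.

    hallmarks: (n_patients, n_hallmarks); features: (n_patients, n_features).
    Returns an (n_hallmarks, n_features) matrix.
    """
    h = hallmarks - hallmarks.mean(axis=0)
    f = features - features.mean(axis=0)
    h = h / np.linalg.norm(h, axis=0)
    f = f / np.linalg.norm(f, axis=0)
    return h.T @ f

def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

# Synthetic example: 53 patients, 3 hallmark scores, 5 radiomic features.
rng = np.random.default_rng(0)
n_patients = 53
features = rng.normal(size=(n_patients, 5))
hallmarks = rng.normal(size=(n_patients, 3))
# Make the first hallmark depend strongly on the first feature.
hallmarks[:, 0] = 2.0 * features[:, 0] + 0.1 * rng.normal(size=n_patients)

r = pearson_matrix(hallmarks, features)
t = t_statistic(r, n_patients)
```

Rows of `r` whose absolute t statistic exceeds the chosen significance threshold for at least one feature would be the ones kept for a heatmap.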

Statistical Analysis
All statistical analyses were performed in R (version 3.6.1). A time-dependent ROC curve, computed with the "survivalROC" package (Heagerty et al., 2000), was used to determine the optimal cut-off value of the radiomics score, which divides patients into different risk groups. The R package "glmnet" was used to perform the LASSO-Cox regression analysis (Friedman et al., 2010). Kaplan-Meier curves and log-rank tests were used for survival analysis. The primary outcome was disease-free survival (DFS). Univariable and multivariable analyses were performed with the Cox proportional hazards regression model. The nomogram combining the Rad-score with clinicopathological factors was built with the "rms" package (Harrell, 2016). A two-sided P < 0.05 was considered statistically significant in all analyses.
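To make the Kaplan-Meier step concrete, here is a minimal pure-Python sketch of the product-limit estimator. It is not the R code used in the study; the follow-up times and event indicators are invented for illustration, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if the event (e.g., recurrence) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= n_removed
    return curve

# Invented example: months of follow-up and recurrence indicators.
times = [6, 12, 12, 20, 30, 36]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
# Survival drops to 5/6 at month 6, 2/3 at month 12, and 4/9 at month 20.
```

Computing one such curve per risk group and comparing them with a log-rank test corresponds to the survival analysis described above.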

RESULTS
Feature Selection and Model Construction
In the preprocessing step, radiomic features were first scaled with z-score normalization in the primary and validation cohorts. The average signal value of each feature across patients was calculated and compared. We retained only the 85 features (80% of the 107 features) with the highest signal intensity for subsequent modeling. LASSO-Cox regression was then applied and selected 5 features with non-zero coefficients (Figure 1A). The radiomics score was calculated as a linear combination of the 5 selected features weighted by their non-zero coefficients, in the primary and validation cohorts, respectively. The radiomics scores of all patients are listed in Supplementary Table 1. The calculation process was presented in the following formula:
The optimal cut-off of the Rad-score was determined by the time-dependent ROC curve. Based on this threshold, the patients were divided into high-risk (> -0.077) and low-risk (< -0.077) groups (Figure 1B). The clinical characteristics of patients in the primary and validation cohorts are presented in Table 1.
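The Rad-score calculation and risk grouping described above can be sketched in Python as a weighted sum followed by a threshold. The coefficients and feature values below are hypothetical placeholders, not the fitted values from the study; only the cut-off of -0.077 is taken from the text.

```python
import numpy as np

CUTOFF = -0.077  # cut-off reported in the text

# Hypothetical coefficients for the 5 selected features (NOT the study's values).
coefficients = np.array([0.30, -0.20, 0.15, 0.10, -0.05])

def rad_score(z_features, coefs):
    """Linear combination of z-scored features (rows = patients)."""
    return z_features @ coefs

def risk_group(scores, cutoff=CUTOFF):
    """Label patients as 'high' (score > cutoff) or 'low' risk."""
    return np.where(scores > cutoff, "high", "low")

# Hypothetical z-scored feature values for three patients.
z = np.array([
    [1.0, -1.0, 0.5, 0.0, 0.2],
    [-1.0, 1.0, -0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
scores = rad_score(z, coefficients)
groups = risk_group(scores)
# With these placeholder values the scores are 0.565, -0.575 and 0.0,
# giving the groups high, low and high.
```

Because a Cox model has no intercept term, the score is the plain weighted sum of the standardized features, which is why a negative cut-off is possible.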

Construction and Performance of the Radiomics Nomogram
Subsequently, based on the results of the multivariable analysis, a nomogram was developed combining the Rad-score and TNM stage (Figure 3A). To illustrate the performance of the nomogram, calibration curves were used to evaluate the agreement between the nomogram predictions and the actual outcome of patients. The nomogram showed good concordance between the predicted and actual survival probability in the primary (Figures 3B,C) and the validation cohort (Figures 3D,E). The C-index of the nomogram was 0.74 (0.69-0.80) in the primary cohort and 0.82 (0.77-0.87) in the validation cohort. To further confirm the effectiveness of the nomogram, we applied receiver operating characteristic (ROC) analysis to evaluate its discriminative ability for 5-year DFS. The area under the curve (AUC) of the Rad-score combined with the TNM stage reached 0.734 and 0.86 in the primary and validation cohorts, respectively, outperforming the TNM stage alone (Figure 4).
FIGURE 2 | Survival analysis for radiomics score. The distribution of radiomics score in colorectal cancer and its correlation with recurrence status in the primary (A) and validation cohort (C). Kaplan-Meier curves showed a significant association between Rad-score groups and disease-free survival (DFS) in the primary cohort (B) and validation cohort (D).
Frontiers in Molecular Biosciences | www.frontiersin.org
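For readers unfamiliar with the C-index reported for the nomogram, the following pure-Python sketch computes Harrell's concordance index on invented data. It is a simplified illustration (tied follow-up times are not treated as comparable) and not the implementation used in the study; the function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. The pair is concordant when the
    patient with the earlier event also has the higher predicted risk.
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Invented example: follow-up times, event indicators, and predicted risks.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
# 7 of the 8 comparable pairs are concordant, so the C-index is 0.875.
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.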

Radiomic Features Were Mainly Associated With Metabolism-Related Pathways
To explore the association between radiomic features and the underlying biological mechanisms, we performed a correlation analysis between the enrichment scores of cancer hallmarks and the 5 radiomic features. Gene expression data from the 53 patients with paired image and RNA sequencing data were used to calculate the hallmark enrichment scores with DeepCC. Pathways were selected according to their significant association with the radiomic signature (Figure 5). Notably, the radiomic signature was significantly associated with several metabolism-related pathways, such as protein secretion, glycolysis, heme metabolism, xenobiotic metabolism, and adipogenesis.

DISCUSSION
Medical image analysis is an active topic in precision therapy, as it provides non-invasive information for clinical practice and treatment guidance. However, traditional medical image analysis relies on radiologists to manually identify low-throughput features or qualitative information. Recent progress in machine learning enables researchers to extract high-dimensional data quickly and quantitatively through radiomics. In this study, we used radiomic features extracted from CT images to predict the outcome of CRC patients. Survival analysis showed that a high radiomics score was significantly associated with poor outcomes. Univariable and multivariable analyses confirmed the independent prognostic value of the radiomic signature. Subsequently, a radiomics-based nomogram was developed to predict DFS, which performed better than the TNM stage alone. Correlation analysis with gene expression profiles revealed that the radiomic signature was mainly associated with metabolism-related pathways. Taken together, our results suggest that the radiomic signature could supplement the TNM stage for risk stratification of CRC patients.
Although traditional gene expression-based molecular biomarkers have achieved good performance in many risk prediction tasks in colorectal cancer, some difficulties still limit their clinical application (Walther et al., 2009; Kandimalla et al., 2018, 2019). Genetic testing not only requires additional cost and time but also depends on postoperative analysis of pathological samples, which may limit preoperative treatment intervention. These problems can be avoided by using medical image-based biomarkers. Recent progress in deep learning has produced a series of image-based models with high accuracy and good performance (Kather et al., 2019; Lu et al., 2020; Skrede et al., 2020).
However, a difficult problem with deep learning-based image models is their limited interpretability, which may raise concerns about their safety and limit their clinical application (Gordon et al., 2019). In contrast, radiomics is more interpretable and less dependent on sample size, which makes it easier to translate into clinical practice. Several studies have successfully used radiomics for individualized risk prediction in colorectal cancer (Liu et al., 2017, 2020). Furthermore, integrated analysis of radiomics and gene expression profiles can provide deeper biological interpretation. Our results showed that the radiomic signature was significantly associated with several metabolic pathways, and metabolism is an important mechanism in colorectal cancer initiation and progression (Satoh et al., 2017; Gao X. et al., 2019; Tang et al., 2019; La Vecchia and Sebastián, 2020). This suggests that changes in tumor metabolic status may cause morphological changes on the image, which could be captured by radiomic features.
Our study not only established a radiomics-based nomogram for prognosis prediction in CRC, but also provided a biological interpretation by correlating it with gene expression profiles. However, our study has some limitations. First, the image and clinical data were collected from a single center, which may limit the generalizability of our model. Second, as a retrospective study, it provides a limited level of evidence. Prospective multicenter validation will be needed in future studies.
In conclusion, we proposed the Rad-score extracted from CT images as an independent prognostic factor for colorectal cancer. We combined the Rad-score with the TNM stage to build a nomogram, which outperformed the TNM stage alone, indicating that the Rad-score can complement the current staging strategies for CRC patients. As a non-invasive biomarker, our radiomics-based model can also support preoperative evaluation, which is helpful for clinical intervention.

DATA AVAILABILITY STATEMENT
The data supporting the findings of this study are available upon request from the corresponding authors (FG). The image data are not publicly available because they contain information that could compromise patient privacy. Gene expression data from the COCC project are not currently publicly available and will be released by ICGC after the project milestone.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Medical Ethics Committee of the Sixth Affiliated Hospital of Sun Yat-sen University. The ethics committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
FG and X-JW designed this study. DC, XD, and WW wrote the paper. DC analyzed and interpreted the data and drew the figures. XD extracted the radiomics features. WW, FG, and X-JW revised the paper. Z-PH, QZ, and M-EZ delineated the region of interest. M-YL, C-HL, and W-BK collected and cleaned up the clinical data. COCC Working Group provided assistance for data generation or analysis. All authors contributed to manuscript revision, read, and approved the submitted version.