A nomogram based on collagen signature for predicting the immunoscore in colorectal cancer

Objectives The Immunoscore can categorize patients with colorectal cancer (CRC) into high- and low-risk groups for prognostication. Collagen plays an important immunomodulatory role in the tumor microenvironment (TME). However, the correlation between collagen and the Immunoscore in the TME is unclear. This study aimed to construct a collagen signature to elucidate the relationship between collagen structure and the Immunoscore. Methods A total of 327 consecutive patients with stage I-III CRC were included in a training cohort. Fully quantitative collagen features were extracted at the tumor center and invasive margin of the specimens using multiphoton imaging. LASSO regression was applied to construct the collagen signature, and the association of the collagen signature with the Immunoscore was assessed. A collagen nomogram was developed by incorporating the collagen signature and the clinicopathological predictors identified by multivariable logistic regression. The performance of the collagen nomogram was evaluated in terms of calibration, discrimination, and clinical usefulness and then tested in an independent validation cohort. The prognostic value of the collagen nomogram was assessed using Cox regression and the Kaplan−Meier method. Results The collagen signature was constructed from 16 collagen features: 6 from the tumor center and 10 from the invasive margin. Patients with a high collagen signature were more likely to have a low Immunoscore (Lo IS) in both cohorts (P<0.001). The collagen nomogram, which integrated the collagen signature and clinicopathological predictors, yielded satisfactory discrimination and calibration, with an AUC of 0.925 (95% CI: 0.895-0.956) in the training cohort and 0.911 (95% CI: 0.872-0.949) in the validation cohort. Decision curve analysis confirmed that the collagen nomogram was clinically useful.
Furthermore, the subgroups predicted by the collagen nomogram were significantly associated with prognosis. Moreover, among patients with high-risk stage II and stage III CRC, those with a low probability of Lo IS, rather than a high probability, could benefit from chemotherapy. Conclusions The collagen signature is significantly associated with the Immunoscore in the TME, and the collagen nomogram has the potential to individualize the prediction of the Immunoscore and to identify CRC patients who could benefit from adjuvant chemotherapy.


Introduction
The incidence of colorectal cancer (CRC) has gradually increased over the past decades, and CRC has become one of the leading causes of cancer burden and cancer death worldwide (1). Currently, the tumor-node-metastasis (TNM) staging system is widely used in the clinic as the reference standard for prognosis and treatment (2). Nevertheless, clinical outcomes vary considerably among CRC patients with the same stage who receive similar treatment regimens, suggesting that the current TNM staging system does not provide adequate information on prognosis and chemotherapy benefit (3,4). Several studies have demonstrated that the tumor microenvironment (TME), including the extracellular matrix (ECM) and immune cells, strongly affects tumor initiation, proliferation, invasion, and metastasis (5,6). Among the immune effector cells in the tumor, tumor-infiltrating lymphocytes (TILs) reflect the antitumor immune status of the host and are related to the prognosis and therapeutic response of CRC patients (7,8). The Immunoscore quantifies and scores the density of CD3+ and CD8+ T cells at the tumor center (TC) and invasive margin (IM) (9,10). Recently, several high-quality international studies have validated the prognostic value of the Immunoscore (11)(12)(13)(14). Thus, the Immunoscore has been proposed as a new component of the TNM staging system for CRC and is recommended by the NCCN guidelines (15).
Epithelial-mesenchymal transition (EMT) is known to enhance the migratory and invasive abilities of cancer cells, thereby facilitating tumor formation and metastasis (16). Collagen, a major component of the ECM, is upregulated during EMT under the influence of various transcription factors, such as Twist, Slug, Snail, and Zeb (17-19). Concurrently, the integrins α1β1 and α2β1, which interact with collagen and have been shown to mediate the degradation of epithelial cadherin complexes, are also upregulated (20). Previous research indicated that the interaction between cells and the ECM is regulated by ECM-binding proteins such as SPARC, which promotes the interaction between collagen and α2β1 (21). SPARC has been shown to induce EMT by regulating SLUG expression and is associated with increased invasiveness (22). Thus, under the influence of various biological signals, the structure of collagen changes dynamically during tumor development and progression (23,24). Collagen also plays a vital role in the localization, dynamic behavior, and function of TILs in the TME (25,26). However, the correlation between alterations in collagen structure and the Immunoscore remains unclear. Multiphoton imaging, a nonlinear optical imaging method, can visualize collagen structure at the supramolecular level and, owing to its physical basis, is especially sensitive to collagen structure (27). This technique has become a powerful tool for investigating alterations in collagen structure during disease progression (28,29). Furthermore, our previous studies established a robust framework for automatic, high-throughput acquisition of fully quantitative collagen structure features for disease diagnosis and prediction (30)(31)(32). Therefore, we hypothesized that the relationship between collagen structure and the Immunoscore in the TME of CRC patients could be elucidated using multiphoton imaging and collagen quantification analysis.
Integrating multiple biomarkers into a panel using a machine learning algorithm can significantly improve predictive performance compared with individual biomarkers (33,34). Least absolute shrinkage and selection operator (LASSO) regression is an effective algorithm for analyzing high-throughput data and is widely accepted for model construction (35). Hence, we aimed to construct a fully quantitative collagen biomarker, i.e., a collagen signature, via multiphoton imaging and LASSO regression to comprehensively describe the correlation between collagen structure and the Immunoscore in the TME. We then investigated the ability of a collagen nomogram integrating the collagen signature and clinicopathological predictors to provide individualized prediction of the Immunoscore in CRC patients.

Patients and tissue specimens
Ethics approval was obtained from the institutional review boards of Nanfang Hospital and Fujian Provincial Cancer Hospital (NFEC-2023-221). The requirement for informed consent was waived. The study was conducted in accordance with the Declaration of Helsinki and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.
The flow chart of patient recruitment is shown in Supplementary Figure 1. The inclusion criteria were radical surgery with pathologically diagnosed stage I-III CRC; available follow-up data and clinicopathological characteristics; and hematoxylin and eosin (HE) slides containing invasive tumor components. The exclusion criteria were unavailable formalin-fixed paraffin-embedded (FFPE) specimens, a history of cancer, or receipt of neoadjuvant treatment. As a result, 327 consecutive patients from Nanfang Hospital treated between January 2011 and December 2013 were included in the training cohort. An independent validation cohort comprised 327 consecutive patients from Fujian Provincial Cancer Hospital treated between October 2011 and December 2013. Two independent pathologists reassessed all samples according to the 8th edition of the AJCC staging criteria.
Clinicopathological characteristics included age, sex, primary tumor location, preoperative carcinoembryonic antigen (CEA) level, preoperative carbohydrate antigen 199 (CA199) level, tumor differentiation, tumor size, pT stage, and pN stage. According to the NCCN guidelines, adjuvant chemotherapy after radical surgery is recommended for patients with high-risk stage II and stage III CRC.
A standardized follow-up protocol was implemented: serum CEA testing every 3 months for the first 3 years after surgery and every 6 months thereafter; chest-to-pelvis CT every 6 months for the first 5 years after surgery; and colonoscopy 1 year after surgery.
The Immunoscore was assessed as follows (Figure 1). First, two pathologists blinded to the prognostic information selected five representative regions at the TC and five at the IM. Second, CD3+ and CD8+ immune cells were quantified using QuPath software (version 0.2.3). Third, the CD3+ and CD8+ densities were used to divide cases into "high" and "low" immune groups: patients with a mean density ≥ the 75th percentile were assigned to the "high" immune group (score 1), and all others to the "low" immune group (score 0). The CD3-TC, CD3-IM, CD8-TC, and CD8-IM scores were summed to give an Immunoscore of I0-I4. Finally, patients were divided into two groups: I0-I1 was classified as low Immunoscore (Lo IS), and I2-I4 as intermediate-high Immunoscore (Int-Hi IS).
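The scoring rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline (densities were measured in QuPath): the keys `CD3_TC`, `CD3_IM`, `CD8_TC`, and `CD8_IM` and the function names are hypothetical, and the 75th percentile is computed with `statistics.quantiles`, whose default interpolation may differ from the software the authors used. An optional `cutoffs` argument applies fixed thresholds instead, such as those reported in the Results.

```python
from statistics import quantiles

# Hypothetical keys for the four marker/region mean densities (cells/mm^2).
MARKERS = ("CD3_TC", "CD3_IM", "CD8_TC", "CD8_IM")


def upper_quartile(values):
    # statistics.quantiles(n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile ("exclusive" interpolation by default).
    return quantiles(values, n=4)[2]


def immunoscore(cohort, cutoffs=None):
    """Return a (score, group) tuple for each patient.

    cohort  -- list of dicts mapping each key in MARKERS to a mean density
    cutoffs -- optional dict of fixed thresholds; if omitted, the cohort's
               own 75th percentile is used for each marker/region
    """
    if cutoffs is None:
        cutoffs = {m: upper_quartile([p[m] for p in cohort]) for m in MARKERS}
    results = []
    for patient in cohort:
        # 1 point for each marker/region at or above its cutoff, else 0 -> I0..I4
        score = sum(1 for m in MARKERS if patient[m] >= cutoffs[m])
        # I0-I1 -> low Immunoscore; I2-I4 -> intermediate-high Immunoscore
        group = "Lo IS" if score <= 1 else "Int-Hi IS"
        results.append((score, group))
    return results


if __name__ == "__main__":
    # Cutoffs as reported in the Results; the patient densities are made up.
    reported = {"CD3_TC": 593, "CD8_TC": 382, "CD3_IM": 1382, "CD8_IM": 714}
    patient = {"CD3_TC": 650, "CD3_IM": 1200, "CD8_TC": 400, "CD8_IM": 500}
    print(immunoscore([patient], cutoffs=reported))  # [(2, 'Int-Hi IS')]
```

In the usage example, the hypothetical patient exceeds the CD3-TC and CD8-TC cutoffs only, giving I2 and therefore Int-Hi IS.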

Multiphoton imaging and collagen feature extraction
The TC and IM regions used to calculate the CD3+ and CD8+ densities were also used for multiphoton imaging.
FIGURE 1 Flowchart for calculating the Immunoscore. First, digital IHC images (CD3+ shown as an example) were acquired and opened in QuPath software, and 5 representative images were randomly circled in the TC (orange) and IM (blue) regions (scale bar: 2,000 μm). Then, CD3+ cells (brown) in the TC and IM were counted by QuPath software (red), and the number of positive TILs per mm² was calculated (scale bar: 250 μm). The mean TIL density was used to divide cases into "high" or "low" immune groups; patients with a mean density ≥ the 75th percentile were assigned to the "high" immune group. A high immune group was scored as 1 and a low immune group as 0. The CD3-TC, CD3-IM, CD8-TC, and CD8-IM scores were summed to give an Immunoscore (I0-I4), where I0-I1 is a low Immunoscore (Lo IS) and I2-I4 is an intermediate-high Immunoscore (Int-Hi IS). TC, tumor center; IM, invasive margin; IHC, immunohistochemistry; TILs, tumor-infiltrating lymphocytes; IS, Immunoscore; Lo, low; Int-Hi, intermediate-high.
Multiphoton images were acquired at 200× original magnification on a separate unstained serial section and compared with the corresponding HE image for histologic assessment (27). Details of the multiphoton imaging system are provided in the Supplementary Methods.
The framework we constructed for quantitative extraction of collagen features is described in the Supplementary Methods. In summary, 142 collagen features (Supplementary Table 1), including morphological features, histogram-based features, gray-level co-occurrence matrix (GLCM) features, and Gabor wavelet transform features, were extracted automatically using MATLAB 2016b (MathWorks, Natick, MA, USA) (30)(31)(32). In total, 284 collagen features (142 from the TC and 142 from the IM) were obtained for further statistical analyses.
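As an illustration of one of the four feature families, the following Python sketch computes a symmetric, normalized GLCM for a single pixel offset and four common texture statistics from it. It is not the authors' MATLAB framework: the function names, the 8-level quantization, the single horizontal offset, and the choice of statistics are assumptions, and the exact feature definitions used in the study are listed in Supplementary Table 1.

```python
import math


def quantize(image, levels=8):
    """Map 8-bit intensities (0-255) to gray levels 0..levels-1."""
    return [[min(v * levels // 256, levels - 1) for v in row] for row in image]


def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    image must already be quantized to integers in 0..levels-1 and must
    contain at least one pixel pair at the given offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """A few Haralick-style texture statistics of a normalized GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


if __name__ == "__main__":
    checkerboard = [[0, 255], [255, 0]]  # toy 2x2 image
    p = glcm(quantize(checkerboard, levels=2), levels=2)
    print(glcm_features(p))
    # {'contrast': 1.0, 'asm': 0.5, 'homogeneity': 0.5, 'entropy': 1.0}
```

A real pipeline would typically average such statistics over several offsets and angles; a single offset is used here only to keep the sketch short.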

LASSO regression and collagen signature construction
LASSO regression, which is suited to high-dimensional data, was used to select the most useful predictive features (33)(34)(35). LASSO applies an L1 penalty that shrinks the coefficients toward zero and sets some of them exactly to zero. The penalty parameter λ, also called the tuning constant, controls the number of collagen features that enter the model. In this study, we applied 10-fold cross-validation in the training cohort to select the optimal λ using the one-standard-error (1-SE) rule. The collagen signature was then calculated for each patient as a linear combination of the selected features weighted by their respective coefficients in the training cohort. The collagen signature in the validation cohort was calculated from the same selected features and the coefficients obtained in the training cohort. Details of the LASSO regression are provided in the Supplementary Methods.
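The two LASSO-specific steps described above, the 1-SE choice of λ and the linear-combination signature, can be sketched in Python. This assumes the cross-validated error curve (mean and standard error for each candidate λ) has already been computed, for example by a package such as R's glmnet, whose `cv.glmnet` reports a `lambda.1se` value; the paper does not state which implementation was used, and all function names and numbers below are hypothetical. The `soft_threshold` helper shows why the L1 penalty sets small coefficients exactly to zero.

```python
def soft_threshold(z, gamma):
    """L1 proximal step used by coordinate-descent LASSO solvers:
    values with |z| <= gamma become exactly 0, larger ones shrink by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lambda_1se(lambdas, cv_mean, cv_se):
    """One-standard-error rule: the largest lambda (sparsest model) whose mean
    cross-validated error is within one SE of the minimum mean error."""
    best = min(range(len(lambdas)), key=lambda k: cv_mean[k])
    limit = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= limit)


def collagen_signature(features, coefficients, intercept=0.0):
    """Linear combination of the selected features weighted by their
    training-cohort coefficients. Features whose coefficient was shrunk to
    zero are simply absent from `coefficients`, so they do not contribute.
    The same frozen coefficients are applied to validation patients."""
    return intercept + sum(b * features[name] for name, b in coefficients.items())


if __name__ == "__main__":
    # Hypothetical cross-validation curve: mean error and SE per lambda.
    lambdas = [1.0, 0.5, 0.25, 0.1, 0.05]
    cv_mean = [0.70, 0.55, 0.50, 0.48, 0.49]
    cv_se = [0.03] * 5
    print(lambda_1se(lambdas, cv_mean, cv_se))  # 0.25
```

Here the minimum error (0.48) occurs at λ = 0.1, but λ = 0.25 is chosen because its error (0.50) is within one SE of that minimum and it yields a sparser model.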

Development and assessment of the collagen nomogram
The collagen signature and clinicopathological characteristics were entered into univariate analysis to investigate their association with Lo IS, and variables with P < 0.10 were entered into multivariable analysis. Backward stepwise selection with Akaike's information criterion (AIC) as the stopping rule was used to select independent predictors of Lo IS (38). To facilitate clinical application, we developed a collagen nomogram based on the independent predictors in the training cohort (39).
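A nomogram is a graphical form of the fitted logistic model: each predictor's contribution β·x is rescaled to points, the points are summed, and the total is mapped back to a predicted probability of Lo IS. The Python sketch below assumes the common convention of giving 100 points to the predictor whose effect spans the widest log-odds range; the function name and all coefficients are hypothetical (the model intercept is not reported here), and the univariate screening and backward AIC selection steps are not reproduced.

```python
import math


def make_nomogram(coefs, ranges, intercept):
    """Points scale for a logistic-regression nomogram.

    coefs     -- {predictor: beta} on the log-odds scale
    ranges    -- {predictor: (min, max)} of each predictor's values
    intercept -- model intercept (beta_0)
    Returns (points, probability) functions.
    """
    # The predictor whose effect spans the widest log-odds range gets 0-100 points.
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    scale = 100.0 / max(spans.values())
    # Reference value = end of each range with the lowest log-odds,
    # so every predictor's points start at 0.
    ref = {k: (ranges[k][0] if b >= 0 else ranges[k][1]) for k, b in coefs.items()}
    ref_lp = intercept + sum(b * ref[k] for k, b in coefs.items())

    def points(values):
        return sum(scale * b * (values[k] - ref[k]) for k, b in coefs.items())

    def probability(total_points):
        # Map total points back to the linear predictor, then apply the logistic link.
        lp = ref_lp + total_points / scale
        return 1.0 / (1.0 + math.exp(-lp))

    return points, probability
```

For instance, `make_nomogram({"a": 2.0, "b": -1.0}, {"a": (0, 1), "b": (0, 1)}, -1.0)` gives predictor `a` 0 to 100 points and `b` 0 to 50 points; a patient with `a = 1` and `b = 0` scores 150 points, which maps back to a linear predictor of 1.0 and a probability of about 0.73.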
The Hosmer−Lemeshow test was used to assess the goodness of fit of the model (40). Multicollinearity among the nomogram predictors was evaluated using the variance inflation factor (VIF) (41). The area under the receiver operating characteristic (ROC) curve (AUC) and the calibration curve were used to assess the discrimination and calibration of the collagen nomogram, respectively. The collagen nomogram was then applied to the validation cohort, and its AUC and calibration curve were obtained. More information on the nomogram is provided in the Supplementary Methods.
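The two fit diagnostics named above can be sketched with the Python standard library. The Hosmer-Lemeshow statistic here uses 10 equal-sized groups of sorted predicted risk and a chi-square reference with (groups minus 2) = 8 degrees of freedom; the number of groups and the tie handling are assumptions, and published implementations differ. The VIF is read from the diagonal of the inverse correlation matrix of the predictors, which equals 1/(1 - R_j^2) for each predictor j. All function names are hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for an even number of degrees of
    freedom, via the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """HL statistic over equal-sized groups of sorted predicted risk.
    Assumes every group's mean predicted risk lies strictly between 0 and 1."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n, stat = len(order), 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * mean_p * (1 - mean_p))
    return stat, chi2_sf_even(stat, groups - 2)


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def _invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF for each predictor column (list of equal-length value lists)."""
    k = len(columns)
    corr = [[_pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = _invert(corr)
    return [inv[j][j] for j in range(k)]
```

As a sanity check, `chi2_sf_even(15.507, 8)` is approximately 0.05, the familiar 5% critical value for 8 degrees of freedom, and uncorrelated predictors give VIF values of 1.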

Clinical application value of the collagen nomogram
To assess the clinical application value of the collagen nomogram, a traditional model was developed for comparison. The traditional model was constructed from clinicopathological predictors selected by univariate and multivariable logistic regression in the training cohort. The clinical usefulness of the collagen nomogram was evaluated by decision curve analysis (DCA) and clinical impact curves (CICs) (42). For each model, the cutoff that maximized the Youden index on the ROC curve was used to estimate specificity, sensitivity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Moreover, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were used to quantify the improvement of the collagen nomogram over the traditional model (43,44). Details of the NRI and IDI are provided in the Supplementary Methods.
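The threshold-based quantities in this paragraph can be sketched in Python for outcome labels `y` (1 = Lo IS) and predicted probabilities `p`. Assumptions: a case is called positive when p ≥ threshold, ties in the Youden index keep the lowest threshold, and the function names are hypothetical. The NRI is omitted because its variant (categorical or continuous) is specified only in the Supplementary Methods, and the clinical impact curve is not shown.

```python
def confusion(y, p, threshold):
    """Counts (tp, fp, fn, tn) when predicting Lo IS for p >= threshold."""
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    fn = sum(1 for yi, pi in zip(y, p) if pi < threshold and yi == 1)
    return tp, fp, fn, len(y) - tp - fp - fn


def youden_cutoff(y, p):
    """Threshold maximizing J = sensitivity + specificity - 1 (lowest on ties)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(p)):
        tp, fp, fn, tn = confusion(y, p, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def threshold_metrics(y, p, t):
    tp, fp, fn, tn = confusion(y, p, t)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def net_benefit(y, p, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion(y, p, pt)
    n = len(y)
    return tp / n - fp / n * pt / (1 - pt)


def idi(y, p_new, p_old):
    """Integrated discrimination improvement of p_new over p_old: the gain in
    (mean predicted risk of events - mean predicted risk of non-events)."""
    def slope(p):
        ev = [pi for yi, pi in zip(y, p) if yi == 1]
        ne = [pi for yi, pi in zip(y, p) if yi == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)
    return slope(p_new) - slope(p_old)
```

The "treat all" reference curve of a decision curve analysis corresponds to `net_benefit(y, [1.0] * len(y), pt)`, since every patient is then classified as positive; "treat none" has a net benefit of 0 at every threshold.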

Statistical analysis
Baseline characteristics were compared between the training and validation cohorts using the t test, Mann-Whitney U test, Fisher's exact test, or χ² test, as appropriate. Odds ratios (ORs) and 95% confidence intervals (CIs) of the predictors were calculated using multivariable logistic regression. Survival curves were generated using the Kaplan-Meier method and compared using log-rank tests. Univariate and multivariable Cox proportional hazards regression was used to determine the hazard ratios (HRs) of predictors for disease-free survival (DFS) and overall survival (OS). All statistical analyses were performed with SPSS version 22.0 (IBM, Armonk, New York, USA) and R version 4.0.3 (http://www.r-project.org/). All P values were two-sided, and P < 0.05 was considered statistically significant.
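The survival analyses above were run in SPSS and R; purely as an illustration, the Kaplan-Meier estimator and the two-group log-rank test can be written with the Python standard library. Function names are hypothetical, events are coded 1 (event) or 0 (censored), and Cox regression is not reproduced.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, survival)] at each event time.
    events: 1 = event (e.g., recurrence or death), 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # leave the risk set after time t
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(e for s, e, _ in pooled if s == t)
        d_a = sum(e for s, e, g in pooled if s == t and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3; the patient censored at time 2 remains in the risk set for the event at that time, following the usual convention.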

Patient characteristics and Immunoscore
The baseline characteristics of the patients in the training and validation cohorts are summarized in Table 1. Overall, 421 (64.3%) patients were < 65 years old, and 405 (61.9%) were men. The clinicopathological characteristics of the two cohorts were similar (Supplementary Table 2).
The densities of CD3+ and CD8+ TILs in the TC and IM are shown in Supplementary Figure 2; for both CD3+ and CD8+ cells, TIL density was higher in the IM than in the TC. The cutoff values for CD3+ and CD8+ cells were 593 and 382 cells/mm² in the TC and 1382 and 714 cells/mm² in the IM, respectively (Supplementary Table 3). The proportions of patients with Lo IS and Int-Hi IS were 34.3% and 65.7% in the training cohort and 35.8% and 64.2% in the validation cohort, respectively.

Collagen signature construction
The framework for constructing the collagen signature is presented in Figure 3. LASSO regression selected 16 collagen predictors from the 284 collagen features to construct the collagen signature (Supplementary Figure 4). The formula for calculating the collagen signature is provided in the Supplementary Results.
The distributions of the 16 collagen predictors and the Immunoscore for each patient in the training and validation cohorts are shown in Supplementary Figure 5. Patients with a high collagen signature were more likely to have Lo IS in both cohorts (Figure 4). The collagen signature yielded an AUC of 0.896 (95% CI, 0.854-0.936) in the training cohort and 0.903 (95% CI, 0.863-0.944) in the validation cohort. The collagen signature remained significantly associated with Lo IS in stratified analyses (Supplementary Tables 4, 5). We also compared the performance of the collagen signature with that of each individual selected collagen feature in predicting the Immunoscore. The collagen signature outperformed every individual feature, demonstrating the added predictive value of combining the features (Figure 5).
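The AUC values reported here can be read as the probability that a randomly chosen Lo IS patient has a higher collagen signature than a randomly chosen Int-Hi IS patient. The Python sketch below computes the AUC in that Mann-Whitney form, plus a percentile bootstrap CI. The paper does not state how its 95% CIs were obtained, so the bootstrap and its settings are assumptions, and the function names are hypothetical.

```python
import random


def auc(y, scores):
    """AUC as the probability that a random Lo IS case (y = 1) scores higher
    than a random Int-Hi IS case (y = 0); ties count one half."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples lacking a class are skipped."""
    rng = random.Random(seed)
    n, stats = len(y), []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # need both Lo IS and Int-Hi IS cases
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are correctly ordered, so `auc` returns 0.75.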

Development and validation of the collagen nomogram
Univariate and multivariable logistic regression analyses were performed to identify independent predictors of Lo IS. The collagen signature (OR: 4.632, 95% CI: 3.068-6.993; P < 0.001), tumor differentiation (OR: 2.537, 95% CI: 1.121-5.741; P = 0.026), pT stage (OR: 2.602, 95% CI: 1.106-6.121; P = 0.028), and pN stage (OR: 2.550, 95% CI: 1.197-5.433; P = 0.015) were independent predictors of Lo IS (Table 2). The collagen nomogram was then developed by integrating these four predictors (Figure 6A). ROC curve analysis indicated that the collagen signature had the greatest discriminative ability among the predictors (Supplementary Figure 6). Alluvial diagrams were used to illustrate the relationship between the four predictors and the Immunoscore (Supplementary Figure 7). The VIF of each predictor was < 10, indicating no substantial multicollinearity among the four predictors (Supplementary Table 6). The Hosmer−Lemeshow test was nonsignificant (P = 0.299), indicating no significant departure from good fit.
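Because each OR from a logistic model is an exponentiated coefficient, the reported values can be checked for internal consistency. The worked example below uses the collagen signature and assumes Wald-type 95% CIs (β ± 1.96 SE on the log-odds scale); that assumption and the rounding are ours, not the authors'.

```latex
\begin{aligned}
\hat\beta &= \ln(\mathrm{OR}) = \ln 4.632 \approx 1.533,\\
\mathrm{SE}(\hat\beta) &\approx \frac{\ln 6.993 - \ln 3.068}{2 \times 1.96}
  \approx \frac{1.945 - 1.121}{3.92} \approx 0.210,\\
z &= \frac{\hat\beta}{\mathrm{SE}(\hat\beta)} \approx \frac{1.533}{0.210} \approx 7.3 .
\end{aligned}
```

The midpoint of the log-scale interval, (1.121 + 1.945)/2 ≈ 1.533, matches ln 4.632, and z ≈ 7.3 is consistent with the reported P < 0.001. On the odds scale, a one-unit increase in the collagen signature multiplies the odds of Lo IS by about 4.6, holding the other predictors fixed.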

Clinical application value of the collagen nomogram
A traditional model was developed based on tumor differentiation, pT stage, and pN stage in the training cohort (Supplementary Table S7). The traditional model yielded AUCs of 0.683 (95% CI, 0.622-0.744) in the training cohort, 0.680 (95% CI, 0.619-0.742) in the validation cohort, and 0.685 (95% CI, 0.642-0.728) in all patients. The collagen nomogram showed better discrimination than the traditional model (training cohort: 0.925 vs. 0.683; validation cohort: 0.911 vs. 0.680; all patients: 0.918 vs. 0.685; all P < 0.001) (Figure 6B). Moreover, stratified analysis showed that the collagen nomogram remained superior to the traditional model across subgroups in the training cohort, the validation cohort, and all patients (Supplementary Figures S8-S10). DCA revealed that the collagen nomogram provided greater net benefit than the traditional model (Figure 7A). CICs were generated to visualize how accurately the collagen nomogram identifies patients with Lo IS (Figure 7B).

Association of the collagen nomogram with prognosis and chemotherapy benefits
Patients were divided into high- and low-probability Lo IS groups based on the ROC curve of the collagen nomogram. Patients in the low-probability Lo IS subgroup had a better prognosis than those in the high-probability Lo IS subgroup in the training cohort (Supplementary Figure S11A), the validation cohort (Supplementary Figure S11B), and all patients (Supplementary Figure S11C). The same pattern was observed in patients with stage I-II (Supplementary Figure 12) and stage III disease (Supplementary Figure S13). Cox regression analysis demonstrated that the probability of Lo IS was an independent prognostic factor after adjustment for other variables in the training cohort [DFS: HR 2.475 (95% CI, 1.667-3.675), P < 0.001; OS: HR 2.179 (95% CI: 1.409-3.370), P < 0.001] (Supplementary Table 8), the validation cohort [DFS: HR 2.211 (95% CI, 1.510-3.239), P < 0.001; OS: HR 2.111 (95% CI: 1.366-3.262), P < 0.001] (Supplementary Table 9), and all patients [DFS: HR 2.350 (95% CI, 1.787-3.091), P < 0.001; OS: HR 2.119 (95% CI: 1.559-2.881), P < 0.001] (Supplementary Table 10). The collagen signature and clinicopathological predictors with the corresponding DFS and OS status are presented in Figure 8. In addition, we investigated the benefit of chemotherapy for patients with high-risk stage II and stage III CRC in the high- and low-probability Lo IS subgroups. Survival analysis showed that chemotherapy was associated with survival outcomes in patients with high-risk stage II and stage III CRC (Supplementary Figure S14). Tests for interaction between the probability of Lo IS and chemotherapy showed that, in both high-risk stage II and stage III disease, the benefit observed in low-probability Lo IS patients [high-risk stage II (Figure 9): DFS, HR: 0.486 (95% CI: 0.280-0.842), P = 0.010; OS, HR: 0.441 (95% CI: 0.229-0.852), P = 0.015; stage III (Figure 10): DFS, HR: 0.464 (95% CI: 0.284-0.758), P = 0.002; OS, HR: 0.452 (95% CI: 0.266-0.770), P = 0.003; all P < 0.05 for interaction; Table 5] was superior to that observed in high-probability Lo IS patients. Chemotherapy significantly improved survival outcomes in the low-probability Lo IS group (high-risk stage II: P = 0.010 for DFS and P = 0.015 for OS; stage III: P = 0.002 and P = 0.003, respectively) but had no significant effect in the high-probability Lo IS group (high-risk stage II: P = 0.459 and P = 0.319; stage III: P = 0.535 and P = 0.449, respectively).

Discussion
In the current era of precision medicine, the Immunoscore is a standardized assay that quantifies the density of TILs, and its prognostic value has been validated internationally. In this study, we found a significant association between the collagen signature and the Immunoscore in the TME, and the collagen nomogram combining the collagen signature, tumor differentiation, pT stage, and pN stage predicted the Immunoscore with satisfactory performance. Moreover, the collagen nomogram stratified patients with high-risk stage II and stage III CRC by chemotherapy benefit, indicating its potential as a tool to predict prognosis and support treatment decision-making.
During tumor development, collagen in the ECM undergoes notable remodeling, which affects the biological behavior of tumor cells, including infiltration, proliferation, and metastasis (18, 19). Importantly, collagen has also been found to influence various types of tumor-infiltrating immune cells (25,26). In 3D culture assays, T cells migrated significantly more slowly in high-density collagen gels than in low-density gels (45). Increased collagen density also increases matrix stiffness, which can further affect T-cell migration (46, 47). In addition, high collagen density can influence immunological synapse formation between T cells and antigen-presenting cells (48), leading to reduced T-cell activity (49,50). Collagen density has also been found to strongly affect T-cell activity after the initial activation stage (51). These findings suggest that collagen has important immunomodulatory functions, which provides a rationale for quantitatively analyzing the relationship between collagen structure and the Immunoscore in the TME.
Collagen has a noncentrosymmetric structure, and multiphoton imaging can provide detailed information about the structure and organization of collagen fibers in tissue (52,53). In this study, we acquired high-resolution multiphoton images from the TC and IM of each tumor sample. We then extracted high-throughput quantitative collagen features from the images using a robust framework that objectively quantifies the collagen structural information in the TME. LASSO regression, an effective algorithm that combines variable selection with complexity regularization, was used to shrink the feature set and select the most predictive collagen features for the collagen signature. Variable selection retains only a subset of the variables in the model, rather than all of them, to improve performance, while complexity regularization through the penalty parameter λ helps avoid overfitting (35,54,55). Using this approach, we developed a collagen signature based on 6 collagen features from the TC and 10 from the IM that was significantly associated with the Immunoscore. Patients with a high collagen signature had a microenvironment with low T-cell density and were therefore classified as Lo IS, which is associated with poor prognosis in CRC, consistent with previous reports (10,12,13). Thus, the collagen signature can comprehensively and quantitatively capture the correlation between collagen structure and the Immunoscore in the TME. We then constructed a collagen nomogram that included the collagen signature, tumor differentiation, pT stage, and pN stage. The collagen nomogram showed better discrimination and clinical application value for estimating the Immunoscore than the traditional model. To the best of our knowledge, this is the first study to assess the association between collagen structure and the Immunoscore in the TME and to build an effective prediction model based on a fully quantitative collagen signature derived from multiphoton imaging.
From a clinical practice standpoint, clinical translation of the collagen nomogram is feasible. First, the clinicopathological predictors required by the nomogram are routinely provided in the postoperative pathology report. Second, unlike immunohistochemistry, which requires staining reagents and is time-consuming, multiphoton imaging can quickly image unstained sections in a label-free manner, and collagen feature extraction can be completed automatically using MATLAB software. Third, our study revealed a correlation between collagen structure and the Immunoscore, suggesting that future treatments might target collagen in the TME to modulate the antitumor immune status. Taken together, we believe that the collagen nomogram is time-efficient for pathologists and low-cost for patients, while also pointing to a potential therapeutic target for improving the prognosis of CRC patients.
According to the NCCN guidelines, adjuvant chemotherapy is recommended for patients with high-risk stage II and stage III CRC. However, not all patients benefit from chemotherapy. Previous clinical trials have shown that patients with Lo IS did not benefit from chemotherapy, whereas patients with Int-Hi IS had improved prognosis with chemotherapy; therefore, the Immunoscore is useful for individualized chemotherapy selection (12,13). Because the collagen nomogram performed satisfactorily in predicting Lo IS, we further evaluated whether it could identify patients who would benefit from chemotherapy. Patients were divided into high- and low-probability Lo IS groups according to the collagen nomogram. Patients with a low probability of Lo IS benefited from chemotherapy, whereas those with a high probability of Lo IS did not. This finding suggests that the collagen nomogram could help individualize chemotherapy selection for patients with high-risk stage II and stage III CRC when Immunoscore evaluation is not feasible.
Artificial intelligence (AI) technologies, especially deep learning, have advanced rapidly in medical care, providing powerful methods for constructing accurate prediction models (56,57). AI has demonstrated performance comparable to that of pathologists in distinguishing between benign and malignant colorectal diseases (58). Although AI cannot entirely replace pathologists, it can serve as an assistive tool to improve diagnostic efficiency, reduce workload, and improve the readability of medical images, ultimately reducing the rates of misdiagnosis and missed diagnosis (59). Furthermore, a multistain deep learning model could be used to determine the AImmunoscore (AIS) in CRC patients and to predict the response to neoadjuvant therapy in rectal cancer patients (60). The potential of AI to transform the clinical landscape of CRC is substantial. However, AI is still at an early stage of clinical application in CRC. Challenges that must be addressed include the validation and generalizability of predictive models, model interpretability, and the safe management and use of data. We believe that AI technologies will play a considerably more prominent role in screening, diagnosis, surgical treatment, and prognosis prediction in the future.
FIGURE 7 Clinical application value of the nomogram. (A) Decision curve analysis for the nomogram. The y-axis represents the net benefit, and the x-axis represents the threshold probability. (B) Clinical impact curves for the nomogram. Of 1,000 patients, the red line shows the total number of patients who would be deemed to have a low Immunoscore at each threshold probability, and the black line shows how many of those would be true positives (cases). The closer the two curves, the larger the proportion of patients predicted to have a low Immunoscore who truly have a low Immunoscore.
Our study has some limitations. First, this was a retrospective study. Second, all specimens were obtained from two medical centers in China; hence, multicenter, international, prospective clinical trials will be necessary to validate the robustness of the collagen nomogram. Third, although the nomogram-predicted probability of Lo IS was associated with survival, survival parameters were not incorporated into the nomogram to estimate model accuracy.
In conclusion, this study showed that the collagen signature was significantly associated with the Immunoscore in the TME and that the collagen nomogram is useful for individualized prediction of the Immunoscore in CRC patients. Moreover, the collagen nomogram could be a potential tool to assist in individualizing chemotherapy selection for patients with high-risk stage II and stage III CRC.

FIGURE 2 Kaplan−Meier survival analysis of the training and validation cohorts grouped by Immunoscore. (A) The 5-year DFS and OS comparison between the Lo and Int-Hi IS groups in the training cohort. (B) The 5-year DFS and OS comparison between the Lo and Int-Hi IS groups in the validation cohort. Lo, low; Int-Hi, intermediate-high; IS, Immunoscore; DFS, disease-free survival; OS, overall survival; HR, hazard ratio.
FIGURE 3 Construction framework of the collagen signature. (A) Selection of the regions of interest in the TC and IM by comparing HE staining and multiphoton imaging. Ten regions (five at the TC and five at the IM) per sample were used for multiphoton imaging. Scale bars: 2,000 μm and 200 μm, respectively. (B) Framework for constructing the collagen signature. SHG images were converted to binary images for collagen feature extraction. The collagen signature was constructed using LASSO regression from 284 collagen features (142 from the TC and 142 from the IM). The relationship between the collagen signature and the Immunoscore was then evaluated and validated. HE, hematoxylin and eosin; TPEF, two-photon excitation fluorescence; SHG, second harmonic generation; GLCM, gray-level co-occurrence matrix; LASSO, least absolute shrinkage and selection operator; TC, tumor center; IM, invasive margin.

FIGURE 4 Distribution of the collagen signature in the training and validation cohorts. (A) Collagen signature for each patient in the training cohort. (B) Collagen signature for each patient in the validation cohort. Red represents the Lo Immunoscore, and blue represents the Int-Hi Immunoscore. Lo, low; Int-Hi, intermediate-high; IS, Immunoscore.
FIGURE 5 ROC

FIGURE 6 Collagen nomogram construction and performance assessment. (A) The collagen nomogram constructed in the training cohort, incorporating the collagen signature, tumor differentiation, pT stage, and pN stage. (B) ROC curves of the nomogram and the traditional model in the training cohort, the validation cohort, and all patients. (C) Calibration curves of the nomogram in the training cohort, the validation cohort, and all patients. In the calibration curves, the y-axis represents the actual probability of Lo IS, and the x-axis represents the predicted probability of Lo IS. The diagonal black dotted line represents a perfect prediction model, and the solid red line represents the nomogram; the closer the red line is to the diagonal, the better the prediction. AUC, area under the curve; CI, confidence interval; Lo IS, low Immunoscore.

FIGURE 8 Distribution of the nomogram-predicted subgroups with the corresponding survival status in all patients. (A) Distribution of the nomogram-predicted probability of Lo IS. (B) Disease-free survival status of all patients. (C) Overall survival status of all patients. (D) Distribution of the collagen signature and clinicopathological predictors with the corresponding survival status. Lo IS, low Immunoscore; Int-Hi IS, intermediate-high Immunoscore.

FIGURE 9 Adjuvant chemotherapy benefit in patients with high-risk stage II CRC. (A) DFS and (B) OS of patients with high-risk stage II CRC and a low probability of Lo IS, according to receipt of adjuvant chemotherapy. (C) DFS and (D) OS of patients with high-risk stage II CRC and a high probability of Lo IS, according to receipt of adjuvant chemotherapy. Lo IS, low Immunoscore; CRC, colorectal cancer; DFS, disease-free survival; OS, overall survival; CT, chemotherapy.

TABLE 1
Clinicopathological characteristics of the patients in the training and validation cohorts. Values in parentheses are percentages unless indicated otherwise. P values were derived from univariable association analyses between each clinicopathological characteristic and the IS. Lo, low; Int-Hi, intermediate-high; IS, Immunoscore; IQR, interquartile range; CEA, carcinoembryonic antigen; CA199, carbohydrate antigen 199.

TABLE 2
Univariate and multivariable analyses of the predictors of Lo IS in the training cohort.

TABLE 3
Predictive performance for Lo IS: the nomogram versus the traditional model.
Lo IS, low Immunoscore; AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value.

TABLE 4
NRI and IDI for the improvement in Lo IS prediction achieved by the nomogram compared with the traditional model. Lo IS, low Immunoscore; CI, confidence interval; NRI, net reclassification improvement; IDI, integrated discrimination improvement.