Construction and validation of a nomogram for predicting overall survival of patients with stage III/IV early-onset colorectal cancer

Purpose: This study aimed to identify prognostic factors and develop a nomogram for predicting overall survival (OS) in stage III/IV early-onset colorectal cancer (EO-CRC).
Methods: Stage III/IV EO-CRC patients diagnosed between 2010 and 2015 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. The datasets were randomly divided (2:1) into training and validation sets. A nomogram predicting OS was developed based on the prognostic factors identified by Cox regression analysis in the training cohort. The predictive performance of the nomogram was assessed using receiver operating characteristic (ROC) curves, calibration plots, and decision curve analysis (DCA). Subsequently, internal validation was performed using the validation cohort. Finally, a risk stratification system was established based on the constructed nomogram.
Results: Of the 10,387 patients diagnosed with stage III/IV EO-CRC between 2010 and 2015 in the SEER database, 8,130 patients were included. In the training cohort (n=3,071), sex, marital status, race/ethnicity, primary site, histologic subtypes, grade, T stage, and N stage were identified as independent prognostic variables for OS. The 1-, 3-, and 5-year area under the curve (AUC) values of the nomogram were robust in both the training (0.751, 0.739, and 0.723) and validation cohorts (0.748, 0.733, and 0.720). ROC curves, calibration plots, and DCA indicated good predictive performance of the nomogram in both the training and validation sets. Furthermore, patients were categorized into low-, middle-, and high-risk groups based on the nomogram risk score. Kaplan-Meier curves showed significant survival differences between the three groups.
Conclusion: We developed a prognostic nomogram and risk stratification system for stage III/IV EO-CRC, which may facilitate clinical decision-making and individual prognosis prediction.


Introduction
Colorectal cancer (CRC) is the second most deadly and third most common cancer worldwide (1). While CRC incidence has decreased in individuals ≥50 years of age, it has increased globally over the past decades in individuals younger than 50 years, a group defined as having early-onset CRC (EO-CRC) (2-7). EO-CRC has recently been in the spotlight and is projected to account for about 11% of colon cancers and 23% of rectal cancers by 2030 (8). Unfortunately, the reasons for this increase remain unclear and are probably multifactorial (9). Additionally, the clinical features of EO-CRC, which is often diagnosed at an advanced stage (6,10-12), differ from those of later-onset disease (3,4). Overall, EO-CRC contributes significantly to the global cancer burden. Hence, to facilitate clinical decision-making, it is important to predict the prognosis of EO-CRC patients.
Although widely used to examine the survival of CRC, the American Joint Committee on Cancer (AJCC) staging system is far from perfect. Several studies have identified prognostic risk factors for EO-CRC and developed prognostic nomograms. However, some problems remain. First, a flow chart was not available in two studies (13,14). Second, three nomograms (15-17) included too many prognostic factors (≥12), which reduced their practicability. Last, and most importantly, several nomograms were unreasonable in the clinical context. For example, it was paradoxical that patients with grade I had a worse prognosis than those with grade II (15,16,18). Similar to the "grade paradox", there were also a "race/ethnicity paradox" (13,14,19,20), a "T stage paradox" (16,18,19), a "histologic subtypes paradox" (21), and a "primary site paradox" (15,18,21,22). Thus, more high-quality research is urgently needed.
To the best of our knowledge, no nomogram has yet been constructed to predict the prognosis of patients with stage III/IV EO-CRC. Therefore, this study aimed to develop and validate a prognostic nomogram predicting overall survival (OS) for stage III/IV EO-CRC based on the Surveillance, Epidemiology, and End Results (SEER) database.
The inclusion criteria were patients diagnosed with stage III/IV EO-CRC (pathologically confirmed) between 2010 and 2015. Exclusion criteria included patients younger than 18 years and those lacking complete clinicopathological and survival information.

Data collection
The collected variables included age, sex, marital status, race/ethnicity, primary site, histologic subtypes, grade, AJCC stage, T stage, N stage, survival time, and vital status record. Primary sites comprised right-sided colon (cecum, ascending colon, hepatic flexure, and transverse colon), left-sided colon (splenic flexure, descending colon, sigmoid colon, and rectosigmoid junction), and rectum (23). Race/ethnicity was divided into five categories: Hispanic, Non-Hispanic White (NHW), Non-Hispanic Black (NHB), Non-Hispanic Asian or Pacific Islander (NHAPI), and Non-Hispanic American Indian/Alaska Native (NHAIAN). The tumor-node-metastasis (TNM) stage was determined according to the AJCC seventh edition criteria. OS was defined as the time from diagnosis to death from any cause or to the last follow-up.

Development of the prognostic nomogram
The datasets were randomly divided (2:1) into the training and validation sets. In the training cohort, univariate and multivariate Cox regression analyses were performed to identify the prognostic factors of patients with stage III/IV EO-CRC. The independent risk factors were used to construct the prognostic nomogram.
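The 2:1 random division described above can be sketched as follows. This is a minimal illustration, not the authors' code (the study used R); the function name split_two_to_one, the fixed seed, and the toy patient IDs are assumptions made for the example.

```python
import random

def split_two_to_one(patient_ids, seed=42):
    """Randomly split IDs into a training set (about 2/3) and a validation set (about 1/3)."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]

# Toy example with nine hypothetical patient IDs.
train_ids, valid_ids = split_two_to_one(range(9))
```

After the split, baseline characteristics of the two sets are usually compared to confirm that randomization produced balanced cohorts.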

Validation of the prognostic nomogram
The discrimination ability of the prognostic nomogram was examined by the receiver operating characteristic (ROC) curves and the area under the curve (AUC). The calibration ability was evaluated by the calibration plot. Additionally, decision curve analysis (DCA) was performed to assess clinical utility by quantifying the net benefits at different threshold probabilities.
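The net benefit that DCA plots against the threshold probability can be illustrated with a short sketch. It uses the standard definition, net benefit = TP/n - (FP/n) x pt/(1 - pt), and, as a simplifying assumption, treats the outcome as a binary status at a fixed time point without handling censoring. The function name and toy data are hypothetical.

```python
def net_benefit(predicted_risk, event, threshold):
    """Net benefit of intervening on patients whose predicted risk is >= threshold.

    predicted_risk: predicted probability of death by the time point of interest
    event: 1 if the patient died by that time point, 0 otherwise (censoring ignored)
    """
    n = len(event)
    true_pos = sum(1 for p, e in zip(predicted_risk, event) if p >= threshold and e == 1)
    false_pos = sum(1 for p, e in zip(predicted_risk, event) if p >= threshold and e == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

# Toy data: four hypothetical patients.
risk = [0.9, 0.8, 0.3, 0.2]
died = [1, 0, 1, 0]
nb_half = net_benefit(risk, died, 0.5)
nb_low = net_benefit(risk, died, 0.1)
```

A decision curve repeats this calculation over a range of thresholds and compares the model with the "treat all" and "treat none" strategies.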

Risk stratification
Based on the constructed nomogram, the total risk score was calculated for each patient. X-tile software (version 3.6.1, Yale University) was used to identify the optimal cutoff values for the total risk score. According to the cutoff values of the risk score, patients were classified into low-, middle-, and high-risk groups. Kaplan-Meier curves and the log-rank test were used to compare the survival differences between the groups.
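For readers unfamiliar with the Kaplan-Meier method used to compare the risk groups, the product-limit estimate can be sketched as below. This is a minimal illustration with hypothetical follow-up data, not the study's analysis, and it omits confidence intervals and the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under follow-up just before t
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Toy data: follow-up in months for five hypothetical patients.
months = [5, 8, 8, 12, 20]
died = [1, 1, 0, 1, 0]
curve = kaplan_meier(months, died)
```

Computing one such curve per risk group gives the stratified survival curves that the log-rank test then compares.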

Statistical analysis
R software (version 4.2.1, https://www.r-project.org) and relevant packages were used to construct and validate the prognostic nomogram. A p-value < 0.05 was considered significant.

Patient characteristics
Overall, 10,387 patients with stage III/IV EO-CRC were identified between 2010 and 2015 from the SEER database. According to the inclusion and exclusion criteria, 8,130 patients with stage III/IV EO-CRC were included in the final analysis. The study screening flow chart is shown in Figure 1. Subsequently, the datasets were randomly divided (2:1) into a training set (n=3,071) and a validation set (n=1,535), with no statistically significant difference between them. The demographic and clinicopathological characteristics of stage III/IV EO-CRC patients are summarized in Table 1. The baseline characteristics of patients stratified by AJCC stage are shown in Supplementary Table S1.

Development of the prognostic nomogram
Univariate and multivariate Cox regression analyses identified sex, marital status, race/ethnicity, primary site, histologic subtypes, grade, T stage, and N stage as independent prognostic factors for OS in the training cohort (Table 2). Accordingly, these factors were used to construct the nomogram for predicting the 1-, 3-, and 5-year OS of stage III/IV EO-CRC patients (Figure 2).
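How a nomogram turns Cox regression results into points can be illustrated with a minimal sketch. It assumes the common convention that each level's points are its Cox coefficient (the log hazard ratio) rescaled so that the variable with the widest coefficient range spans 0 to 100. Only the T stage and N stage adjusted hazard ratios reported in the Discussion are used; the other six variables are omitted, so the resulting numbers are illustrative and are not those of Figure 2. The function name nomogram_points is hypothetical.

```python
import math

# Adjusted hazard ratios reported in the Discussion; reference levels have HR = 1.
ahr = {
    "T stage": {"T1": 1.00, "T2": 0.28, "T3": 0.52, "T4": 1.07},
    "N stage": {"N0": 1.00, "N1": 0.36, "N2": 0.57},
}

def nomogram_points(ahr_by_variable):
    """Rescale log hazard ratios so the widest-ranging variable spans 0-100 points."""
    beta = {
        var: {level: math.log(hr) for level, hr in levels.items()}
        for var, levels in ahr_by_variable.items()
    }
    widest = max(max(b.values()) - min(b.values()) for b in beta.values())
    return {
        var: {level: round(100 * (x - min(b.values())) / widest, 1) for level, x in b.items()}
        for var, b in beta.items()
    }

points = nomogram_points(ahr)
```

Under these assumptions T stage has the widest coefficient range and therefore defines the 100-point scale, which agrees with the statement in the Discussion that T stage contributed the most to prognosis. A patient's total score is the sum of the points across all variables.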

Validation of the prognostic nomogram
The ROC curves demonstrated that the discriminating ability of the prediction model was robust in both the training and validation sets. The AUC values of the nomogram predicting 1-, 3-, and 5-year OS in the training cohort were 0.751, 0.739, and 0.723, respectively. Similarly, in the validation cohort, the corresponding values were 0.748, 0.733, and 0.720, respectively (Figure 3). Additionally, the calibration plots demonstrated the prognostic nomogram's strong calibration capability (Figure 4). Furthermore, the DCA curves suggested good clinical utility (Figure 5).
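As a rough illustration of what an AUC at a fixed time horizon measures, the sketch below compares risk scores between patients who died by the horizon (cases) and patients still alive after it (controls). It is a simplification: patients censored before the horizon are simply dropped, whereas dedicated time-dependent ROC methods weight for censoring. The function name and toy data are hypothetical, and this is not the estimator used in the study.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Probability that a random case has a higher risk score than a random control."""
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))

# Toy data: nomogram scores, follow-up in months, and death indicators.
scores = [180, 150, 120, 90, 60]
months = [6, 10, 40, 30, 70]
died = [1, 1, 0, 1, 0]
auc_3y = auc_at_horizon(scores, months, died, horizon=36)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.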

Risk stratification
Based on the optimal cutoff values for the total risk score, patients in the training cohort were grouped into the low-risk (≤95.5), middle-risk (95.5-162.5), and high-risk groups (>162.5) (Supplementary Figure S1). Similarly, patients in the validation cohort were categorized into low-risk (≤96), middle-risk (96-162.5), and high-risk groups (>162.5) (Supplementary Figure S2). Kaplan-Meier curves, using the log-rank test, revealed significant differences in OS among the three risk subgroups in both the training and validation cohorts (Figure 6).
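The grouping rule for the training cohort can be written as a small function. The cutoffs 95.5 and 162.5 are taken from the text above; the function name and the example scores are hypothetical.

```python
def risk_group(total_score, low_cutoff=95.5, high_cutoff=162.5):
    """Assign a risk group from the nomogram total score (training-cohort cutoffs)."""
    if total_score <= low_cutoff:
        return "low"
    if total_score <= high_cutoff:
        return "middle"
    return "high"

# Hypothetical total scores for four patients.
groups = [risk_group(s) for s in (80, 95.5, 130, 170)]
```

For the validation cohort, the reported lower cutoff of 96 would be passed as low_cutoff instead.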

Discussion
Based on the SEER database, 8,130 stage III/IV EO-CRC patients were included. The present study identified eight variables (sex, marital status, race/ethnicity, primary site, histologic subtypes, grade, T stage, and N stage) as independent predictors for OS of patients with stage III/IV EO-CRC. Utilizing these variables, a nomogram with favorable performance predicting 1-, 3-, and 5-year OS was developed and validated. Furthermore, a risk stratification was successfully established based on the total risk score determined by the nomogram.
The initial nomogram demonstrated that AJCC stage had the most substantial impact on OS of patients with stage III/IV EO-CRC. Based on our previous nomogram, the risk scores for stage III and IV were 0 and 100, respectively, which limits its clinical application. Therefore, AJCC stage was not included when constructing the new nomogram. Our new nomogram demonstrated that T stage contributed the most to prognosis, followed by N stage, grade, race/ethnicity, marital status, histologic subtypes, primary site, and sex. All eight prognostic variables are readily available and clinically reasonable. Our findings suggest that, in stage III/IV EO-CRC, right colon cancer had a worse OS than rectal cancer, and rectal cancer had a worse OS than left colon cancer. These results are in accord with published studies (16,22). Consistent with the literature (13,15,16,18-20,24), this study showed that age was not a significant prognostic factor for the survival of patients with stage III/IV EO-CRC. However, a few studies have presented contrasting views. A survival nomogram for stage I-III EO-CRC patients revealed that older patients had worse survival than younger patients (14). In contrast, another nomogram for EO-CRC patients with lymphatic metastases demonstrated that younger patients had worse survival than older patients (21). These divergent findings necessitate cautious interpretation.
Our study demonstrated that sex had a minor influence on prognosis. These results are in accord with three studies (20,22,25) indicating that male EO-CRC patients exhibited slightly poorer survival compared to their female counterparts. However, several studies (13,15,16,18,19,24) indicated that sex was not a significant prognostic factor for the survival of EO-CRC patients. Collectively, the influence of sex on the survival of EO-CRC patients warrants further investigation.
In the present study, race/ethnicity was divided into five categories: Hispanic, NHW, NHB, NHAPI, and NHAIAN. Our nomogram showed that NHW patients had the best prognosis in stage III/IV

FIGURE 2 Nomogram for predicting 1-, 3-, and 5-year OS for patients with stage III/IV EO-CRC.

FIGURE 3 The ROC curves of the nomogram predicting OS in the training cohort (A) and validation cohort (B).

(27-32). Our results also support this view.
Generally speaking, a higher T stage means a worse prognosis (13,14,21). Surprisingly, our study indicated that, compared to T1 patients, those with T2 (adjusted hazard ratio [aHR], 0.28; 95% confidence interval [CI], 0.21-0.36) and T3 (aHR, 0.52; 95% CI, 0.45-0.61) had better OS independent of other variables, whereas T4 patients (aHR, 1.07; 95% CI, 0.91-1.26) did not differ significantly from T1 patients. This seems a rather paradoxical finding. Nevertheless, we believe this to be an important clinical finding rather than a paradox. The most likely explanation is that advanced tumors with shallow intestinal wall invasion may represent a biologically aggressive phenotype. More studies are needed to elucidate the underlying mechanism.
Similar to T stage, a higher N stage generally means a worse prognosis (14,20,24). In contrast, our study suggested another paradoxical finding. Compared to N0 patients, those with N1 (aHR, 0.36; 95% CI, 0.32-0.42) and N2 (aHR, 0.57; 95% CI, 0.50-0.65) had a better prognosis. The participants of our study were patients with stage III/IV EO-CRC. Therefore, the patients presented with either lymph node metastasis or distant metastasis. Patients with N1 and N2 may not have distant metastasis, while patients with N0 must have distant metastasis. Thus, although paradoxical, this finding is logical in stage III/IV EO-CRC.
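The staging logic behind this argument can be made explicit with a small sketch of the coarse AJCC seventh edition grouping (stage IV: any M1; stage III: N1 or N2 with M0). The function name and the example patients are hypothetical, and substages are deliberately ignored.

```python
def ajcc7_stage_group(n_stage, m_stage):
    """Coarse AJCC 7th edition stage group for colorectal cancer (substages ignored)."""
    if m_stage.startswith("M1"):
        return "IV"    # any distant metastasis
    if n_stage.startswith(("N1", "N2")):
        return "III"   # nodal disease without distant metastasis
    return "I/II"      # N0 M0: not eligible for a stage III/IV cohort

# Hypothetical (N, M) combinations.
patients = [("N0", "M0"), ("N0", "M1"), ("N1", "M0"), ("N2", "M1a")]
stages = [ajcc7_stage_group(n, m) for n, m in patients]

# Within a stage III/IV cohort, every N0 patient is necessarily stage IV.
n0_in_cohort = {s for (n, _), s in zip(patients, stages) if n == "N0" and s in ("III", "IV")}
```

Because N0 patients enter the cohort only through M1 disease, the N0 reference group consists entirely of metastatic cases, which explains why N1 and N2 appear protective in the adjusted model.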
A higher tumor grade is commonly associated with a poorer prognosis, as confirmed in our study. However, a nomogram predicting OS for metastatic EO-CRC revealed counterintuitive trends: grade I patients had a worse prognosis than grade II patients, and grade III patients had a worse prognosis than grade IV patients (15). Clearly, these results are illogical. Based on their nomogram, the risk scores for grade I, II, III, and IV were 20, 0, 81, and 60, respectively. In their training cohort, the sample sizes for patients with grade I, II, III, and IV were 72, 1,212, 378, and 67, respectively. We speculate that the small sample sizes for patients with grade I and IV (<100) may have led to the "grade paradox". Notably, several other studies were also limited by the "grade paradox" (16,18,21), which we speculate likewise resulted from small sample sizes.
There are several strengths to the present study. First, to the best of our knowledge, this is the first predictive nomogram focusing on patients with stage III/IV EO-CRC. Second, the AUC values of the nomogram predicting 1-, 3-, and 5-year OS were all greater than 0.71. The calibration plots showed a good calibration ability and the DCA curves indicated a good clinical utility. Moreover, the internal validation also demonstrated satisfactory results. Third, of the 10,387 patients with stage III/IV EO-CRC, only 2,257 patients (lacking essential clinical and survival information) were excluded. Thus, 8,130 patients were included in the final analysis, suggesting strong representativeness. Fourth, based on the nomogram, a risk stratification system was successfully established to identify high-risk patients.
Some limitations of the present study must be recognized. First, it was a retrospective study, limiting the generalizability of the results. In the future, prospective studies are warranted to verify the findings. Second, there was no detailed information on CEA, radiotherapy, chemotherapy, and surgical treatment in the SEER database. Third, the SEER database did not include potential prognostic factors for EO-CRC, such as systemic immune inflammation index (33), geriatric nutrition risk index (33), symptom duration of 3 months or more (34), carbohydrate antigen 19-9 (35), and cell cycle-related genes (36). Fourth, our data were exclusively from the SEER database, representing only a portion of the US population, potentially limiting the applicability of our results to other regions or countries. Last, although the internal validation demonstrated satisfactory results, external validation was not performed. This problem will be addressed in our future studies. Patients with EO-CRC differ from those with later-onset CRC in underlying molecular mechanisms (37-40), clinical features, and treatment (41). This special population warrants further attention in the future.

Conclusions
Independent risk factors for OS in stage III/IV EO-CRC patients included sex, marital status, race/ethnicity, primary site, histologic subtypes, grade, T stage, and N stage. An effective nomogram and risk stratification system were established, potentially enhancing clinical decision-making and individual prognosis prediction.

FIGURE 1 Flow chart of study screening.

FIGURE 4 The calibration plots of the nomogram predicting 1-, 3-, and 5-year OS in the training cohort (A-C) and validation cohort (D-F).

FIGURE 6 Survival analysis of the training cohort (A) and the validation cohort (B) by the risk score calculated by the nomogram.

TABLE 1 Demographic and clinical characteristics of patients with stage III/IV EO-CRC.

TABLE 2 Univariate and multivariate analysis of OS in the training cohort.