Prognostic Nomograms for Predicting Overall Survival and Cancer-Specific Survival of Patients With Early Onset Colon Adenocarcinoma

Background: The incidence of colon cancer in young patients is on the rise, and adenocarcinoma is the most common pathological type. However, a reliable nomogram for predicting the prognosis of early onset colon adenocarcinoma (EOCA) is currently lacking. This study aims to develop nomograms for predicting the overall survival (OS) and cancer-specific survival (CSS) of patients with EOCA. Methods: Patients diagnosed with EOCA from 2010 to 2015 were included and randomly assigned to a training set and a validation set. Cox regression models were used to evaluate prognosis and identify independent predictive factors, which were then used to establish nomograms for predicting 3- and 5-year OS and CSS. The discrimination and calibration of the nomograms were assessed using calibration plots, the concordance index (C-index), receiver operating characteristic (ROC) curves, and decision curve analysis (DCA). Results: A total of 2,348 patients were identified, with 1,644 assigned to the training set and 704 to the validation set. Multivariate analysis demonstrated that gender, age, tumor size, T stage, M stage, regional node, tumor deposits, lung metastasis and perineural invasion were significantly correlated with OS and CSS. The calibration plots indicated good consistency between the nomogram predictions and the actual observations. The C-indices of the OS and CSS nomograms in the training set were 0.735 (95% CI: 0.708–0.762) and 0.765 (95% CI: 0.739–0.791), respectively, whereas those in the validation set were 0.736 (95% CI: 0.696–0.776) and 0.76 (95% CI: 0.722–0.798), respectively. ROC analysis showed that the nomograms had good discriminative power. The 3- and 5-year DCA curves showed higher net benefit for the nomograms than for the TNM staging system. Conclusions: The nomograms established could effectively predict 3- and 5-year OS and CSS in EOCA patients, which may assist clinicians in evaluating prognosis more accurately and optimizing treatment strategies.


INTRODUCTION
Colon carcinoma is the most common malignant tumor of the digestive tract, ranking fourth in deaths from malignant tumors worldwide. In the United States, approximately 104,610 colon cancer cases were estimated to be diagnosed in 2020, which corresponds to 287 new cases per day on average (1). Among all histological subtypes, colon adenocarcinoma (CA) is the most common, accounting for 60%-70% of all cases, and is associated with a poor prognosis. Although the diagnostic methods and therapeutic approaches for the management of CA have greatly improved in recent years, the 5-year overall survival rate remains low. Meanwhile, tumor recurrence is one of the most daunting challenges in the clinical treatment of CA (2). A previous study reported that about 70% of CA patients exhibited postoperative recurrence within 24 months after curative surgery (3). In addition, evidence from several studies showed that CA incidence varies with age. Cancer Facts and Figures (2020) estimated that the incidence of CA had been increasing in young adults, while the incidence in older adults (≥55 years) declined by 3.6% per year over the last 25 years. According to data from the US National Cancer Database, the incidence increased by 2.7% annually among adults younger than age 50 in the past decade, with 75% of cases occurring in those aged 40 to 49. Early onset colon adenocarcinoma (EOCA) is defined as CA diagnosed in patients under the age of 50 (4). Research suggests that EOCA cases may share biological characteristics, including poor differentiation, high malignancy, greater aggressiveness, mutations in mismatch repair (MMR) genes, and high microsatellite instability (MSI-H), resulting in an unfavorable prognosis (5). Concerns have been raised over the increasing incidence and the poor clinical outcomes, and it is essential to precisely identify the prognostic factors associated with EOCA and to choose personalized treatment strategies.
The nomogram is widely used to visualize complex mathematical models; it considers multiple risk factors, predicts the prognosis of a disease, and presents the prediction in an intuitive way (6). However, few studies have focused specifically on age-specific risk factors associated with prognosis, and a well-structured and fully validated prognostic nomogram for EOCA patients is needed. Hence, based on a large number of registered cases from the Surveillance, Epidemiology and End Results (SEER) database, this study first delineates the major clinical and pathological characteristics of EOCA, and then establishes nomograms to predict 3- and 5-year overall survival (OS) and cancer-specific survival (CSS).

MATERIALS AND METHODS

Data Retrieved From SEER
Clinicopathological characteristics and information of all EOCA patients were obtained from the SEER database via reference number 12330-Nov2019. Supported by the National Cancer Institute, the SEER program has comprehensively assembled information on cancer incidence, treatment, and patient survival since 1973 in multiple geographic regions across the United States. Ethics approval was not required for the present study because all of the data are publicly available and open-access. Colon adenocarcinoma patients were identified using the histologic/behavior code of ICD-O-3 (International Classification of Diseases for Oncology, Third Edition) and primary site codes C18.0-C18.9, along with the cancer staging scheme (version 0204). The inclusion criteria of this study were: i) age ≤ 50 years; ii) no missing TNM stage information; iii) histologically proven adenocarcinoma of the colon; iv) a single primary tumor lesion (CC); v) no missing information on survival, tumor size, grade and other details; vi) not diagnosed solely through autopsy or a death certificate; vii) surgery had been performed. All included patients were randomly split into the training set and the validation set at a ratio of 7:3. The follow-up period for the entire cohort ranged from less than 1 month to 95 months (median 45 months, mean 49.2 months). The median follow-up time was 45 months in the training set and 45.5 months in the validation set.
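The random 7:3 split described above can be illustrated with a minimal Python sketch. The analysis in this study was carried out in R, so the function name `split_cohort`, the seed, and the use of consecutive integers as patient identifiers are assumptions made only for illustration.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient identifiers into training and validation sets.

    The seed and the rounding rule are illustrative assumptions; the study
    does not report how its random assignment was seeded.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 2,348 eligible patients, represented here by hypothetical integer IDs
train_ids, valid_ids = split_cohort(range(2348))
print(len(train_ids), len(valid_ids))
```

With 2,348 patients and a training fraction of 0.7, this rounding rule yields set sizes of 1,644 and 704, matching the cohort sizes reported in this study.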

Clinical Variables of EOCA
The demographic and clinical variables were extracted with the SEER*Stat software (version 8.3.5), including gender, age, race, grade, tumor size, American Joint Committee on Cancer (AJCC) TNM stage, regional node status, tumor deposits, perineural invasion, tumor metastasis, survival information, and cause of death. The primary endpoint was overall survival (OS), defined as the period between initial diagnosis and final follow-up or death from any cause. The secondary endpoint was cancer-specific survival (CSS), defined as the period from EOCA diagnosis to death attributed to cancer recurrence or metastasis. Age and tumor size were each divided into 3 groups using the optimal cut-off values established by the X-tile bioinformatics software (Yale University, Version 3.6.1).
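The two endpoints differ only in which deaths count as events. The sketch below shows one common way to code them, assuming that deaths from other causes are treated as censored in the CSS analysis. The field values "dead", "alive", "cancer" and "other" are hypothetical labels, not SEER variable codes.

```python
def endpoint_events(vital_status, cause_of_death):
    """Return (os_event, css_event) indicators for one patient.

    OS counts death from any cause as an event. CSS counts only
    cancer-attributed death; other deaths are treated as censored
    (an assumed convention, not stated explicitly in the text).
    """
    died = vital_status == "dead"
    os_event = 1 if died else 0
    css_event = 1 if died and cause_of_death == "cancer" else 0
    return os_event, css_event


# Hypothetical patients: alive, died of cancer, died of another cause
print(endpoint_events("alive", None))
print(endpoint_events("dead", "cancer"))
print(endpoint_events("dead", "other"))
```

Under this coding, a patient who dies of an unrelated cause contributes an event to the OS analysis but is censored at the time of death in the CSS analysis.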

Construction and Validation of Nomogram Model
Survival analysis was conducted with the Kaplan-Meier method and log-rank test, while the Chi-square test was used to compare categorical variables. Univariate Cox analysis was performed as a screening step, and factors with P < 0.2 were carried forward to multivariate testing. The nomograms were constructed to predict individualized survival probability based on the results of the multivariate analysis. Harrell's concordance statistic (C-index) was applied to evaluate the discriminatory ability of the nomograms. Receiver operating characteristic (ROC) curves were also drawn and the corresponding areas under the curve (AUC) were calculated. To assess model calibration, calibration plots were used to compare observed and predicted probabilities against a 45-degree reference line. In addition, the clinical usefulness of the nomogram models was determined using decision curve analysis (DCA) to quantify net benefit, and compared with the 7th edition of the TNM staging system throughout the entire cohort. All data analysis was carried out using R software (Version 4.0.1, R Foundation for Statistical Computing). Apart from the 0.2 screening threshold used to move from univariate to multivariate analysis, a P value < 0.05 was considered statistically significant.
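To make the discrimination measure concrete, the following is a minimal Python sketch of Harrell's C-index for right-censored data. It is not the R routine used in this study. The function name and the toy data are hypothetical, and tied survival times are simply ignored for brevity.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and a
    shorter follow-up time than patient j. The pair is concordant when
    the patient who failed earlier also has the higher predicted risk.
    Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event indicator, predicted risk
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(times, events, risks))  # 4 of 5 comparable pairs concordant: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.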

RESULTS

Input Data From SEER
In this process, a total of 2,348 patients with EOCA were identified, of whom 1,644 were randomly assigned to the training set and 704 to the validation set (Figure 1). Among all patients, 1,189 (50.6%) were male and 1,646 (70.1%) were white. The most appropriate cut-off values for age and tumor size were selected after optimized classification with the biostatistical tool X-tile. Among the included cases, 1,425 (60.7%) were between 38 and 47 years old, and 1,160 (49.4%) had a tumor larger than 4.7 cm. Most tumors were moderately differentiated (75.0%), and 83.2% of patients were in M0 stage. Perineural invasion was positive in only 15.1% of all patients (negative: 84.9%), and tumor deposits were positive in only 11.5% (negative: 88.5%). In addition, about half of the cases were regional node positive (52.8%). Distant metastasis was uncommon; the most common site of metastasis was the liver (11.8%), followed by the lung (2.6%) and the bone (0.1%) (Table 1).

Construction of Nomogram
In the univariate Cox analysis, gender, age, tumor size, T stage, regional node, tumor deposits, lung metastasis, and perineural invasion were statistically associated with OS in EOCA patients. After adjusting for covariates, all of the factors listed above except age remained significantly associated with OS in the multivariate Cox regression (Table 2). The OS nomogram for predicting 3- and 5-year overall survival was established by incorporating these seven independent factors (Figure 2). Moreover, univariate analysis demonstrated that gender, age, tumor size, T stage, M stage, regional node, tumor deposits, lung metastasis, and perineural invasion had a prominent impact on CSS in EOCA patients. These factors were subsequently included in the multivariate analysis, which showed similar results. Gender, age, tumor size, T stage, regional node, tumor deposits and lung metastasis were independently predictive of CSS and were incorporated into the CSS nomogram (Table 3, Figure 2).
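A nomogram of this kind is typically drawn by rescaling the multivariate Cox coefficients to a points axis, so that the variable with the widest effect range spans 0 to 100 points. The Python sketch below illustrates that rescaling only. The coefficients are hypothetical placeholders, not the values reported in Table 2 or Table 3, and the function name is an assumption.

```python
def nomogram_points(coefficients):
    """Rescale Cox log hazard ratios to nomogram points.

    coefficients maps each variable to {level: log hazard ratio}, with
    the reference level at 0.0. The variable with the largest range of
    coefficients is assigned 0-100 points; the others are scaled to it.
    """
    ranges = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in coefficients.items()
    }
    max_range = max(ranges.values())
    points = {}
    for var, levels in coefficients.items():
        lowest = min(levels.values())
        points[var] = {
            level: round(100 * (beta - lowest) / max_range, 1)
            for level, beta in levels.items()
        }
    return points


# Hypothetical coefficients for illustration only (not study results)
example_coefficients = {
    "T stage": {"T1-2": 0.0, "T3": 0.5, "T4": 1.0},
    "tumor deposits": {"negative": 0.0, "positive": 0.4},
    "gender": {"female": 0.0, "male": 0.2},
}
points = nomogram_points(example_coefficients)

# Total points for a hypothetical male patient with a T3 tumor and positive deposits
total = (
    points["T stage"]["T3"]
    + points["tumor deposits"]["positive"]
    + points["gender"]["male"]
)
print(points)
print(total)
```

The total points are then read against the survival axes of the nomogram to obtain the predicted 3- and 5-year survival probabilities; that final mapping depends on the fitted baseline survival and is not reproduced here.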

Nomogram Validation
The performance of the nomograms was validated both internally and externally. In the internal validation, the nomograms showed a C-index of 0.735 (95% CI: 0.708-0.762) for OS and 0.765 (95% CI: 0.739-0.791) for CSS. In the external validation, the C-index was 0.736 (95% CI: 0.696-0.776) for the OS nomogram and 0.76 (95% CI: 0.722-0.798) for the CSS nomogram. For the TNM staging system, the C-index for predicting OS and CSS in the internal validation was 0.686 (95% CI: 0.662-0.711) and 0.712 (95% CI: 0.689-0.735), respectively. In the external validation, the TNM staging system had a C-index of 0.68 (95% CI: 0.643-0.717) and 0.714 (95% CI: 0.695-0.733) for OS and CSS, respectively, which indicated that the nomograms had better discriminatory ability than the traditional TNM staging system. The calibration plots for the probability of 3-year and 5-year overall survival illustrated a fair agreement between the predicted probabilities and the observed proportions (Figures 3, 4). Acceptable AUC values of the ROC curves were also observed in both the training and validation sets (Figure 5). Decision curve analysis is a relatively novel evaluation method that assesses clinical usefulness across different predictive models. On decision curve analysis, the nomograms showed a clinical net benefit at least comparable to that of the 7th edition AJCC stage. In both the training and validation sets, the OS nomogram displayed better clinical net benefit over almost the entire range of threshold probabilities, while the CSS nomogram was superior to the TNM stage when the threshold probability was greater than 26% (Figure 6).
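The net benefit plotted in a decision curve can be sketched as follows. This is a simplified illustration for a binary outcome at a fixed time point that ignores censoring, which a survival-based DCA would need to handle. The function name and the toy data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    where TP and FP are the true and false positives at that threshold.
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


# Hypothetical toy data: predicted risk of death and observed outcome (1 = died)
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 0, 0]

nb_model = net_benefit(risks, outcomes, threshold=0.5)
# The "treat all" reference strategy assigns every patient a risk of 1.0
nb_treat_all = net_benefit([1.0] * len(outcomes), outcomes, threshold=0.5)
print(nb_model, nb_treat_all)
```

A decision curve repeats this calculation over a range of threshold probabilities and plots the net benefit of each model against the "treat all" and "treat none" (net benefit of zero) strategies.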

DISCUSSION
The present study developed OS and CSS prognostic nomograms for EOCA patients derived from the public SEER database. In internal validation with the bootstrap method and in external validation, these nomogram models displayed favorable discrimination and calibration and predictive performance comparable to that of the TNM stage. The prognostic nomograms provide an alternative and complementary tool that could aid medical decision-making, follow-up scheduling, and patient counseling. Our study extracted 2,348 eligible patients with EOCA from the SEER program, a large population-based retrospective database. The patients were limited to those diagnosed between 2010 and 2015 because a longer time span might have affected the results. On the one hand, elderly patients with colon cancer are characterized by a significant decline in morbidity and potential mortality. This may lead to confounding biases in general prognostic indicators, especially when focusing on EOCA. On the other hand, the therapeutic strategies for colon cancer have been standardized and improved over time, particularly with the new breakthroughs in targeted therapy and immunotherapy (7).
We chose to focus on a nomogram for EOCA for the following reasons. Young patients with colon cancer form a distinctive but common subset, with adenocarcinoma being the most frequent histological subtype. A recent investigation found that colon cancer in individuals under age 50 has shown a startling upward trend that calls for greater emphasis and research (4). A previous study demonstrated that younger patients (≤ 40 years) have more aggressive tumor biology with more advanced disease stages compared with older patients. However, younger patients often had a superior prognosis in overall survival and quality of life (8). Therefore, it is crucially important to identify key prognostic factors related to the survival time of patients with EOCA and to establish an individualized and accurate survival prediction model for EOCA. Tumor survival prediction models are of great value for assessing patient prognosis, optimizing treatment regimens, selecting surgical candidates, planning postoperative adjuvant treatment, identifying patients at high risk of recurrence, setting follow-up frequency, and using medical resources rationally. Compared with the traditional TNM staging system, which only considers depth of tumor invasion, lymph node metastasis and distant metastasis, nomogram prediction models incorporating multiple factors have been reported to offer major benefits (9). The nomogram transforms a complex regression equation into a visualized graph, which makes the results of the prediction model more readable and facilitates the evaluation of patients. It is precisely these inherent strengths that permit the application of nomograms in medical research and clinical practice.
A previous study by Zheng et al. has shown that tumor deposits may be a significant indicator of poor outcome for patients undergoing colon cancer resection surgery (10). Qi et al. have reported that tumor deposits were an independent unfavorable prognostic factor for DFS in N1-stage patients, associated with neural invasion and more common in young adults (11). Moreover, a recent study has indicated that tumor deposits are associated with a negative prognostic effect, especially in stage IIIB colon cancer, with a 3.2-fold increased risk of disease recurrence (12). Also, female patients with colorectal cancer showed a slightly but significantly better OS than men (13). Similarly, a meta-analysis by Yang et al. confirmed this finding when comparing nine studies (14). One possible explanation for the better survival is that sex hormones may have a protective effect against colon cancer in young female patients (15). Additionally, numerous studies have validated tumor size as a negative prognostic factor. Dai et al. found that tumor size showed considerable predictive value in T1 colon cancer, outperforming any other clinical prognostic factor (16). A recent study also determined that tumor size was positively correlated with T stage and negatively impacted survival (17). The findings from our analysis were in line with these previous reports.
However, we acknowledge that a number of variables, including age and race, did not show significant prognostic value in our study. This is reasonable, since valuable prognostic factors may differ between EOCA patients and general colon cancer (CC) patients. In addition, the prognostic nomograms established in this study may not differ distinctly from those for elderly CC patients. Nevertheless, whether or not the EOCA nomogram differs from an elderly CC nomogram, the prognostic performance of the nomograms in this study was not degraded.
Our study has the following advantages. First of all, the SEER database collects demographic characteristics, tumor characteristics, and survival data of populations in 17 regions across the United States, covering 28% of the US population, with data accuracy as high as 95% (18). This provides strong data support for the establishment of the nomograms, which cannot be achieved in a typical single-center study. Secondly, unlike previous nomograms built to predict the prognosis of patients with colon cancer, our models specifically target the prognosis of colon adenocarcinoma patients under the age of 50 years. Finally, the calibration curves of the prognostic nomograms showed good concordance between the actual observations and the predicted probabilities, indicating that our models had good predictive ability.
Even so, our study has some limitations. In this preliminary study, we obtained the data of EOCA patients from a publicly available database and randomly assigned eligible cases to the training or validation cohort to evaluate the nomograms. Further validation in an independent population-based prospective cohort is still warranted before routine clinical application. Additionally, some important clinical factors were not available in the SEER database, including specific treatment information and smoking or alcohol drinking habits. Moreover, the SEER database does not contain data on molecular markers, so it is difficult to evaluate their influence. These factors might affect the effectiveness of the nomograms.

CONCLUSIONS
The nomograms established in this study could effectively predict 3- and 5-year OS and CSS in EOCA patients, which may assist clinicians in evaluating prognosis more accurately and optimizing treatment strategies for individual young patients.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/).

AUTHOR CONTRIBUTIONS
HJ and YF contributed equally to this study. KG analyzed the data. HJ drafted the manuscript. SR contributed with a critical revision of the manuscript. All authors contributed to the article and approved the submitted version.