Development and Validation of the Individualized Prognostic Nomograms in Patients With Right- and Left-Sided Colon Cancer

Background: The overall survival (OS) of patients diagnosed with colon cancer (CC) varies greatly, even among patients with the same tumor stage. We aimed to design nomograms capable of predicting OS in resected left-sided colon cancer (LSCC) and right-sided colon cancer (RSCC), respectively, and thus to stratify patients into different risk groups. Methods: Records from a retrospective cohort of 577 patients with complete information were used to construct the nomograms. Univariate and multivariate analyses screened risk factors associated with OS. The performance of the nomograms was evaluated with the concordance index (C-index), calibration plots, and decision curve analyses for discrimination, calibration, and clinical net benefit, respectively, and was further compared with the American Joint Committee on Cancer (AJCC) 8th edition tumor-node-metastasis (TNM) classification. Risk stratification based on nomogram scores was performed with recursive partitioning analysis. Results: The LSCC nomogram incorporated carbohydrate antigen 125 (CA125), age, and log odds of positive lymph nodes (LODDS), and the RSCC nomogram incorporated tumor stroma percentage (TSP), age, and LODDS. Compared with the TNM classification, the LSCC and RSCC nomograms both had a significantly higher C-index (0.837, 95% CI: 0.827–0.846 and 0.780, 95% CI: 0.773–0.787, respectively) and greater clinical net benefit. Calibration plots revealed no deviations from the reference lines. All results were reproducible in the validation cohort. Conclusions: Original predictive nomograms were constructed and validated for OS in patients with CC after surgery; they may help physicians to appraise the individual survival of postoperative patients accurately and to identify high-risk patients in need of more aggressive treatment and follow-up strategies.


BACKGROUND
Colon cancer (CC) is among the most common malignancies of the gastrointestinal tract, with an estimated annual incidence of 1.09 million cases and 551,268 deaths worldwide (1). Although more tumor biology characteristics and potential prognostic factors have been identified, prognosis prediction in primary CC still depends mainly on the tumor-node-metastasis (TNM) status at diagnosis (2,3). The TNM staging system, recommended by the American Joint Committee on Cancer (AJCC), is the common criterion for predicting the outcomes of CC patients by evaluating the extent of the primary tumor (T), regional lymph-node involvement (N), and the presence of distant metastases (M) (4). Because of the heterogeneity of CC and the inability of the TNM system to assess its metastatic potential, the TNM system cannot predict outcomes for all CC patients, which gives rise to a survival paradox (5,6). For example, patients with positive lymph nodes (N+) are classified as stage III regardless of T stage, yet patients with an early T stage and N+ disease have better outcomes than patients with a high T stage and negative lymph nodes (N−) (6). In other words, relying solely on the TNM stage is not enough to predict prognosis and determine the treatment strategy for CC patients, and may lead to under- or overtreatment (7). Thus, there is an ever-increasing need to identify novel, robust prediction tools to be used alongside the current TNM stages.
To remedy the deficiencies of the TNM classification system, a growing number of prognostic markers, including other clinical and pathological parameters and diverse genes, have been explored, verified, and applied in clinical practice (8). Recent evidence suggests that tumor stroma percentage (TSP) and log odds of positive lymph nodes (LODDS) are practicable prognostic determinants in several solid tumors, including colorectal cancer and gastric cancer (9,10). TSP is defined as the proportion of stroma in the entire tumor tissue; it is a straightforward measure that can be assessed by microscopic inspection of hematoxylin and eosin (H&E)-stained tissue sections, and it has yielded prognostic information in colorectal cancer in recent studies (9,11). LODDS was recently validated as an independent prognostic factor in colorectal cancer (CRC), playing a decisive role in prognostic assessment regardless of lymph node status and count (10).
Right-sided colon cancers (RSCC) arise in the cecum, ascending colon, hepatic flexure, and/or transverse colon, whereas left-sided colon cancers (LSCC) arise in the splenic flexure, descending colon, and/or sigmoid colon (12). Studies have shown that RSCC and LSCC differ in embryonic origin, anatomy, physiology, pathological type, and molecular biology, and the two are therefore generally recognized as distinct entities (13–16). The nomogram, a simple statistical prediction tool that combines multiple variables and achieves high prediction accuracy for a specific event, has shown better prognostic ability than traditional TNM staging systems in multiple types of cancer (17). However, previous nomograms were based on cohorts that mixed RSCC and LSCC together (18), and specific nomograms that separately predict the prognosis of RSCC and LSCC are lacking.
In this study, we investigated the clinical significance of TSP and LODDS in RSCC and LSCC, respectively, and validated their prognostic value. In addition, we systematically constructed two novel nomograms for RSCC and LSCC to provide clinicians with more precise survival estimates and to support individualized treatment decisions.

Patients and Data Collection
This retrospective analysis was conducted in accordance with the Declaration of Helsinki. The study's protocol was approved by the Clinical Research Ethics Committee of Shanghai General Hospital. Due to the retrospective nature of this study and anonymous use of patients' data, informed consent was not required.
In the present study, a total of 1,079 pathologically diagnosed colon cancer cases were enrolled from Shanghai General Hospital between January 2014 and December 2018. All patients had received laparoscopic colectomy. The flow chart of case inclusion and exclusion is shown in Figure 1. The inclusion criteria were as follows: (1) patients who underwent laparoscopic colectomy as initial treatment and did not receive any preoperative treatment; (2) patients with a pathology-confirmed CC diagnosis; (3) patients with complete clinicopathological and follow-up data. Patients were excluded if they met the following exclusion criteria: (1) absence of important clinicopathological factors, such as TSP and LODDS.

Construction and Validation of the Nomogram
Enrolled colon cancer patients from our database were randomly divided into a training set of 462 patients and an internal validation set of 115 patients using a random number list generated by the R function "createFolds", to ensure that outcome events were distributed randomly between the two cohorts. Categorical variables were grouped according to their clinical significance before the construction of the nomogram. In the training set, twenty characteristics were investigated by Kaplan-Meier curves and log-rank tests, and independent prognostic factors related to OS were identified by univariate and multivariate Cox regression analyses. The impact of each independent prognostic factor on OS was measured by the hazard ratio (HR). Based on the significant factors, nomograms for predicting OS were constructed using R software version 2.13.2 (http://www.r-project.org/).
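The random division into training and validation cohorts can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `stratified_folds`, the choice of five folds with one held out, the seed, and the number of events (120) are all assumptions made purely for illustration. The sketch only shows the general idea of assigning patients to folds so that outcome events are spread evenly.

```python
import random
from collections import defaultdict


def stratified_folds(events, k=5, seed=0):
    """Assign patient indices to k folds, balancing event status across folds.

    events: list of 0/1 flags (1 = death observed). Hypothetical helper,
    loosely analogous in spirit to a stratified fold generator.
    """
    rng = random.Random(seed)
    by_status = defaultdict(list)
    for idx, status in enumerate(events):
        by_status[status].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin where the previous stratum stopped
    for status in sorted(by_status):
        members = by_status[status]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(pos + offset) % k].append(idx)
        offset += len(members)
    return folds


# Illustrative data only: 577 patients, of whom 120 are assumed to have died.
events = [1] * 120 + [0] * 457
folds = stratified_folds(events, k=5, seed=42)
validation = sorted(folds[0])                              # one fold held out
training = sorted(i for fold in folds[1:] for i in fold)   # remaining folds
```

Holding out one of five folds yields a validation cohort of roughly one fifth of the patients, which is of the same order as the 462/115 division reported here; the exact counts depend on the fold generator used.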
Nomogram validation included assessment of discrimination and calibration. First, the discrimination performance of the proposed nomogram was evaluated by the concordance index (C-index); a value greater than 0.750 was considered to represent relatively good concordance between the predicted and observed outcomes (19). Second, calibration curves were generated by comparing the nomogram-predicted OS probability with the corresponding actual OS probability estimated by the Kaplan-Meier method. In addition, decision curve analysis (DCA) and the receiver operating characteristic (ROC) curve were both applied in this study. DCA was used as a novel method to assess the clinical usefulness of the nomogram in evaluating the risk of adverse outcomes for individual patients (20). The ROC curve was used to compare the discriminative power of the proposed nomogram with that of the 8th AJCC TNM classification. All analyses were performed using SPSS version 20 (IBM, Armonk, NY, USA) and R version 2.13.2 via the design and survival packages. A P value of <0.05 was considered significant.
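To make the C-index concrete, the following Python sketch computes Harrell's concordance index for right-censored data. It is an illustrative simplification rather than the routine used in this study: the function name `concordance_index` and the toy data are assumptions, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified).

    A pair (i, j) is comparable when patient i died (events[i] == 1) and
    did so strictly before time j. The pair is concordant when the patient
    who died earlier was assigned the higher predicted risk; tied risk
    scores count as half. Pairs with tied times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values): follow-up in months, 1 = death observed.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

In this toy example, four of the five comparable pairs are ranked correctly, so the C-index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.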

Assessment of the TSP
H&E-stained sections of surgical biopsies taken at the deepest point of tumor invasion were used to assess TSP. First, tumor sections were scanned using an automatic digital slice scanning system (KF-PRO series) at ×10 objective magnification, and visualization was performed with the digital slice reading software K-Viewer (Konfoong Biotech, NB, China, 1.5.3.1). Given that there was some heterogeneity in the assessment of TSP among biopsy sections, a representative region with the most invasive tumor margin was selected at ×4 objective magnification, as previously described (9). Then, a single field within the representative region, with tumor cells present at all borders of the image, was chosen to assess TSP at ×40 objective magnification. Biopsy sections that contained necrosis or mucin in the representative region were excluded from scoring. Subsequently, a machine algorithm based on MATLAB was used to calculate the percentage of stroma in the visible field. Our previous study confirmed that assessment of TSP by the machine algorithm was more accurate than manual visual assessment (21). In this study, a TSP ≤50% of the tumor area was categorized as low TSP, while a TSP >50% of the tumor area was regarded as high TSP.
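The final scoring step can be sketched as follows. The MATLAB algorithm itself is not described here, so this Python fragment is only a stand-in under a strong assumption: that each pixel of the selected field has already been labeled as stroma ("S") or tumor epithelium ("T") by some upstream segmentation. The function names and the label encoding are hypothetical; only the 50% cutoff comes from the text.

```python
def tumor_stroma_percentage(field):
    """Percentage of stroma among stroma + tumor pixels in one field.

    field: list of rows, each a list of labels; "S" = stroma,
    "T" = tumor epithelium, anything else (e.g. background) is ignored.
    The labeling step is assumed to have been done elsewhere.
    """
    stroma = sum(row.count("S") for row in field)
    tumor = sum(row.count("T") for row in field)
    total = stroma + tumor
    if total == 0:
        raise ValueError("field contains no stroma or tumor pixels")
    return 100.0 * stroma / total


def tsp_category(tsp_percent):
    """Apply the cutoff used in this study: <=50% is low, >50% is high."""
    return "low" if tsp_percent <= 50 else "high"


# Tiny hypothetical field: 3 stroma pixels and 1 tumor pixel.
example_field = [["S", "S"], ["S", "T"]]
example_tsp = tumor_stroma_percentage(example_field)
example_group = tsp_category(example_tsp)
```

Here the example field contains 75% stroma and is therefore classified as high TSP.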

Definitions of LODDS
LODDS was defined as log_e[(number of positive nodes + 0.5)/(number of negative nodes + 0.5)], namely, the natural logarithm of the ratio between the number of positive lymph nodes and the number of negative lymph nodes, with 0.5 added to both counts (22). X-tile (Yale University, 3.6.1) was used to determine the cutoff values for the LODDS groups. In both the training and validation cohorts, LODDS was classified into three categories: ≤ −0.9138, −0.9138 to −0.2373, and > −0.2373.
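The definition and grouping above can be written as a short Python sketch. The function names `lodds` and `lodds_group` and the group numbering 1 to 3 are illustrative assumptions; the formula and the two cutoff values are taken from the text.

```python
import math

# Cutoffs reported in this study (determined with X-tile).
LODDS_CUTOFF_LOW = -0.9138
LODDS_CUTOFF_HIGH = -0.2373


def lodds(positive_nodes, negative_nodes):
    """Log odds of positive lymph nodes: ln((pos + 0.5) / (neg + 0.5))."""
    if positive_nodes < 0 or negative_nodes < 0:
        raise ValueError("node counts must be non-negative")
    return math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))


def lodds_group(value):
    """Map a LODDS value to one of three groups (numbering is illustrative)."""
    if value <= LODDS_CUTOFF_LOW:
        return 1
    if value <= LODDS_CUTOFF_HIGH:
        return 2
    return 3


# Hypothetical patients: (positive nodes, negative nodes).
patients = [(0, 12), (3, 7), (5, 5)]
groups = [lodds_group(lodds(pos, neg)) for pos, neg in patients]
```

Adding 0.5 to both counts keeps the value finite when no nodes are positive or when all examined nodes are positive, which is why LODDS can still discriminate among node-negative patients with different numbers of examined nodes.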

Statistical Analysis
Demographics and clinical characteristics were summarized using the mean and standard deviation for continuous variables and frequencies and percentages for categorical variables. Continuous variables with a normal distribution were compared using Student's t test, and those with a non-normal distribution were compared using the Mann-Whitney U test. OS curves were generated by Kaplan-Meier survival analysis, and differences in survival distributions were compared using the log-rank test. The Cox proportional hazards model was used to determine the hazard ratio (HR) of possible risk factors for OS. Variables were converted to categorical variables for univariate analysis, and the factors that showed significant associations with survival in the univariate analyses were subsequently included in the multivariate Cox regression model to identify independent prognostic factors through backward selection. All statistical analyses were performed using SPSS version 24 (IBM, Armonk, NY, USA) and R version 2.13.2 (https://www.r-project.org). Significance was set at P <0.05 in a two-sided test.
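For readers less familiar with the Kaplan-Meier method, the product-limit estimator can be sketched in a few lines of Python. This is a didactic illustration with hypothetical data, not the SPSS or R routine used for the analyses; the function name `kaplan_meier` is an assumption.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct
    time where at least one death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months) for five patients.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Patients censored at a given time are counted as at risk at that time and removed afterwards, which is the usual convention.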

Demographics and Clinical Characteristics of CC Patients
Based on the inclusion and exclusion criteria, a total of 577 patients with colon cancer were retrospectively identified from the institutional database, including 261 patients with LSCC and 316 patients with RSCC. The demographics and clinicopathological characteristics of the entire, training, and validation cohorts of LSCC and RSCC are listed in Table 1.
In the entire group, 56.67% of patients were male, and 87% of patients were ≥60 years at diagnosis. Most patients had an adenocarcinoma histological type and moderately differentiated tumors. T3, T4a, and T4b tumors accounted for 19.06%, 67.59%, and 0.02% of all cases, respectively. There was no significant difference between the training and validation cohorts in demographic and clinical characteristics.

Univariate and Multivariate Analyses of Risk Factors Associated With Overall Survival
In the LSCC training cohort, univariate analyses by Kaplan-Meier curves and log-rank tests showed that age, TNM stage, N stage, positive nodes, LODDS, and CA125 were associated with overall survival. In the RSCC training cohort, univariate analysis showed that age, TNM stage, T stage, N stage, positive nodes, LODDS, CA724, CEA, CA199, and TSP were associated with overall survival (Table 2).
Multivariate analysis showed that only age, LODDS, and CA125 were independent risk factors for overall survival in patients with LSCC, and age, LODDS, and TSP were independent risk factors for overall survival in patients with RSCC (Tables 3, 4).

Construction and Validation of the Nomogram
Based on the multivariate Cox regression analysis results, age, LODDS, and CA125 were defined as independent prognostic factors in LSCC and were integrated to develop the LSCC nomogram (Figures 2A-C). Similarly, age, LODDS, and TSP were integrated to construct the RSCC nomogram (Figures 2D-F).
According to the LSCC nomogram, LODDS had the greatest influence on the prognosis of LSCC, followed by CA125, whereas in the RSCC nomogram, LODDS and TSP played crucial roles in prognosis. Clinicians can sum the individual scores of the parameters in each nomogram to obtain a total score and read off the corresponding probability of 3- and 5-year OS.
To confirm that the nomograms predicted the prognosis of LSCC and RSCC patients better than the TNM classification, we compared the C-index in the training, validation, and whole cohorts for LSCC and RSCC, respectively. For the LSCC and RSCC nomograms, the C-indexes in the training, validation, and whole cohorts were 0.837, 0.942, and 0.837 and 0.790, 0.821, and 0.780, respectively, compared with C-indexes of 0.756, 0.768, and 0.747 and 0.631, 0.624, and 0.629, respectively, for the TNM classification (Tables 5 and 6). This indicates that the simple-to-use nomograms were more accurate than the TNM stage. In addition, calibration curves for the nomograms showed no deviations from the reference line, indicating a high degree of reliability (23) (Figure 2).

Risk Stratification Based on the Nomogram
Cutoff values were obtained by dividing all patients in the training and whole cohorts into two subgroups based on the total score, with each subgroup representing a distinct prognosis in the LSCC and RSCC nomograms, respectively. Kaplan-Meier survival curves were subsequently plotted for the resulting risk groups.
DCA is a novel method for evaluating alternative prognostic strategies and has advantages over the area under the curve (AUC) (24). DCA curves for the novel LSCC and RSCC nomograms and for the TNM classification in the training, validation, and entire cohorts are presented in Figure 4. Compared with the TNM classification, the nomograms had higher net benefits on DCA, which indicated that the nomograms had better clinical utility than the TNM classification.
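The net benefit plotted in a decision curve can be illustrated with a minimal Python sketch. It assumes a binary outcome at a fixed time point with no censoring, which is a simplification of the survival setting analyzed here; the function name `net_benefit` and the example values are hypothetical. At a threshold probability pt, net benefit is the true-positive rate minus the false-positive rate weighted by pt/(1 − pt), both expressed per patient.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    predicted_risk: predicted probability of the adverse outcome per patient
    outcomes:       1 if the outcome occurred, 0 otherwise (no censoring assumed)
    threshold:      threshold probability pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    flagged = [(p >= threshold, y) for p, y in zip(predicted_risk, outcomes)]
    true_pos = sum(1 for hit, y in flagged if hit and y == 1)
    false_pos = sum(1 for hit, y in flagged if hit and y == 0)
    return true_pos / n - (false_pos / n) * (threshold / (1 - threshold))


# Hypothetical predictions for four patients.
risks = [0.9, 0.8, 0.3, 0.2]
observed = [1, 0, 1, 0]
nb_at_25 = net_benefit(risks, observed, 0.25)
nb_at_50 = net_benefit(risks, observed, 0.50)
```

A decision curve repeats this calculation over a range of thresholds for each model, and the model with the higher curve offers the greater net benefit at that threshold.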

Development of Webserver for Easy Access of Nomogram
An online version of our nomogram (Figure 5) can be accessed at https://colon-cancer-prediction-tool.shinyapps.io/nomogram_for_colon_cancer/ to assist researchers and clinicians. The predicted survival probability over time can be determined by inputting clinical features and reading the output figures and tables generated by the webserver.

DISCUSSION
In the present study, we developed and validated personalized nomograms incorporating age, CA125, LODDS, and TSP to predict the OS probability of LSCC and RSCC patients after radical resection, respectively. The nomograms showed good discrimination and calibration in both the training and validation cohorts. From the standpoint of clinical application, DCA revealed promising clinical applicability, and C-index analysis demonstrated that the nomograms had superior prognostic performance compared with the 8th AJCC TNM classification (0.837 vs. 0.747 for LSCC and 0.780 vs. 0.629 for RSCC). Thus, the constructed nomograms provide a feasible and customized tool to inform patients about their long-term prognoses and to help clinicians make more individualized treatment decisions. Radical resection is considered the only curative approach for CC patients (25). Reported 5-year survival rates range from 60 to 79%, revealing the prognostic heterogeneity associated with this disease (26). With regard to prognostic heterogeneity, a recent study demonstrated that colon cancer side should be acknowledged as a criterion for prognosis (14). We therefore established separate nomograms based on colon cancer side to deliver more customized prognosis prediction. In addition, a higher proportion of tumor stroma has been associated with poorer survival in patients with solid tumors, including colon cancer (27,28). Although nomograms for survival prediction in colon cancer patients have been proposed previously (29,30), to the best of our knowledge, ours is the first nomogram to adopt TSP as an independent prognostic factor. Interestingly, our results showed that TSP was higher in RSCC than in LSCC, and univariate and multivariate analyses further demonstrated that TSP acted as an independent prognostic factor in RSCC.
Indeed, given that the present threshold of 50% TSP was consistent with previous studies, these results suggest that this simple, rapid assessment of the tumor stroma using a machine algorithm might improve prognostic prediction in CC patients (9,10,31). Despite recognition of the importance of TSP in prognostic prediction, its differences between LSCC and RSCC have yet to be fully investigated. First, from a clinical standpoint, RSCC patients are more likely to present with advanced tumor stage and to show poor prognosis and overall survival, which is consistent with the poor prognosis associated with higher TSP (14). Second, from a biological viewpoint, the capacity of RSCC to detoxify carcinogens is weaker than that of LSCC, owing to the fact that stroma, a target of carcinogens, modulates the growth and oncogenic potential of the adjacent epithelium (32,33).
To date, several established nomograms are capable of predicting survival after radical resection in CC patients (20,29,34–38). Some studies reported that the prognosis of CC was clearly associated with factors (29). Besides traditional indicators, recent studies have incorporated novel prognostic factors, including tumor deposits, CpG sites, and autophagy signature genes, into nomograms, which also showed the potential to assess CRC patient prognosis (36–38).
Moreover, almost all of them focused on CRC patients of all sides (LSCC, RSCC, and rectal cancer). However, multiple studies have reported biological and survival differences between right- and left-sided colon cancer, which suggests that separate nomograms for RSCC and LSCC patients warrant further research (12–16,39). Tumors arising on the right side of the colon appear to follow molecular pathways of oncogenesis different from those of LSCC (40). RSCCs are more commonly diploid and characterized by mucinous histology, high microsatellite instability, CpG island methylation, and BRAF mutation. Conversely, LSCCs have frequent p53 and KRAS mutations (13,40). Apart from intrinsic biological differences (i.e., a higher rate of BRAF-mutant cases) related to more aggressive clinical behavior, several other factors, including surgical technique and sensitivity to chemotherapy, should be taken into account to explain the different outcomes in LSCC and RSCC (14,40). Tumor location had a critical role in determining CC prognosis, being a surrogate of different and poor biology (14). Shida D et al. conducted a nationwide multicenter retrospective study and found that RSCC patients had worse OS than LSCC patients (39). Consistent with previous studies, our research also showed that LSCC patients had better survival than RSCC patients. As a result, we developed and validated separate nomograms for LSCC and RSCC patients to achieve better personalized prediction.
Although some nomograms have been developed to predict individual survival probabilities for patients with CRC, most focused on general factors, including age, sex, routine hematological indexes, and common clinicopathological characteristics (20,34,35). In contrast, our nomograms include distinctive factors such as TSP and LODDS. Balancing comprehensiveness and comprehensibility in a nomogram is a necessary yet elusive goal for researchers. Our nomograms are relatively comprehensive and individualized in their target population, addressing LSCC and RSCC separately. Additionally, improved accuracy of nomograms sometimes comes at the cost of increased complexity (41). Our nomograms are concise, with only three predictive factors in each of the LSCC and RSCC nomograms, yet remain accurate. All the clinical parameters needed for our nomograms are available after surgical resection and routine pathologic examination, without adding any further burden to patients. Although our nomograms demonstrated satisfactory performance in predicting individual survival probability for patients with CC after surgical resection, our study had several limitations. First, our data were of limited size and derived from a single institution, and the number of patients lost to follow-up was relatively large, which limited the generalizability and scope of application of the nomograms. Second, our nomograms were based mainly on pathological findings and are therefore not applicable to non-surgical patients. Third, although the model performed well in our internal validation cohort, multi-institutional, prospective validation would provide more convincing evidence.

CONCLUSION
In summary, we have established and validated original predictive nomograms for the survival of patients with LSCC and RSCC after surgery, respectively, providing individualized outcome predictions with good accuracy, reliability, availability, and applicability. These convenient nomograms could be helpful to clinicians and patients in the treatment decision-making process.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Clinical Research Ethics Committee of Shanghai General Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.
(18SJKJGG23 and 19SJKJGG22), Three-year Action Plan for Clinical Skills and Clinical Innovation in Shanghai-level Hospitals (SHDC2020CR4022), and 2021 Shanghai "Rising Stars of Medical Talent" Youth Development Program: Outstanding Youth Medical Talents. No funders have any roles in study design, data collection and analysis, decision to publish, or preparation of the manuscript.