Prognostic Nomogram for Liver Metastatic Colon Cancer Based on Histological Type, Tumor Differentiation, and Tumor Deposit: A TRIPOD Compliant Large-Scale Survival Study

Objective: A proportional hazards model was applied to develop a large-scale prognostic model and nomogram incorporating clinicopathological characteristics, histological type, tumor differentiation grade, and tumor deposit count, in order to provide clinicians and patients diagnosed with colon cancer liver metastases (CLM) with a more comprehensive and practical outcome measure.
Methods: Following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines, this study identified 14,697 patients diagnosed with CLM from 1975 to 2017 in the Surveillance, Epidemiology, and End Results (SEER) 21 registry database. Patients were divided by computerized randomization into a modeling group (n=9800) and an internal validation group (n=4897). An independent external validation cohort (n=60) was also obtained. Univariable and multivariable Cox analyses were performed to identify prognostic predictors for overall survival (OS). The nomogram was then constructed and validated using the area under the receiver operating characteristic curve (AUC) and calibration curves.
Results: Histological type, tumor differentiation grade, and tumor deposit count were independent prognostic predictors for CLM. The nomogram also included age, sex, primary site, T category, N category, metastasis to bone, brain, or lung, surgery, and chemotherapy. The model achieved excellent predictive power in both internal (mean AUC=0.811) and external validation (mean AUC=0.727), significantly higher than that of the American Joint Committee on Cancer (AJCC) TNM system.
Conclusion: This study proposes a prognostic nomogram for predicting 1- and 2-year survival based on histopathological and population-based data of CLM patients, developed using TRIPOD guidelines. Compared with TNM staging, our nomogram has better consistency and calibration for predicting the OS of CLM patients.


INTRODUCTION
Globally, colon cancer is the third most common tumor and the second leading cause of cancer-related deaths (1,2). The 5-year survival rate for colon cancer is 64.6%, whereas for patients with synchronous metastasis it is only 14.3%. Liver metastasis is the most frequently observed synchronous metastasis (17%) (3). It occurs in over 25% of patients initially and in 50% of patients over the course of the disease (4).
The prognosis of CLM varies considerably, so individualized prognostic prediction for CLM has become the focus of various studies. The American Joint Committee on Cancer (AJCC) TNM system has been applied worldwide as the most authoritative tool (5). However, the predictive accuracy of TNM staging is unsatisfactory (C-index=0.453) (6), which may be related to its small number of predictors and its categorization of continuous variables (7).
Nomograms have been shown to enhance predictive accuracy. Huang et al. developed two nomograms for the overall survival (OS) of patients with lung metastasis, with C-indexes of 0.754 and 0.749 (8). Furthermore, the C-index of a nomogram predicting the risk of bone metastasis has been reported to be 0.929 (9). Therefore, the present study constructs a prognostic nomogram to provide clinicians with a more comprehensive outcome measure.
Recently, the identification of predictors has been recognized as being of great importance for prognostic nomograms. Histological type (10,11), tumor differentiation grade (12), and tumor deposit count (13) have been recognized as independent predictors of prognosis in colon cancer, as well as in liver metastases. Several published studies have reported the prognostic value of individual histopathological predictors; however, insufficient sample sizes have limited their prognostic capabilities. Nevertheless, only a few large-scale CLM nomograms have incorporated these histopathological indicators. The nomogram by Wu et al. ignored the influence of histological type, the presence of synchronous metastasis, and treatment information, which are important reasons for that nomogram's low consistency (C-index=0.621).
Thus, based on data from the Surveillance, Epidemiology, and End Results (SEER) database, this study aims to develop a large-scale model and construct a nomogram that incorporates histological type, tumor differentiation grade, and tumor deposit count. TRIPOD guidelines were followed for the development and validation process of the present study.

METHODS
The design of this study follows the Prognosis Research Strategy (PROGRESS) framework (14), and its research plan is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (15). The study followed the TRIPOD checklist (Supplementary Table 1). The flow diagram is displayed in Figure 1.

Participants
The data for this study were extracted from information recorded in the SEER 21 registry (16) from 1975 to 2017. The SEER database is a public cancer information registration database supported by the National Cancer Institute, collecting data from population registries to provide information on survival.

Ethics
Because the SEER data were de-identified, the study did not require either institutional review board approval or the subject's informed consent. All procedures are in compliance with the 1964 Declaration of Helsinki and its subsequent amendments and standards.

Outcome
The outcome of the present study was death over time, which was already recorded in the database. The definition of the outcome and its measurement method were the same for all patients. Outcome data were collected from the population registries of the SEER database, whose staff did not participate in the statistical analysis.

Predictors
In this study, information on patients' clinicopathological factors was collected, including histopathological indicators (tumor differentiation grade, tumor deposit, and histological type); population-based indicators (age and sex); tumor-related indicators (tumor site, tumor size, lymph, CEA, and metastasis); and treatment-related indicators (chemotherapy and surgery at the primary site). The main elements of the TNM category were also included. The predictors are reliable and straightforward for clinicians to obtain. Age, sex, surgical history, and chemotherapy history were obtained by clinical inquiry; tumor-related indicators such as tumor size, grade, tumor site, and synchronous metastasis were obtained through previous disease assessment and are readily accessible to clinicians. The predictors were defined according to the SEER classification (16; https://seer.cancer.gov/).
Prediction models are often adapted to diverse data to improve prediction (23), and subpopulations with small sample sizes were merged or removed to increase accuracy.

Missing Data
Missing data were coded as "unknown" and estimated as such in the prognostic model, allowing patients who lack some data to use this model. However, records with excessive missingness (more than 80% of items missing, or more than 15 items unknown) were excluded from the predictive modeling analyses.
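
The exclusion rule above can be illustrated with a minimal sketch. The field names, the use of `None` or the string "unknown" to mark missing values, and the helper names are assumptions made for illustration; they are not taken from the SEER coding scheme or from the study's own code.

```python
def count_unknown(record):
    """Count fields that are missing (None) or explicitly coded as 'unknown'."""
    return sum(
        1 for value in record.values()
        if value is None or str(value).strip().lower() == "unknown"
    )


def is_excluded(record, max_fraction=0.80, max_unknown=15):
    """Apply the stated rule: exclude if >80% of items are missing
    or if more than 15 items are unknown."""
    if not record:
        return True  # an empty record carries no usable information
    n_unknown = count_unknown(record)
    return n_unknown > max_unknown or n_unknown / len(record) > max_fraction


# Hypothetical records with illustrative field names.
sparse = {"age": None, "sex": "unknown", "grade": "unknown", "site": None, "cea": "unknown"}
usable = {"age": 70, "sex": "female", "grade": "unknown", "site": "sigmoid colon", "cea": "unknown"}

print(is_excluded(sparse), is_excluded(usable))
```

Records that pass the filter keep their "unknown" values, so these can still enter the model as a category of their own.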

Modeling and Verifying Samples
A total of 14,757 patients were involved in this study. The 14,697 SEER patients were randomly divided into a modeling group (n=9800) and an internal validation group (n=4897). An independent external validation dataset (n=60) was also obtained; it was completely independent of the model training data and had different demographic background characteristics. Its clinical data were collected from medical record reviews in China.
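
As a minimal sketch of such a computerized random split, the following shuffles patient identifiers and cuts them at the modeling-group size. The seed, the integer identifiers, and the function name are illustrative assumptions; the study itself performed its analyses in R and does not report these details.

```python
import random


def split_cohort(patient_ids, n_model, seed=2021):
    """Shuffle IDs reproducibly and split into modeling and validation groups.

    The seed is an arbitrary illustrative value, not one reported by the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_model], ids[n_model:]


# Hypothetical integer IDs standing in for the 14,697 SEER patients.
seer_ids = range(14697)
modeling, validation = split_cohort(seer_ids, n_model=9800)
print(len(modeling), len(validation))  # 9800 4897
```

Fixing the seed makes the partition reproducible, which matters when the same split must be reused for model fitting and for later validation.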

Cox Regression Modeling
Univariable Cox regression analysis was used for the initial screening of candidate predictors. Multivariable stepwise Cox regression was then used to identify statistically significant independent predictors, expressed as hazard ratios (HRs), and to model the association between overall survival and the related prognostic factors.
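
For reference, the standard form of the Cox proportional hazards model and the relation between a regression coefficient and its hazard ratio can be written as follows. The symbols are generic textbook notation (with b_j for the coefficient of predictor x_j and h_0(t) for the baseline hazard), not notation defined by the study.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\sum_{j=1}^{p} b_j x_j\right),
\qquad
\mathrm{HR}_j = \exp(b_j)
```

Under this form, a coefficient b_j > 0 corresponds to HR_j > 1 (higher risk of death), and b_j < 0 corresponds to HR_j < 1 (lower risk).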

Nomogram Construction
A nomogram was constructed using R software after Cox regression analysis based on the modeling group. All statistical analyses were completed in R software (version 3.6.2; https://www.r-project.org).

Risk Groups
This study did not group the predicted probability into risk groups, as the prediction model is a basis for risk judgment.

Model Validation
Internal validation of the model was conducted using the SEER data sets, and external validation was performed on an independent cohort. The consistency of the model was assessed using the C-index and the area under the receiver operating characteristic curve (AUC). The model was considered acceptable when the AUC exceeded 0.7. Calibration curves were also used for model validation.
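
To make the consistency measure concrete, the sketch below computes Harrell's C-index on a toy data set. It is a simplified illustration: tied survival times are ignored, the data are invented, and the function name is hypothetical; it is not the routine used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs in which the patient
    who died earlier was assigned the higher predicted risk.

    A pair (i, j) is comparable when patient i has an observed death and a
    shorter follow-up time than patient j. Ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event flag (1 = death, 0 = censored), risk score.
times = [5, 10, 14, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.2]
print(concordance_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.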

RESULTS

Participants
A total of 14,697 patients from SEER database were involved in this study. Patients were randomly divided into a modeling group (n=9800) and a verification group (n=4897). Figure 2 shows the baseline characteristics of all patients stratified by age; the median age of the modeling group and validation group was 65 years. Patients with CLM are more likely to be male (54.38%), regardless of race or region.
Adenocarcinoma appears to be associated with liver metastasis; 13,434 (91.41%) patients exhibited adenocarcinoma in the present study, while the remaining 1263 (8.59%) patients were assigned to the non-adenocarcinoma group for comparative analysis. Tumors of higher differentiation grade are known to be more likely to metastasize; accordingly, only 782 (5.32%) CLM patients presented with tumor differentiation grade I. In addition, our results demonstrated that tumor differentiation grade II (65.84%), or moderate differentiation, was present in the majority of CLM patients. Only 13.36% of patients were positive for tumor deposits, defined as solitary tumor nodules in the lymphatic drainage area of the primary tumor. Overall, 11,452 (77.92%) patients died, and the median survival was 14 months. Other population-based data are presented in Table 1.

Cox Regression to Screen Predictors
Univariable regression analysis was carried out to identify factors associated with the overall survival of CLM patients. The findings showed that only a tumor size of 1-5 cm, lung metastasis, and tumor deposits had a significant association with a higher incidence of death (Table 2). Chemotherapy and surgery were also identified as significant predictors. Because interactions between factors can influence the results of univariable regression, we carried out further multivariable regression. Stepwise regression was used for the multivariable analysis, and the results showed that all predictors were significantly associated with OS, except for the T category and tumor site (Table 3).
Based on these results, the Cox modeling in this study included twelve independent predictors of OS (age, sex, histological type, tumor differentiation grade, tumor deposit, N category, bone metastasis, brain metastasis, lung metastasis, tumor size, surgery, and chemotherapy). As an essential indicator of the TNM system, the T category was also included in the construction of the model. The model results indicate that several unknown categories (coded as 999, unknown, or NOS in the SEER database) remained significant. Values are recorded as unknown for objective reasons, for example when the data cannot be assessed; these categories may therefore be of particular clinical significance (24).

Prognosis Model Development
Using a P-value threshold of 0.05, we used the hazard ratios (HRs) and coefficients of the multivariable regression model to represent the effects of the independent variables on death. We identified a predictor as a risk factor for death when the corresponding coefficient was significantly >0, or equivalently when the HR was significantly >1. Among these clinical features, tumor differentiation grade IV was the most important factor for prognosis (b=0.544, HR=1.720), indicating that the risk of death in grade IV was 1.720 times that of grade II, followed by tumor differentiation grade III (b=0.385, HR=1.470). Conversely, tumor differentiation grade I (b=-0.408, HR=0.665) was a protective factor for OS in CLM. Among histological types, non-adenocarcinoma (b=0.160, HR=1.170) was associated with a poorer prognosis, with a risk of death 1.17 times that of adenocarcinoma patients. In addition, the presence of tumor deposits (b=0.112, HR=1.120) raised the risk of death to 1.12 times that of patients without deposits.
As for synchronous metastasis, the prognosis of patients with brain metastasis (b=0.466, HR=1.540) was worse, and its impact on prognosis was greater than that of bone metastasis (b=0.362, HR=1.440) or lung metastasis (b=0.278, HR=1.320). Although the influence of sex was modest, the model indicates that the risk of death for men was 1.09 times that of women. The risk of death for patients who received surgery (b=-0.806, HR=0.447) was only 45% of that for patients who did not, and chemotherapy appeared to be even more effective in improving OS (b=-1.086, HR=0.338).
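
The relation between a coefficient and its hazard ratio, HR = exp(b), can be checked against the grade, histology, and tumor deposit values reported in this section. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical labels, and small differences from the reported HRs are expected because the published coefficients are rounded.

```python
import math


def hazard_ratio(coefficient):
    """Convert a Cox regression coefficient b into a hazard ratio exp(b)."""
    return math.exp(coefficient)


# (coefficient b, reported HR) pairs as stated in the text.
reported = {
    "tumor differentiation grade IV": (0.544, 1.720),
    "tumor differentiation grade III": (0.385, 1.470),
    "tumor differentiation grade I": (-0.408, 0.665),
    "non-adenocarcinoma": (0.160, 1.170),
    "tumor deposit present": (0.112, 1.120),
}

for name, (b, hr_reported) in reported.items():
    hr = hazard_ratio(b)
    direction = "risk factor" if b > 0 else "protective factor"
    print(f"{name}: b={b:+.3f}, exp(b)={hr:.3f}, reported HR={hr_reported:.3f} ({direction})")
```

A positive coefficient yields exp(b) > 1 and a negative one yields exp(b) < 1, which is why the sign of b and the position of the HR relative to 1 give the same classification.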

Nomogram Construction
Based on the multivariable regression model results, the specific points for each predictor and the total points were plotted (Figure 3). The points assigned to the predictors were as follows: male, 2.5; not adenocarcinoma, 5; tumor differentiation grade II, 12.5; tumor differentiation grade III, 22.5; tumor differentiation grade IV, 27.5; tumor deposit, 2.5; bone metastasis, 10; brain metastasis, 12.5; lung metastasis, 7.5; no surgery, 25; and no chemotherapy, 30. Applying this model for clinical prognosis prediction is quite convenient. Consider a female patient aged 70 years, having a 4-cm-long tumor in
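
The way a nomogram accumulates points can be sketched using only the point values listed above. This is a partial illustration under stated assumptions: points for age, tumor size, tumor site, and T and N category are not given in the text and are therefore omitted, the dictionary keys and the example patient are hypothetical, and the conversion from total points to 1- and 2-year survival probability requires Figure 3 and is not reproduced here.

```python
# Points per characteristic, as listed in the text (incomplete by design).
NOMOGRAM_POINTS = {
    "male": 2.5,
    "not_adenocarcinoma": 5.0,
    "grade_II": 12.5,
    "grade_III": 22.5,
    "grade_IV": 27.5,
    "tumor_deposit": 2.5,
    "bone_metastasis": 10.0,
    "brain_metastasis": 12.5,
    "lung_metastasis": 7.5,
    "no_surgery": 25.0,
    "no_chemotherapy": 30.0,
}


def partial_score(characteristics):
    """Sum the points of the listed characteristics that apply to a patient."""
    unknown = [c for c in characteristics if c not in NOMOGRAM_POINTS]
    if unknown:
        raise KeyError(f"no points listed for: {unknown}")
    return sum(NOMOGRAM_POINTS[c] for c in characteristics)


# Hypothetical patient: male, grade III tumor, lung metastasis, no chemotherapy.
example = ["male", "grade_III", "lung_metastasis", "no_chemotherapy"]
print(partial_score(example))  # 62.5
```

A higher total corresponds to a poorer predicted prognosis, since every listed characteristic is one that increases the risk of death in the regression model.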

Model Validation
The predicted distributions of the score, 1-year survival, and 2-year survival in the modeling and internal validation groups were consistent (Figure 4). A calibration plot of the 1-year OS demonstrated good calibration between predicted and actual survival (Figure 5). The C-indexes of the modeling group and internal validation group were 0.751 and 0.752, respectively (Table 4). The ROC curve analysis (Figures 6, 7

DISCUSSION
In this study, we constructed a predictive model for CLM patients using multi-institutional clinical data from the SEER database. A prognostic nomogram for OS was established, and its consistency and calibration were verified. The predictors included in the prognostic nomogram were age, sex, tumor site, histological type, tumor differentiation grade, T category, N category, bone metastasis, lung metastasis, brain metastasis, tumor deposit, surgery, and chemotherapy.
Multivariable regression analysis indicated that age was an important prognostic risk factor. Several models have shown that age is a risk factor (25,26), illustrating that the elderly have a poorer prognosis, especially those over 60 years old (6). Our nomogram demonstrated that tumor differentiation grade IV (undifferentiated) has a poorer prognosis than tumor differentiation grade I (well-differentiated). Consistent with this result, tumor differentiation grade has been associated with a faster growth rate and greater aggressiveness of tumor cells, which affects distant metastasis of CLM (27) and OS. Tumor deposits were found to be a statistically significant factor; they may be considered a third, mixed pathway of tumor migration and invasion involving perivascular, perineural, or mesenteric tissue (28).
In contrast to previous studies, T0 and T1 were identified as risk factors for CLM, which may be related to the highly malignant characteristics of these tumors in the early stages (9). Adenocarcinoma subtypes such as mucinous adenocarcinoma or signet ring cell carcinoma are more likely to undergo peritoneal metastasis (29), which generally leads to a poorer prognosis, although early mucinous adenocarcinoma may have a better prognosis. However, our results indicate that patients with adenocarcinoma have a better prognosis, which may be related to the classification of the histological subtypes; this needs to be validated by further research.
Validation showed that the nomogram discriminates events reliably. The AUC is generally accepted and widely used for model validation. The model achieved excellent predictive power in both internal (mean AUC=0.811) and external validation (mean AUC=0.727), substantially superior to the American Joint Committee on Cancer (AJCC) TNM system. This study provides clinicians and patients diagnosed with colon cancer liver metastases (CLM) with a more comprehensive and practical outcome measure, helping clinicians assess patient prognosis and make personalized treatment decisions.
This study is the first to develop and validate a prognostic model predicting the survival of CLM patients under the guidance of both the PROGRESS framework and the TRIPOD prediction research report. Previously developed nomograms (30-33) could be more convincing if they expanded their sample sizes. We included as large a population as possible from the SEER database, which covers 35% of the US population. Several predictive models of CLM have been published to predict the survival rate of patients after hepatectomy (34,35); however, patients suitable for hepatectomy account for only 6.1% of patients (36), which limits the clinical use of those models. Wu et al. developed nomograms for CLM from the SEER database but excluded histological type and synchronous metastasis and did not include any treatment-related data. This study incorporated these significant predictors and included more readily available clinical indicators as predictors to improve model applicability.
Internal and external validation of the current nomogram demonstrated high accuracy in predicting the OS of CLM patients. However, three limitations remain to be resolved. First, the SEER data set inevitably lacks some clinicopathological factors; vital prognostic factors for liver metastasis, such as the type of liver resection, should be included in future research. Second, models focusing on subtypes of adenocarcinoma or on the type of primary surgery should be developed further using specific data sets. Third, although the external validation indicated an excellent predictive effect, the AUC value of the 1-year OS prediction was slightly lower than that of the 2-year prediction, indicating the need for better external validation using large samples from independent multicenter cohorts.

CONCLUSION
In conclusion, based on the TRIPOD prediction research report, we established and validated nomograms to predict 1- and 2-year survival based on histopathological and population-based data of CLM patients. Compared with the traditional TNM staging system, our nomogram achieved relatively good discrimination and calibration.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.