A Prognostic Nomogram of Colon Cancer With Liver Metastasis: A Study of the US SEER Database and a Chinese Cohort

Background Liver metastasis is common and frequently fatal in patients with colon cancer, yet few prognostic models exist for these patients. Methods Clinicopathologic data of patients with colon cancer with liver metastasis (CCLM) were downloaded from the Surveillance, Epidemiology and End Results (SEER) database. All patients were randomly divided into training and internal validation sets at a ratio of 7:3. A prognostic nomogram was established by Cox analysis in the training set and validated in two independent validation sets. Results A total of 5,700 CCLM patients were included. Age, race, tumor size, tumor site, histological type, grade, AJCC N status, carcinoembryonic antigen (CEA), lung metastasis, bone metastasis, surgery, and chemotherapy were independently associated with the overall survival (OS) of CCLM patients in the training set and were used to establish a nomogram. The 1-, 2-, and 3-year AUCs were greater than or equal to 0.700 in the training, internal validation, and external validation sets, indicating favorable discrimination of the nomogram. In addition, in both the overall and subgroup analyses, the risk score calculated from the nomogram divided CCLM patients into high-, middle-, and low-risk groups, suggesting that the nomogram can distinguish patients with different prognoses and is applicable to different patient subgroups. Conclusion Older age, black race, larger tumor size, higher grade, histological type of mucinous adenocarcinoma or signet ring cell carcinoma, higher N stage, right-sided colon cancer (RCC), lung metastasis, bone metastasis, no surgery, no chemotherapy, and elevated CEA were independently associated with poor prognosis in CCLM patients. A nomogram incorporating these variables could accurately predict the prognosis of CCLM.


INTRODUCTION
Among all malignant tumors, colon cancer (CC) ranks fourth in incidence and fifth in mortality worldwide in both genders (1)(2)(3). In recent years, owing to the development of multiple therapeutic strategies [operation, chemotherapy, neoadjuvant chemoradiotherapy, and radiotherapy (RT)], the prognosis of CC has improved. For example, Hu et al. (4) found that the duration of adjuvant chemotherapy may be related to improved disease-free survival (DFS) of CC patients. Adjuvant RT also benefits the cause-specific survival of CC patients (5). As a result, the 5-year survival rate reaches 89.9% for T1-T2 stage CC patients and 71.3% for T3-T4 stage patients (5). However, nearly 13% of CC patients have distant metastases at the time of diagnosis, and their 5-year survival rate is only 13.3% (6,7). Among all patterns of distant metastasis, liver metastases are the most common, accounting for about one-third. Colorectal cancer (CRC) is usually studied as a single cohort. However, CC is more likely than rectal cancer (RC) to metastasize to the liver, which may be attributed to the different routes of hematogenous spread of CC and RC and results in different metastatic patterns. Thus, patients with CC with liver metastasis (CCLM) form a unique subset that deserves further study. Of all CCLM patients, only 10-25% are eligible for surgery, and more than half of them will develop recurrence within three years, so these patients clearly have a worse prognosis than patients without liver metastasis (8)(9)(10). Therefore, it is necessary to explore prognostic factors that accurately predict the prognosis of CCLM patients for individualized treatment planning.
Previous studies have reported some prognostic factors for CC patients, including stage and metastatic status, revealing the association between clinicopathologic features and the prognosis of CC patients (11)(12)(13)(14). Nevertheless, no large cohort-based study has explored the prognostic factors of CCLM patients. Therefore, in the present study, we aimed to identify overall survival (OS)-related variables of CCLM patients and establish a nomogram as a more intuitive tool. In addition, because the effect of treatment differs significantly among patients with advanced disease, we included treatments as prognostic factors to assess their benefit to patients and help avoid over-medication.
Additionally, the prognosis of CC differs by pathological features. For example, among patients with stage III CC, proximal colon cancer was found to have a worse prognosis than distal colon cancer (15). Hence, we also conducted subgroup analyses of left-sided and right-sided colon cancers and other subgroups to validate the efficacy of our prognostic nomogram. Finally, we included an external validation to further verify the nomogram, which could provide treatment advice for patients with different risks and help clinical decision-making.

Population Selection
The Surveillance, Epidemiology and End Results (SEER) program is a cancer database based on the US population, which collects data on cancer patients from 18 registries and covers more than 30% of the population (16). The patient data in the present research were downloaded using SEER*Stat 8.3.6 software. Patients with a histological diagnosis of CCLM from 2010 to 2015 were included. According to the histology and site codes, patients with adenocarcinoma (8147, 8211, 8221, 8263), mucinous adenocarcinoma (MAC) (8481), or signet ring cell carcinoma (SRCC) (8490) and a tumor site in the colon (site codes: C18.0 and C18.2-18.9) were included. Patients were excluded if: (1) information on race, histological grade, AJCC T stage, AJCC N stage, accurate tumor size, tumor site, surgery, radiotherapy, chemotherapy, carcinoembryonic antigen (CEA), or metastatic status of the liver, lung, bone, or brain was unknown; (2) the tumor was not the patient's first tumor; (3) survival time was < 1 month; or (4) age at diagnosis was < 18 years. All included CCLM patients were randomly divided into a training set (70%) and an internal validation set (30%). The training set was used to determine the independent prognostic factors for CCLM patients and establish the prognostic nomogram, while the internal validation set was used to validate the nomogram.
To further validate our nomogram, patients diagnosed with CCLM from August 1998 to May 2019 in The First Hospital of China Medical University were used to form the external validation set. This validation set included 101 CCLM patients who were recruited according to the same inclusion and exclusion criteria as the training cohort. The time of the last follow-up was June 2020. This study was approved by the institutional review board of The First Hospital of China Medical University.
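The 7:3 random split described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `split_train_validation`, the seed value, and the use of integer patient IDs are assumptions made for illustration only.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly partition patient IDs into training and internal validation sets.

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Illustrative run with 5,700 placeholder IDs (not real SEER records).
train_ids, validation_ids = split_train_validation(range(5700))
print(len(train_ids), len(validation_ids))
```

A strict 70% cut of 5,700 patients gives set sizes slightly different from those reported in the study, so the exact counts depend on how the randomization was carried out.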

Variable Collection
The variables included in the present study were age at diagnosis, race, gender, tumor site, histological type, tumor size, histological grade, AJCC T status, AJCC N status, CEA, metastasis sites (lung, brain, and bone), and therapy information (surgery, radiotherapy, and chemotherapy). The optimal cut-off values of age and tumor size were determined with X-tile software (17): the best cut-off values for age were 61 and 76 years, and the optimal cut-off values for tumor size were 4.6 and 6.1 cm. The primary outcome was OS, defined as the time interval between the day of diagnosis and death from any cause.
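As a small illustration of how the X-tile cut-offs above turn continuous variables into three categories, consider the following sketch. The cut-off values come from the text; the group labels and the handling of values that fall exactly on a cut-off are assumptions, since the text does not specify them.

```python
from bisect import bisect_left

AGE_CUTOFFS_YEARS = (61, 76)   # from the X-tile result reported above
SIZE_CUTOFFS_CM = (4.6, 6.1)   # from the X-tile result reported above


def categorize(value, cutoffs, labels=("low", "middle", "high")):
    """Map a continuous value to one of three ordered groups.

    Assumption: a value exactly equal to a cut-off is placed in the lower group.
    """
    return labels[bisect_left(cutoffs, value)]


age_group = categorize(70, AGE_CUTOFFS_YEARS)    # between 61 and 76 years
size_group = categorize(7.2, SIZE_CUTOFFS_CM)    # larger than 6.1 cm
print(age_group, size_group)
```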

Statistical Analysis
The statistical analysis in our study was performed in SPSS 25.0 or R software (Version 3.6.1). A two-sided p value < 0.05 was considered statistically significant. First, univariate Cox analysis was used to determine OS-related variables in the training set. Then, the variables with a p value < 0.05 in the univariate Cox analysis were included in the multivariate Cox analysis to identify the independent prognostic factors of CCLM patients. After that, a nomogram was established with the "rms" package in R based on those independent prognostic factors. The time-dependent receiver operating characteristic (ROC) curves at 1, 2, and 3 years were plotted, and the corresponding time-dependent area under the curve (AUC) values were used to evaluate the discrimination of the nomogram. The corresponding calibration curves were plotted to assess the calibration of the nomogram, and decision curve analysis (DCA) was performed to assess its clinical benefit. Furthermore, the optimal cut-off values of the risk score were determined with X-tile software, and all patients were stratified into low-, middle-, and high-risk groups. Kaplan-Meier survival curves were generated to show the difference in OS between the three groups. During validation of the nomogram, the total points of each patient in the two validation sets were calculated according to the nomogram developed in the training set; Cox regression was then performed in each validation cohort using the total points as a factor; and finally, the C-index, calibration curve, and DCA were derived from the regression analysis (18).
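The C-index mentioned in the validation step measures how often the patient with the higher risk score dies earlier. The sketch below computes Harrell's C-index from scratch in Python; it is an illustrative stand-in rather than the R workflow used in the study, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    and a strictly shorter follow-up time than patient j. The pair is
    concordant when patient i also has the higher risk score; tied scores
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival time in months, event indicator (1 = death), total points.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_points = [4, 3, 2, 1]
print(harrell_c_index(toy_times, toy_events, toy_points))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.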
Furthermore, to confirm that the effectiveness of the nomogram was better than a single factor, the ROC curves of all independent prognostic factors were generated. Subgroup analysis was performed in left-side CC (LCC), right-side CC (RCC), liver-only metastasis, multiple metastases, CEA-elevated, CEA-normal, grade I-II, and grade III-IV. The Kaplan-Meier survival curves for each subgroup were generated.

Clinicopathologic Characteristics
According to the inclusion and exclusion criteria, a total of 5,700 CCLM patients were included and divided into a training set (n=3,992) and an internal validation set (n=1,708). The Chi-square test showed no significant difference between the two sets (Table 1). The average age of these patients was 62.05 ± 13.18 (range: 21-108) years, and 54.4% of patients were male. CEA was elevated in most patients. Most CCLM patients had adenocarcinoma with deep infiltration (T3-T4) and grade II histology and had received surgery, a distribution similar to that of CC patients (19). Notably, the proportion of lymph node metastasis (N1-N2) was higher in CCLM patients (80.8%) than in CC patients (36.2%) (19).

Identification of Prognostic Factors of CCLM Patients in the Training Set
To identify OS-related variables, sixteen variables were included in the univariate Cox analysis. Age, tumor size, race, tumor site, histological type, grade, CEA, AJCC T status, AJCC N status, extrahepatic metastasis (lung, brain, and bone), and treatments (surgery and chemotherapy) were identified as OS-related variables (Table 2). Multivariate Cox analysis was then performed and indicated that older age, black race, larger tumor size, higher grade, histological type of mucinous adenocarcinoma or signet ring cell carcinoma, higher AJCC N status, RCC, lung metastasis, bone metastasis, no surgery, no chemotherapy, and elevated CEA were independently associated with poor OS in CCLM patients (Table 2).

Development and Validation of the Prognostic Nomogram
To predict the OS of CCLM patients, a nomogram was developed based on all independent OS-related factors from the training set (Figure 1). The time-dependent ROC curves showed that the AUC values at 1, 2, and 3 years were 0.792, 0.769, and 0.763, respectively, suggesting favorable discrimination of the nomogram (Supplementary Figure S1). The AUC values at 1, 2, and 3 years were 0.754, 0.747, and 0.751 in the internal validation set and 0.725, 0.738, and 0.700 in the external validation set, respectively. In addition, the calibration curves indicated that the nomogram was well calibrated.
Furthermore, DCA was performed and the results indicated that the nomogram can serve as an effective tool for clinical practice (Supplementary Figure S1).
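For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a given threshold probability, commonly defined as TP/n − (FP/n) × pt/(1 − pt). The sketch below computes it for a binary outcome at a fixed time horizon. It is a simplified illustration under the assumption that every patient's status at the horizon is known (censoring is ignored); the function names and toy data are hypothetical, and it does not reproduce the study's analysis.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    outcomes: 1 if the event occurred by the horizon, 0 otherwise.
    """
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # odds at the threshold probability
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy in which every patient is treated."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: four patients, threshold probability of 0.5.
toy_risks = [0.9, 0.8, 0.3, 0.1]
toy_outcomes = [1, 1, 0, 0]
print(net_benefit(toy_risks, toy_outcomes, 0.5))
print(net_benefit_treat_all(toy_outcomes, 0.5))
```

A model is considered useful at a threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).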

Risk Stratification for CCLM Patients
Using our established prognostic nomogram, CCLM patients can be divided into high-, middle-, and low-risk groups. As shown in Figure 2A, Kaplan-Meier survival analysis with the log-rank test showed different survival patterns among patients in the three risk groups. Patients in both validation sets were also divided into three risk groups using the cut-off values obtained from X-tile. Patients in the low-risk group had a better prognosis than patients in the high-risk group (P<0.0001) (Figures 2B, C). These results indicate that the nomogram can divide CCLM patients into three groups with different prognoses to provide a reference for treatment.
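The stratification and survival comparison above can be sketched as follows. The Kaplan-Meier estimator is the standard product-limit formula; the risk-score cut-offs passed to `assign_risk_group` are hypothetical placeholders, because the actual X-tile cut-offs for the total points are not given in the text.

```python
def assign_risk_group(total_points, low_cutoff, high_cutoff):
    """Assign a risk group from nomogram total points.

    Assumption: points equal to a cut-off fall in the lower-risk group.
    """
    if total_points <= low_cutoff:
        return "low"
    if total_points <= high_cutoff:
        return "middle"
    return "high"


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort: months of follow-up and event indicator (1 = death, 0 = censored).
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
print(assign_risk_group(150, low_cutoff=100, high_cutoff=200))
```

In practice one curve would be estimated per risk group and the groups compared with a log-rank test, as described in the text.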

Risk Stratification for Subgroup Analysis
Although the performance of the nomogram was confirmed in both the training and validation sets, its performance in subgroups remained unclear. Hence, to further verify the stability and performance of the nomogram from different dimensions, we divided patients into subgroups based on tumor site, CEA, the number of distant metastasis sites, and grade. As shown in Figures 3 and 4, in both the training and validation sets, risk stratification separated patients with different OS within the LCC, RCC, CEA-elevated, and CEA-normal subgroups, indicating that the nomogram was effective in distinguishing prognosis in different subgroups of CCLM patients. However, in the multiple metastases subgroup of the external validation set, the survival of patients in the three risk groups was not significantly different (p=0.24), which may be attributed to the relatively small sample size (n=15) (Figure 3L). For the grade subgroups, because there were few grade III-IV patients (n=6, all in the high-risk group), we analyzed only the survival of grade I-II patients (Figure 4K).

Comparison of Predictive Accuracy
As shown in Figure 5, the AUC values of every independent prognostic factor were higher than 0.5 in the training set and in both validation sets. Comparing the predictive power of the nomogram with that of each independent factor, we found that the AUC value of the nomogram was higher than that of every single factor at 1, 2, and 3 years, suggesting the effectiveness of the nomogram.
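To make the comparison above concrete, the sketch below computes an AUC at a fixed time horizon for any score, whether nomogram total points or a single factor. It is a deliberate simplification of the time-dependent ROC analysis used in the study: patients censored before the horizon are simply excluded rather than weighted, and all names and data are hypothetical.

```python
def status_at_horizon(times, events, horizon):
    """Map patient index -> 1 (died by horizon) or 0 (alive beyond horizon).

    Patients censored at or before the horizon have unknown status and are
    left out (a simplifying assumption).
    """
    labels = {}
    for i, (t, e) in enumerate(zip(times, events)):
        if t <= horizon and e == 1:
            labels[i] = 1
        elif t > horizon:
            labels[i] = 0
    return labels


def auc_at_horizon(scores, times, events, horizon):
    """Probability that a patient who died by the horizon has the higher score."""
    labels = status_at_horizon(times, events, horizon)
    cases = [scores[i] for i, y in labels.items() if y == 1]
    controls = [scores[i] for i, y in labels.items() if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data: follow-up in months, event indicator, and two competing scores.
toy_times = [6, 10, 14, 30, 40]
toy_events = [1, 0, 1, 1, 0]
nomogram_points = [9, 5, 4, 3, 1]
single_factor = [1, 0, 1, 1, 0]
print(auc_at_horizon(nomogram_points, toy_times, toy_events, 12))
print(auc_at_horizon(single_factor, toy_times, toy_events, 12))
```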

DISCUSSION
CC is a highly invasive cancer that is prone to distant metastases, and the most common distant metastatic pattern is liver metastasis. Thus, we included a range of clinicopathological variables to construct a clinical prognostic nomogram for the OS of CCLM patients, which achieved considerable discrimination ability and calibration accuracy when applied to the validation cohorts. According to the nomogram risk stratification model, patients in the training and validation sets could be effectively divided into three groups (high-, middle-, and low-risk groups) with significantly different OS. In addition, we included different treatments in the nomogram to give clinicians a more convenient tool for individual survival prediction.
Although some predictive models have been established in previous studies, we think our study improves upon the previous work. Compared with the study of Wu et al. (20), the improvements in ours are as follows. First, regarding the study population, CC and RC patients with liver-only metastasis were included in the study of Wu et al. Although the liver is the most common metastatic site of both CC and RC, their different molecular developmental mechanisms and metastatic patterns require different staging methods and treatments (21)(22)(23). Therefore, our study included only CC to provide a more accurate prediction of prognosis for CCLM. Second, the study of Wu et al. focused on CRC patients with liver-only metastasis, but multiple metastases occur in approximately 20% of CRC patients (24). The prognosis of these patients cannot be predicted with the nomogram established by Wu et al., whereas the nomogram we constructed can be used. More importantly, subgroup analyses of both liver-only patients and multiple metastases patients showed good performance of our nomogram, which further confirms the improvement of our model. Third, among treatment factors, only surgery was included in the study of Wu et al. Whether used as the primary treatment or as an adjuvant treatment, chemotherapy is considered beneficial for the survival of CCLM patients (25,26). Thus, chemotherapy was also included in our study and was identified as a protective factor. Finally, we conducted external validation of the established nomogram, which provides important and strong evidence. From the perspective of the patient's condition, older age, black race, lung metastasis, and bone metastasis are independent prognostic factors for CCLM patients.
Elderly patients often have functional impairment, malnutrition, and comorbidities, which prompt physicians to choose less aggressive treatment or shorten the course of treatment, thereby affecting the treatment outcome (27)(28)(29). A previous study reported that the prognosis of liver metastasis alone differed from that of multiple metastases in the elderly group, but not in the middle-aged group (30). That study also found that CCLM patients with extrahepatic metastasis, including lung metastasis and bone metastasis, had shorter survival times than patients with liver-only metastases (30). The results of our study suggested that lung and bone metastases are independently associated with the prognosis of CCLM patients, which is consistent with the conclusions of previous studies (31).
From the perspective of the tumor, tumor site, tumor size, histological type, N stage, histological grade, and CEA level were determined to be independent prognostic factors of CCLM. Previous studies reported that RCC had lower OS and disease-free survival than LCC (32,33), which may be because RCC is usually diagnosed at a more advanced stage (34). Another reason may be that microsatellite instability and mutations of KRAS and BRAF are more common in RCC patients (35). Lymph node metastasis is a common form of metastasis in CC, and high rates are also associated with a high risk of multiple metastatic sites and worse differentiation (36). These indirect effects worsen the prognosis of patients, which supports the relationship between prognosis and N stage. The conclusion that a higher N stage indicates a worse prognosis is consistent with our study (37). However, in the study of Wang et al., only the N1 stage was independently associated with the prognosis of stage IV CRC, whereas in our study, both N1 and N2 stages were prognostic factors of CCLM, which may be attributed to the difference between CC and RC and the difference in metastatic patterns. Many studies have also shown that CEA is closely related to the survival of advanced CRC patients with liver metastases (38). This conclusion coincides with the results of the present study.
From the perspective of treatments, the traditional treatment for patients with stage I-III CC is surgery combined with adjuvant chemotherapy. Partial or total colectomy is performed in 84% of patients with stage I and II CC and in 67% of patients with stage III CC (39). Adjuvant chemotherapy within 8 weeks after surgery significantly improves the prognosis of patients. In addition, a recent study found that adjuvant radiotherapy may benefit CC patients, implying that radiotherapy may also be a treatment option for CC patients. With the advancement of treatment, surgery has also become the standard treatment option for CCLM patients and can improve patients' outcomes. In clinical practice, CCLM patients who undergo partial colectomy or total/subtotal colectomy have better outcomes than those who do not undergo surgery. Additionally, chemotherapy with agents such as 5-fluorouracil/leucovorin (5-FU/LV), capecitabine, irinotecan, and oxaliplatin is an important treatment approach that significantly prolongs the survival time of CCLM patients (40). As neoadjuvant therapy, chemotherapy can also increase the likelihood of resectability and treat micro-metastases (41,42). Moreover, as postoperative adjuvant therapy, chemotherapy was reported to be related to the OS and DFS of CRC patients with liver metastasis (43). However, more than 80% of CCLM patients have unresectable disease, and the prognosis of these patients can also be improved with different chemotherapy regimens (6,44). Thus, in line with our results, surgery and chemotherapy can improve the outcomes of CCLM patients.
In the present study, the nomogram could be used to effectively predict the prognosis of CCLM patients. However, some limitations should be stated. Firstly, this is a retrospective study based on a publicly available database, which makes it susceptible to the inherent weaknesses of retrospective data collection. In addition, specific information on liver metastases associated with the prognosis of CCLM, such as large size, more than three liver metastases, and the presence of bi-lobar metastases, is lacking in the SEER database. Secondly, most patients in the external validation set were of other races (Asian) and had received chemotherapy, which may produce selection bias. Thirdly, the sample size of the external validation set was not very large, so other validation cohorts with larger sample sizes are needed for the predictive nomogram.

CONCLUSION
In summary, we found that older age, black race, larger tumor size, higher grade, histological type of mucinous adenocarcinoma or signet ring cell carcinoma, higher N stage, RCC, lung metastasis, bone metastasis, no surgery, no chemotherapy, and elevated CEA were independently associated with poor prognosis in CCLM patients. A nomogram incorporating the above 12 predictors could accurately predict the prognosis of CCLM patients.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/).

ETHICS STATEMENT
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
CL and TW designed the research; CH performed the research and analyzed results; CL and JH edited the manuscript; TW and ZL provided critical comments and revised the manuscript; GZ, JQ, YC, XQ, and YL collected and organized data; GZ added the data, sorted them out and analyzed them in the revised manuscript; KX wrote the revised manuscript; All authors contributed to the article and approved the submitted version.