The Evaluation of a SEER-Based Nomogram in Predicting the Survival of Patients Treated with Neoadjuvant Therapy Followed by Esophagectomy

Background: A novel nomogram based on the Surveillance, Epidemiology, and End Results (SEER) database has been developed to predict the survival of patients with esophageal carcinoma who received neoadjuvant therapy followed by surgery. We aimed to evaluate the accuracy and value of the nomogram with an external validation cohort. Methods: A total of 2,224 patients in the SEER database were divided into the training cohort (n = 1,556) and the internal validation cohort (n = 668), while 77 patients from our institute were enrolled in the external validation cohort. A Cox proportional hazards regression model was used to develop a nomogram based on the training cohort, while the C-index, calibration curves, receiver operating characteristic (ROC) curves, and Kaplan-Meier survival curves were applied in the internal and external validation cohorts. Results: Five independent risk factors were identified and integrated into the nomogram (C-index = 0.645, 95% CI 0.627–0.663). The nomogram exhibited good prognostic value in the internal validation cohort (C-index = 0.648, 95% CI 0.622–0.674). However, the C-index, calibration plot, ROC analysis, and Kaplan-Meier survival curves of the nomogram in the external validation cohort were not as good as those in the training and internal validation cohorts (C-index = 0.584, 95% CI 0.445–0.723). Further analysis demonstrated that resection margin involvement (R0, R1, or R2 resection), which was not available in the SEER cohort, was an independent risk factor for the patients. Conclusions: The nomogram based on the SEER database failed to accurately predict the prognosis of patients in the external validation cohort, which may be due to the absence of essential information from the SEER database.


INTRODUCTION
As the eighth most common type of malignant tumor and the sixth leading cause of cancer deaths (1), esophageal carcinoma caused an estimated 544,076 deaths and 604,100 new cases worldwide in 2020 alone (2). Despite continuous efforts to improve treatment efficacy, many patients are confronted with rapid progression and poor prognosis (3). Many patients present with locally advanced esophageal carcinoma when first diagnosed. Neoadjuvant chemoradiotherapy (nCRT) followed by esophagectomy is recommended as the standard treatment for locally advanced esophageal carcinoma (4), which means T2 to T4a, N0 to N+, and M0 disease, according to the eighth edition of the American Joint Committee on Cancer (AJCC) staging system (5).
An accurate and feasible prediction model helps physicians estimate disease progression and patient survival and provides better evidence for clinical practice. The nomogram is a widely used tool for cancer prognosis because it converts a statistical predictive model into a practical numerical estimate (6). Some nomograms have been developed to predict the prognosis of patients with esophageal carcinoma. In 2016, Shapiro and his colleagues developed a nomogram predicting overall survival (OS) exclusively in patients with esophageal carcinoma treated with nCRT and surgery, mainly based on data derived from the CROSS trial (7). This nomogram contains three factors: clinical nodal category (cN), pathologic tumor category (ypT), and the number of positive lymph nodes in the resection specimen (ypN). Another study validated the nomogram with 975 patients in three academic centers, demonstrating that the nomogram could predict OS and progression-free survival (PFS) after nCRT and surgery, with a C-statistic of 0.61 for OS (8).
The log odds of positive lymph nodes (LODDS), defined as the natural logarithm of the ratio of the number of metastatic lymph nodes to the number of non-metastatic lymph nodes, has been emerging as an essential prognostic factor in several cancers, including colon cancer (9), breast cancer (10), oral squamous cell carcinoma (11), and lung squamous cell carcinoma (12). For esophageal carcinoma, LODDS also exhibits better discrimination power in risk stratification than the N descriptor and the positive lymph node ratio (LNR) (13). Ye and his colleagues also developed a nomogram based on the Surveillance, Epidemiology, and End Results (SEER) database, which integrated age, gender, histological grade, T stage, and LODDS as risk factors, with a C-index of 0.647 (13). However, the nomogram had not previously been validated in an external cohort, especially in Chinese patients. In this study, we updated the nomogram by extending the range of diagnosis years to 2004-2016. We then validated it using an external cohort of 77 patients with esophageal carcinoma to evaluate the accuracy and value of this nomogram.
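As an illustration of the LODDS definition above, the following minimal sketch computes the natural logarithm of the ratio of positive to negative lymph nodes. The function name `lodds` and the 0.5 continuity correction added to both counts (a common convention that avoids division by zero and the logarithm of zero) are assumptions for illustration; the text does not state which correction, if any, was applied.

```python
import math


def lodds(positive_nodes: int, examined_nodes: int, correction: float = 0.5) -> float:
    """Log odds of positive lymph nodes.

    The 0.5 correction is an assumed convention, not taken from this study.
    """
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive_nodes must lie between 0 and examined_nodes")
    negative_nodes = examined_nodes - positive_nodes
    # Natural log of (positive + c) / (negative + c)
    return math.log((positive_nodes + correction) / (negative_nodes + correction))


# Example: no positive nodes among 20 examined gives a strongly negative value,
# while equal numbers of positive and negative nodes give exactly zero.
print(lodds(0, 20))
print(lodds(5, 10))
```

A more negative LODDS indicates a lower nodal burden relative to the number of nodes examined, which is why it can separate patients who share the same count of positive nodes but differ in the extent of lymphadenectomy.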

Patient Selection
The SEER cohort was selected from the SEER database (http://seer.cancer.gov/). Data from eighteen population-based cancer registries were selected in the SEER database, and the SEER*Stat program (v 8.3.9) was used to extract the information of patients with esophageal carcinoma. The extraction conditions were as follows: "the location of the disease: esophagus" and "diagnosis year: 2004-2016." We enrolled patients with esophageal carcinoma who received neoadjuvant therapy and esophagectomy between 2004 and 2016. Supplementary Table S1 shows the detailed selection process, Supplementary Table S2 shows the program selection codes, and Figure 1 shows the flowchart of the study design. The following variables were extracted: "Age recode with < 1-year-olds", "Race recode (White, Black, Other)", "Sex", "Year of diagnosis", "Derived AJCC T, 6th ed (2004-2015)", "Derived AJCC M, 6th ed (2004-2015)", "Primary Site - labeled", "Histologic Type ICD-O-3", "RX Summ-Surg Prim Site (1998+)", "CS tumor size (2004-2015)", "CS Tumor Size/Ext Eval (2004-2015)", "Grade (thru 2017)", "Survival months", "Vital status recode (study cut-off used)", "Regional nodes positive (1988+)", "Regional nodes examined (1988+)", "First malignant primary indicator". The exclusion criteria were as follows: (a) patients with metastatic disease; (b) patients whose pathological type was not squamous cell carcinoma or adenocarcinoma of the esophagus; (c) patients without esophagectomy performed; (d) patients in whom esophageal carcinoma was not the first primary tumor; (f) patients not receiving neoadjuvant therapy; (g) patients without information about the number of retrieved and positive lymph nodes; (h) patients with unknown race, tumor site, tumor size, grade, or T stage.
The external cohort was selected from patients with resectable locally advanced esophageal or gastroesophageal junction carcinoma (cT1-T2N+ or cT3-4aNany) who received neoadjuvant chemoradiotherapy followed by esophagectomy from 2015 to 2020 in Renji Hospital. Clinical and pathological data were retrieved retrospectively from the hospital database, while follow-up information was collected by telephone interview. The exclusion criteria were as follows: (1) patients without follow-up data or other essential clinical data; (2) patients who died of postoperative complications during hospitalization; (3) patients whose surgical resection was not completed or whose interval between nCRT and surgery exceeded four months.

Ethical Statement
The study protocol was approved by the Ethics Committee of Renji Hospital, Shanghai Jiao Tong University School of Medicine (Shanghai, China). Informed consent was obtained from each patient in the external validation cohort, and all personal information was anonymized in the dataset.

Nomogram Development
According to the previous report (14), univariate and multivariate Cox proportional hazards regression models were applied to calculate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) of the risk factors for overall survival in the training cohort. The independent risk factors identified in the multivariate Cox proportional hazards regression analysis were integrated into the nomogram model. The 1-year, 3-year, and 5-year OS probabilities could be estimated from the nomogram.

Nomogram Validation
The nomogram's discriminative ability and calibration were validated in the training, internal validation, and external validation cohorts. We used Harrell's C-statistic, or C-index, as the major indicator of discriminative ability. The C-index ranges from 0.5 to 1, corresponding to discriminative ability ranging from none (chance level) to perfect. We also used time-dependent receiver operating characteristic (ROC) curves and the corresponding areas under the curves (AUCs) at 1, 3, and 5 years to estimate the discriminative ability. Calibration plots were used to assess the calibration of the nomogram. A calibration plot presents the relationship between the predicted probabilities and the observed outcomes. The reference line is a straight line passing through the origin with a slope of 1. The more closely the prediction line follows this 45-degree diagonal, the better the model is calibrated (8).
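To make the C-index concrete, the following is a minimal pure-Python sketch of Harrell's concordance computation, not the implementation used in this study (which relied on R). The function name `harrell_c_index` and the toy data are hypothetical. As a simplifying assumption, pairs with tied survival times are skipped, and ties in predicted risk count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j. The pair is concordant
    when patient i also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tie in predicted risk
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival months, event indicator, nomogram risk score
months = [5, 10, 15, 20]
died = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(months, died, risk))  # 1.0, every comparable pair is ordered correctly
```

A model that assigns every patient the same risk scores 0.5 under this definition, which is the "no discrimination" end of the range described above.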

Statistical Analysis
R software (version 4.0.2) was used to construct the nomogram. A P value of less than 0.05 was considered statistically significant. Categorical variables were presented as proportions and compared using chi-square tests or Fisher's exact test. According to the previous report, the sum score of each patient in the three cohorts was calculated based on the Cox proportional hazards regression model. The "surv_cutpoint" function of the "survminer" R package was used to determine the cut-off point for risk stratification, which divided the patients into low-risk and high-risk groups. Kaplan-Meier survival curves and the log-rank test were used to compare the low-risk and high-risk groups (14).

RESULTS
Table 1 compares the demographical and clinicopathological characteristics of the training and internal validation cohorts. A total of 2,224 patients in the SEER database were enrolled in this study and divided into the training cohort and the internal validation cohort at a ratio of 7:3 by the bootstrapping method. There was no difference between the training and internal validation cohorts in age, race, sex, tumor site, T stage, N stage, histology, histology grade, LODDS, or tumor size (all P > 0.05). However, there were significant differences between the external validation and SEER cohorts in the demographical and clinicopathological characteristics, as shown in Table 2. A total of 77 patients were enrolled in the external validation cohort, and all patients received neoadjuvant radiation therapy. The preoperative radiation dose was 37.8-41.4 Gy in total, delivered in an average of 20 fractions. Patients in the external validation cohort were younger, with none aged 75 years or older (P = 0.0285), and all were Chinese. The tumor site, T stage, N stage, histology type, histology grade, and tumor size all differed significantly between the external validation cohort and the SEER cohort (all P < 0.001).
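The 7:3 division of the 2,224 SEER patients can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the text describes the division as a bootstrapping method without further detail, whereas the sketch uses a seeded random split without replacement. The function name `split_cohort`, the seed, and the integer patient identifiers are hypothetical.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle the identifiers and cut them into training and validation parts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # rounds down
    return ids[:n_train], ids[n_train:]


# 2,224 placeholder identifiers standing in for the SEER patients
train_ids, internal_ids = split_cohort(range(2224))
print(len(train_ids), len(internal_ids))  # 1556 668
```

Rounding the training size down reproduces the cohort sizes reported here (1,556 and 668), and every patient ends up in exactly one of the two cohorts.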

Univariate and Multivariate Analysis in the Training Cohort
We conducted univariate and multivariate Cox proportional hazards regression analyses to confirm the independent

Nomogram Development
Based on the multivariate model, we built the nomogram to predict the probability of 1-year, 3-year, and 5-year survival of the patients (Figure 2). Age, sex, T stage, grade, and LODDS were independent risk factors. Each variable corresponds to a number of points. The survival probability is read by drawing a line straight down from the total points axis to the 1-, 3-, and 5-year survival axes. For example, a 60-year-old male patient with T1 stage disease, grade III pathology, and LODDS lower than −2.8 would get a total of 64.1 points, corresponding to probabilities of surviving less than 1, 3, and 5 years of 9.4%, 32.5%, and 43.2%, respectively. The C-index of the nomogram in the training cohort was 0.645 (95%CI 0.627-0.663).
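The mapping from a patient's covariates to a survival probability can be sketched under the proportional hazards assumption, where S(t | x) = S0(t) ** exp(lp) and lp is the linear predictor (the sum of coefficient times covariate value). Nomogram points are a rescaling of the contributions to this linear predictor. The function names, the coefficients, and the baseline survival below are hypothetical placeholders chosen for illustration only; they are not the fitted values of the nomogram in this study.

```python
import math


def linear_predictor(coefficients, covariates):
    """Sum of coefficient * covariate value over all model terms."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def survival_probability(baseline_survival, lp):
    """Cox model survival: S(t | x) = S0(t) ** exp(lp)."""
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline_survival must be a probability")
    return baseline_survival ** math.exp(lp)


# Hypothetical placeholder values, NOT the fitted model from this study
coefficients = {"male": 0.2, "grade_iii": 0.3}
patient = {"male": 1, "grade_iii": 1}
baseline_3y = 0.60  # assumed 3-year survival of the reference patient

lp = linear_predictor(coefficients, patient)
print(round(survival_probability(baseline_3y, lp), 3))
```

A patient at the reference level of every covariate has lp = 0 and therefore the baseline survival; each additional risk factor raises lp and lowers the predicted survival, which is what the additive points axis of a nomogram encodes.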

Nomogram Validation
A calibration plot was used to assess the agreement between the nomogram predictions and the observed outcomes. Figure 3 shows the calibration plots of the nomogram predictions against the observed 1-, 3-, and 5-year OS in the training, internal validation, and external validation cohorts. The calibration plots demonstrated favorable consistency in the training and internal validation cohorts. However, when the nomogram was applied in the external validation cohort, the consistency was not as good as in the     Figure S1, which divided the patients into low-risk and high-risk groups. The low-risk and high-risk groups exhibited significantly different OS in the training and internal validation cohorts (both P < 0.001). However, the difference in OS between the low-risk and high-risk groups was not statistically significant in the external validation cohort (P = 0.3) (Figure 5).

Cox Regression Analysis in the External Cohort
To explore the potential reasons for the different performance of the nomogram in the internal and external validation cohorts, we conducted univariate and multivariate Cox proportional hazards regression analyses in the external validation cohort, with more clinical data included, such as postoperative complications and resection status (R0 resection or not), as shown in Supplementary Figure S3 and Table S3. Although the age, sex, T stage, grade, and LODDS were independent risk factors

DISCUSSION
Neoadjuvant therapy followed by surgery is currently the treatment recommended by major international societies for locally advanced esophageal carcinoma. It is reported that up to 32% of patients show a complete pathological response (ypCR) after neoadjuvant therapy (15). This study developed a novel nomogram that integrated several essential factors, including age, sex, T stage, histology grade, and LODDS, based on a cohort of 2,224 patients in the SEER database. This nomogram showed reasonable discrimination and calibration ability in the training and internal validation cohorts. However, when it was applied in an external validation cohort of 77 patients from a Chinese thoracic surgery center, the discrimination and calibration became relatively unfavorable. Many nomograms have been developed for patients with esophageal carcinoma (16), including adenosquamous esophageal carcinoma (17), early-onset esophageal carcinoma (18), and metastatic esophageal carcinoma (19). Semenkovich and his colleagues developed and validated a nomogram predicting the likelihood of occult lymph node metastases in surgically resectable esophageal carcinomas, which included histology, tumor stage, tumor size, grade, and presence of lymphovascular invasion (20). In contrast to Semenkovich's study, histology and tumor size were not significant risk factors in the Cox proportional hazards regression analysis of our study. Another study evaluated the prognosis of patients with stage I-III esophageal carcinoma with a nomogram that consisted of age, marital status, sex, T stage, N stage, grade, and surgery (21). The disparity in included factors could be attributed to differences in databases, target patients, and baseline characteristics. A previously published nomogram was established in a cohort of 626 patients who underwent nCRT plus surgery, with cN, ypT, and ypN categories included (7). The C-index of that nomogram was moderate at 0.63.
Goense and his colleagues used an international multi-institutional cohort of patients to validate this nomogram. They found that the discriminative ability of the nomogram for OS was moderate (C-statistic, 0.61) and comparable to that of the initial cohort (C-statistic, 0.63), and the nomogram was also useful for the prediction of PFS (C-statistic, 0.64) (8). This nomogram was very simple and easy to use. However, many critical factors were neglected, including demographical data and histology grade. Ye et al. compared the discriminatory power and value of the N descriptor, LNR, and LODDS in the survival prediction of patients with esophageal carcinoma receiving neoadjuvant therapy (13). They found that LODDS demonstrated higher discriminatory power and goodness of fit than the N descriptor and LNR.
Furthermore, they developed a novel nomogram based on a SEER cohort of 2,239 patients, including sex, age, grade, T stage, and LODDS. However, they did not validate it in an external cohort. In this study, we updated the SEER cohort by adding patients diagnosed in 2016 and rescreened the patients with stricter inclusion criteria. Finally, we enrolled 2,224 patients in the analysis and divided them into the training and internal validation cohorts. We obtained results similar to those of Ye's study and built a nomogram integrating age, sex, grade, T stage, and LODDS. Age and sex are commonly used in nomograms, as male patients and older patients have worse prognoses (17). Patients with poorly differentiated or undifferentiated histology grades face a higher risk of recurrence and metastasis and a worse prognosis (22). T stage and LODDS are correlated with the TNM staging system and affect the prognosis. We observed no significant difference between the T1 and T2 stages in the multivariate regression analysis, and Ye's study had similar results, which could be attributed to the sample size. Compared with the traditional N descriptor (TNM staging system), LODDS is a novel and promising ratio-based lymph node (LN) staging system reported in many malignant tumors. Yu and his colleagues showed that LODDS exhibited better predictive performance than the N descriptor, the number of positive lymph nodes (NPLN), and LNR among patients with node-positive lung squamous cell carcinoma after surgery (12). However, Baqar et al. reported that LODDS did not show advantages over LNR and recommended using LNR given its ease of calculation (9). In this study, we adopted LODDS as the lymph node indicator, which was significant in both the univariate and multivariate regression analyses.
The Cox regression model and nomogram showed favorable discrimination and calibration ability in the SEER cohort, with a C-index of 0.645 (95%CI 0.627-0.663) in the training cohort and 0.648 (95%CI 0.622-0.674) in the internal validation cohort, which was comparable with the previous study. Nevertheless, the C-index was only 0.584 (95%CI 0.445-0.723) in the external validation cohort. The ROC analysis showed similar results: the 3-year and 5-year AUCs of the nomogram in the training and internal validation cohorts were higher than those in the external validation cohort. We used the nomogram to stratify patients by risk in the different cohorts and compared their survival curves. In the external validation cohort, the survival difference between the high-risk and low-risk groups did not reach statistical significance. The disparity between the external validation and SEER cohorts could be attributed to the following reasons. First, the external validation cohort patients were all Chinese, whereas Asian patients were a minority in the SEER cohort and were categorized with other races as "Other" (American Indian/AK Native, Asian/Pacific Islander). Although race was not a significant variable in the risk factor analysis, the race difference might still affect the prognosis. Second, much important information is missing from the SEER database.
We added resection margin involvement (R0, R1, or R2 resection) to the multivariate regression analysis and found that it was an independent risk factor for the prognosis. When it was added into the model, the C-index of the nomogram reached 0.692 (95%CI 0.494-0.746), higher than the previous result of 0.584. Third, the neoadjuvant therapy and the surgery plan varied between the SEER cohort and the patients in our hospital, which might affect the final results. Different surgery types (Ivor Lewis and McKeown esophagectomy) could affect the short-term efficacy and prognosis (23,24). Last, the difference in histology type between the SEER cohort and the external validation cohort might be another reason for the inconsistency. Squamous cell carcinoma is the primary histology type in Asians, while the proportion of adenocarcinoma is higher in Western patients (25). Semenkovich et al. developed a nomogram for predicting node-positive disease in esophageal cancer and found that adenocarcinoma was a significant risk factor for node-positive disease compared with squamous cell carcinoma (20). In contrast, Du and his colleagues demonstrated that squamous cell carcinoma was a risk factor for cancer-specific survival of patients with esophageal carcinoma after resection. Interestingly, many studies, including ours, did not find histology to be significant for patient survival (8,26). The effect of histology on the survival of patients with esophageal carcinoma requires further investigation.
Several limitations of this study must be noted. First, the sample size of the external validation cohort was not as large as in other studies due to study limitations. As a result, the non-significant Kaplan-Meier result in the external validation cohort might be a false negative. Also, the Cox regression analysis for the external validation cohort did not converge, and many essential risk factors did not show statistical significance. A sample size of 20 times the number of factors in the nomogram is considered appropriate for validation. With five factors, a sample size of 100 patients would have been preferable, although 77 cases were enough to identify important risk factors. Even with the limited sample size, we showed that resection margin involvement (R0, R1, or R2 resection) was an independent risk factor, indicating that the nomogram was not accurate due to the lack of this variable. Second, the C-index of the nomogram was modest, which limits its applicability, and more studies are required to incorporate novel prognostic variables. A multi-institutional cohort might be more powerful than a single-center study for improving and validating the nomogram. Last, recent years have witnessed substantial progress in targeted therapy and immunotherapy, shifting the landscape of neoadjuvant therapies for locally advanced esophageal carcinoma. As a result, genetic mutation status and the administration of novel therapies can greatly affect the prognosis, but they are absent from the SEER database and the nomogram.

CONCLUSIONS
For patients treated with neoadjuvant therapy followed by surgery, the SEER cohort-based nomogram was not as discriminative and accurate in the external validation cohort as in the internal validation cohort. The nomogram failed to predict the prognosis in the external validation cohort of Chinese patients and should be applied with caution. Future studies should incorporate more prognostic variables to improve the nomogram's discriminative ability and application value.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and the Harmonized Tripartite Guideline for Good Clinical Practice from the International Conference on Harmonization. No approval by the institutional review board was sought, and no individual patient consent was required, because SEER is a public database and the data are deidentified. The authors confirm that they are accountable for all aspects of the work (if applied, including full data access, integrity of the data and the accuracy of the data analysis) in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.