Construction of a new nomogram prognostic model for predicting overall survival after radical resection of esophageal squamous cancer

Background Esophageal cancer is one of the deadliest malignancies in the world, with a 5-year overall survival (OS) of 12% to 20%. Surgical resection remains the principal treatment. The American Joint Committee on Cancer (AJCC) TNM (tumor, node, and metastasis) staging system is a key guideline for prognosis and treatment decisions, but it cannot fully predict outcomes. Identifying key prognostic biomarkers in each patient's tumor that can serve as survival predictors and therapeutic targets is therefore highly important to clinicians and patients.
Methods Three variable screening methods (univariate Cox regression, Lasso regression, and Randomforest regression) were used to screen the independent factors affecting the prognosis of esophageal squamous cell carcinoma and to construct a nomogram prognostic model. The accuracy of the model was assessed by comparison with the TNM staging system, and its reliability was assessed by internal cross validation.
Results Preoperative neutrophil lymphocyte ratio (preNLR), N-stage, p53 level and tumor diameter were selected to construct the new prognostic model. Patients with higher preNLR, higher N-stage, lower p53 level and larger tumor diameter had worse OS. The C-index, decision curve analysis (DCA), and integrated discrimination improvement (IDI) showed that the new prognostic model predicted OS better than the TNM staging system.
Conclusion The accuracy and reliability of the nomogram prognostic model were higher than those of the TNM staging system. It can effectively predict individual OS and provide a theoretical basis for clinical decision making.


Introduction
Esophageal cancer is one of the deadliest malignancies in the world and has a poor prognosis (1). Its major subtypes are esophageal squamous cell carcinoma and adenocarcinoma (2). Esophageal squamous cell carcinoma accounts for 70% of cases of esophageal cancer globally. Esophageal cancer is the fourth most common cancer in China (3) and approximately 70% of global esophageal cancer cases occur in China (4). Although the incidence of esophageal cancer in China has declined in recent years, the absolute number of patients is still high due to the large population.
At present, radical operation is the most effective strategy for the treatment of early esophageal squamous cell carcinoma and the best choice for long-term survival of esophageal squamous cell carcinoma patients. In patients with advanced tumors, a combination of preoperative and perioperative chemoradiotherapy is often required, but the results are still unsatisfactory. The 5-year survival rate for patients undergoing radical resection of esophageal cancer is only 13% to 18% (5).
At present, the AJCC TNM staging system is mainly used to evaluate the postoperative prognosis of esophageal cancer. However, due to individual differences, some patients with the same TNM stage may have different prognosis even after receiving the same treatment (6).
In addition to the TNM staging system, studies have shown that other patient characteristics may also affect the prognosis after radical resection of esophageal cancer (7)(8)(9)(10). Therefore, in this study, clinical data were collected and analyzed to establish a new prognostic model, based on patient characteristics, for patients undergoing radical resection of esophageal cancer. The results showed that the accuracy and reliability of the new nomogram prognostic model were better than those of the TNM model. The model is of value for predicting the overall survival of esophageal squamous cell carcinoma patients after radical operation.

General patient information
The clinical data of 256 patients with esophageal squamous cell carcinoma who underwent radical resection in the Department of Thoracic Surgery of Changhai Hospital Affiliated to Naval Medical University from November 2015 to October 2017 were retrospectively analyzed.
This study was approved by the Hospital Ethics Review Committee.

Pathological examination results
The pathological staging of esophageal squamous cell carcinoma was conducted according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system. Patients' T-stage, N-stage and P-stage were determined by experienced clinicians and pathologists based on pathological examination results. The expression of LEF1, Ki-67 and p53 was detected by immunohistochemistry. Tissues were embedded in paraffin and analyzed by the avidin-biotin complex method. Immunohistochemical results were scored by two experienced pathologists who were blinded to clinical and follow-up information.

Study endpoints and information collection
The endpoint of the study was overall survival (OS). OS was defined as the time from surgery to death or the last follow-up. All patients were followed up periodically by telephone, with the last follow-up date being July 2022.
Peripheral blood biochemical information was collected within one week before surgery. The preoperative neutrophil lymphocyte ratio (preNLR) was calculated as the absolute neutrophil count divided by the absolute lymphocyte count.
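As a minimal illustration of the definition above, the ratio and its later dichotomization could be computed as follows. The function names and the example counts are hypothetical and are not taken from the study data; the threshold of 2.01 is the cut-off reported in the Results.

```python
def pre_nlr(neutrophils: float, lymphocytes: float) -> float:
    """Neutrophil-lymphocyte ratio from absolute counts (same units, e.g. 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def nlr_group(nlr: float, cutoff: float = 2.01) -> str:
    """Label a patient relative to the cut-off (2.01 is the value reported in this study)."""
    return "high" if nlr > cutoff else "low"


# Hypothetical counts, for illustration only.
ratio = pre_nlr(4.2, 1.5)
print(round(ratio, 2), nlr_group(ratio))
```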

Statistical analysis
As there was no unified and validated cut-off value, the cut-off values of continuous variables such as preNLR, p53, Ki-67 and hospital stay were determined using the log-rank test based on Kaplan-Meier curves, choosing the value that gave the greatest survival difference between the groups on either side of the cut-off.
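The cut-off search described above can be sketched as a scan over candidate thresholds that keeps the one with the largest two-group log-rank statistic. This is an illustrative pure-Python sketch, not the R code used in the study; the function names, the `min_group` parameter and the toy data are assumptions.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death, 0 = censored; groups: 0 or 1.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one death occurred.
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if ti == t and e == 1)
        d1 = sum(1 for g, ti, e in at_risk if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events, min_group=2):
    """Return (cutoff, statistic) for the split 'value > cutoff' with the largest statistic."""
    best = (None, -1.0)
    for c in sorted(set(values))[:-1]:  # the maximum would leave the high group empty
        groups = [1 if v > c else 0 for v in values]
        if min(sum(groups), len(groups) - sum(groups)) < min_group:
            continue  # skip splits that leave a group too small (hypothetical rule)
        stat = logrank_chi2(times, events, groups)
        if stat > best[1]:
            best = (c, stat)
    return best


# Toy data, invented for illustration (marker value, months of follow-up, death indicator).
toy_values = [1.0, 1.2, 1.5, 1.8, 2.5, 3.0, 3.5, 4.0]
toy_times = [60, 55, 58, 50, 20, 15, 12, 10]
toy_events = [0, 0, 1, 0, 1, 1, 1, 1]
print(best_cutoff(toy_values, toy_times, toy_events))
```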
Three variable screening methods were used to select variables for the new prognostic model. (1) Univariate Cox regression: based on the results of univariate Cox regression, variables with significant differences (P < 0.05) were included in the multivariate Cox regression model. (2) Lasso regression: Lasso regression used the "glmnet" package to screen the best combination of variables. To select a model with good performance and the fewest independent variables, we set lambda (λ) = lambda.1se. (3) Randomforest regression: Randomforest regression filtered variables through the "randomForest" package, with parameters ntree=200, mtry=8 and sampsize=100. The max-subtree function was then used to screen out variables with high conservatism.
The selected variables were incorporated into the multivariate Cox regression model, and a nomogram was constructed using the "rms" package to visualize the results of the multivariate Cox regression model. The nomogram converts the regression coefficients of the Cox proportional hazards model into points on a 0-100 scale, which are summed to give a total score. The 1-year, 3-year, and 5-year OS rates were then read from the total score. ROC curves were used to compare the prognostic models, and the AUC was used to identify the best model. The nomogram prognostic model was verified internally using the "bootstrap" package to calculate the C-index of the model. Finally, DCA and IDI were used to evaluate the benefits of the nomogram model.
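The conversion of Cox coefficients to nomogram points can be sketched as follows, assuming the common convention that the variable with the largest effect over its observed range is assigned 100 points and the others are scaled proportionally. The coefficients and variable names below are hypothetical and are not the fitted values of this study.

```python
def nomogram_points(coefs, ranges):
    """Map Cox coefficients to a 0-100 point scale.

    coefs: {variable: log hazard ratio per unit}
    ranges: {variable: (min, max)} over which the variable is observed
    Returns a function that scores one patient (dict of variable values).
    """
    # The variable with the largest effect over its range defines 100 points.
    spans = {v: abs(coefs[v]) * (ranges[v][1] - ranges[v][0]) for v in coefs}
    scale = 100.0 / max(spans.values())

    def score(patient):
        points = {}
        for v, beta in coefs.items():
            lo, hi = ranges[v]
            # Zero points are anchored at the end of the range with the lowest risk.
            ref = lo if beta >= 0 else hi
            points[v] = scale * beta * (patient[v] - ref)
        return points, sum(points.values())

    return score


# Hypothetical coefficients, for illustration only.
coefs = {"preNLR_high": 0.6, "N_stage": 0.5, "p53_low": 0.7, "diameter_gt_3cm": 0.8}
ranges = {"preNLR_high": (0, 1), "N_stage": (0, 3), "p53_low": (0, 1), "diameter_gt_3cm": (0, 1)}
score = nomogram_points(coefs, ranges)
points, total = score({"preNLR_high": 1, "N_stage": 3, "p53_low": 0, "diameter_gt_3cm": 1})
print(points, round(total, 1))
```

In a real nomogram the total score is then mapped to 1-year, 3-year and 5-year survival probabilities through the baseline survival function of the fitted Cox model, which this sketch does not cover.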
The above analyses were implemented using R language (version 4.1.2).
The relationship between preNLR and other clinical characteristics in this study was analyzed by the Pearson χ² test. P < 0.05 was considered statistically significant. This analysis was implemented using SPSS Statistics (version 25).

Patients and tumor baseline characteristics
This study included 256 patients with esophageal squamous cell carcinoma undergoing radical resection. The baseline characteristics are shown in Table 1. Patients were randomly divided into a training cohort (n = 160) and a validation cohort (n = 96). The prognostic model was constructed in the training cohort, and its reliability and accuracy were verified in the validation cohort. According to the Kaplan-Meier curves in the training cohort, the cut-off value of preNLR was 2.01 (Figure 1A), the cut-off value of p53 was 20% (Figure 1B), the cut-off value of Ki-67 was 69% and the cut-off value of hospital stay was 15 days (Supplementary Figure 1).
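A random split of this kind could be sketched as below. The seed and the function name are assumptions chosen for reproducibility of the illustration; the study does not report how the random grouping was generated.

```python
import random


def split_cohort(patient_ids, n_train=160, seed=2022):
    """Shuffle patient identifiers and split them into training and validation sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # hypothetical seed, fixed only to make the split repeatable
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


train_ids, valid_ids = split_cohort(range(256))
print(len(train_ids), len(valid_ids))  # 160 96
```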

Univariate Cox regression
Univariate Cox regression analysis showed that tumor diameter, tumor location, T-stage, N-stage, preNLR level, p53 level and LEF1 level were correlated with the OS of patients undergoing radical operation (P < 0.05). The results are shown in Table 2. Variables with P < 0.05 were included in multivariate Cox regression.

Lasso regression
In Lasso regression, the penalty parameter lambda (λ) was introduced to find the best prognostic model, and λ determined which variables were retained in the model. An advantage of Lasso regression is that it addresses the problem of collinearity between variables. When λ = lambda.1se (Figure 2A), a model with good performance and the minimum number of independent variables was obtained. Therefore, we chose this value. When λ = lambda.1se, two variables (N-stage and P-stage) were included in the prognostic model (Figure 2B). Therefore, N-stage and P-stage were included in multivariate Cox regression.
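The lambda.1se rule used by "glmnet" selects the largest λ whose cross-validated error lies within one standard error of the minimum error. A minimal sketch of that rule is given below; the cross-validation numbers are invented for illustration and the function name is hypothetical.

```python
def lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error for each lambda
    cv_se:   standard error of that mean for each lambda
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Largest penalty (sparsest model) whose error is still within one SE of the minimum.
    eligible = [lam for lam, m in zip(lambdas, cv_mean) if m <= threshold]
    return lambdas[i_min], max(eligible)


# Invented cross-validation results, for illustration only.
lams = [0.20, 0.10, 0.05, 0.02, 0.01]
errs = [12.9, 12.4, 12.1, 12.0, 12.05]
ses = [0.3, 0.3, 0.3, 0.3, 0.3]
print(lambda_1se(lams, errs, ses))  # (0.02, 0.05)
```

Because a larger λ shrinks more coefficients to zero, lambda.1se trades a small loss in cross-validated fit for a sparser model, which matches the stated aim of keeping the fewest independent variables.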

Randomforest regression
Randomforest regression is commonly used to evaluate the importance of variables and has good predictive accuracy. In this study, bootstrap sampling with replacement was used to randomly draw 200 sample sets (sampsize = 100) from the original data set, and one decision tree was grown on each sample set, giving 200 decision trees. The importance of each variable was then evaluated by combining the decision results of the 200 trees. The variables were ranked as shown in Figure 3. The max-subtree function was then used to screen out the top-ranked variables with high conservatism, namely tumor diameter, N-stage, P-stage and T-stage. Therefore, tumor diameter, N-stage, P-stage and T-stage were included in multivariate Cox regression.
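The resampling step described above can be sketched as follows: each of the 200 trees receives its own sample of 100 patients drawn with replacement. Only the sampling is shown; growing the trees and ranking variable importance is left to the R package. The seed, the function name and the use of the 160-patient training cohort as the sampling pool are assumptions.

```python
import random


def bootstrap_samples(n_patients, n_trees=200, sample_size=100, seed=1):
    """Draw one bootstrap sample of patient indices (with replacement) per tree."""
    rng = random.Random(seed)  # hypothetical seed
    return [
        [rng.randrange(n_patients) for _ in range(sample_size)]
        for _ in range(n_trees)
    ]


samples = bootstrap_samples(n_patients=160)
first = samples[0]
# Sampling with replacement means some patients can appear more than once in a sample.
print(len(samples), len(first), len(set(first)))
```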

Construction of nomogram prognostic models
Seven variables with P < 0.05 were screened by univariate Cox regression, two by Lasso regression, and four by Randomforest regression. The variable combinations screened by the three methods were each incorporated into a multivariate Cox regression model (backward selection). The final model for each method was the one with the minimum Akaike information criterion (AIC) value.
Among the variables screened by univariate Cox regression, preNLR, N-stage, p53 and tumor diameter were retained in the multivariate Cox regression model (AIC=469.81).
Among the variables screened by Lasso regression, N-stage and P-stage were retained (AIC=476.18).
Among the variables screened by Randomforest regression, tumor diameter, N-stage and P-stage were retained (AIC=470.9172).
Among them, the model screened by univariate Cox regression had the lowest AIC value.
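For reference, the AIC is defined as 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized (partial) likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below restates the definition and applies the minimum-AIC rule to the three values reported above. The dictionary keys and the log-likelihood in the first example are illustrative assumptions.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# Generic example with a hypothetical log-likelihood.
print(aic(-100.0, 3))  # 206.0

# AIC values reported in this study for the three candidate models.
reported_aic = {
    "univariate Cox screening": 469.81,
    "Lasso screening": 476.18,
    "Randomforest screening": 470.9172,
}
best_model = min(reported_aic, key=reported_aic.get)
print(best_model)  # univariate Cox screening
```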
To compare these three models, ROC curves and AUC values were used to evaluate the prognostic models (Figure 4). The AUC of the prognostic model screened by univariate Cox regression was the highest when predicting the 1-year and 5-year OS probability, and there was little difference between the prognostic models screened by univariate Cox regression and Randomforest regression when predicting the 3-year OS probability. Therefore, the prognostic model screened by univariate Cox regression was finally adopted, with preNLR, N-stage, p53 and tumor diameter as its variables.
To better display the results of multivariate Cox regression, this study introduced the nomogram. Nomograms are widely used for cancer prognosis and have the advantage of converting the contribution of each variable in a prognostic model into an estimate of event probability. They can provide a reference for clinical decision making and for screening high-risk patients. Based on the independent prognostic factors screened out above, a nomogram prognostic model of 1-year, 3-year, and 5-year postoperative OS rates for esophageal squamous cell carcinoma patients was constructed using R language (version 4.1.2). In our nomogram prognostic model, tumor diameter >3cm, preNLR>2.01, p53<=20% and higher N-stage were independent prognostic factors for esophageal squamous cell carcinoma patients (Figure 5). The model is of value in predicting postoperative overall survival of patients undergoing radical resection of esophageal squamous cell carcinoma.

Validation of nomogram prognostic models
To compare the new prognostic model with the traditional TNM prognostic model, the C-index was generated through internal cross validation. The C-index of the new prognostic model was 0.785 (95% CI 0.662-0.908), and that of the TNM prognostic model was 0.765 (95% CI 0.636-0.894), indicating that the predictive accuracy of the new prognostic model was better than that of the TNM model. The calibration curves of 1-year, 3-year and 5-year overall survival predicted by the new prognostic model (Figures 6A-C) and the TNM prognostic model (Figures 6D-F) are shown in Figure 6. The calibration curves also showed that the prediction accuracy of the new prognostic model was slightly better than that of the TNM prognostic model.
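The C-index measures how often, among comparable pairs of patients, the patient who died earlier was assigned the higher predicted risk. A minimal sketch of Harrell's C-index is shown below, assuming that a pair is comparable when the patient with the shorter follow-up time had an observed death; pairs with tied times are ignored and the example data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 = death, 0 = censored;
    risk_scores: higher score means higher predicted risk of death.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: four patients, one censored.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.2]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.785 and 0.765 should be read.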
To evaluate the reliability of the nomogram prognostic model for postoperative 1-year, 3-year, and 5-year overall survival, we used Decision Curve Analysis (DCA) at different decision thresholds (Figures 7A-C). The decision curve of the new prognostic model was higher than that of the TNM prognostic model. We also used integrated discrimination improvement (IDI) to evaluate the reliability of the nomogram prognostic model. Compared with the TNM prognostic model, the prediction probability of the new nomogram prognostic model improved by 6.0% (P = 0.040) at the first year, 4.2% (P = 0.079) at the third year, and 5.4% (P = 0.048) at the fifth year (Figures 8A-C). Finally, we verified the new nomogram prognostic model in the validation cohort. The C-index of the new prognostic model in the validation cohort was 0.773 (95% CI 0.675-0.871), and the IDI showed that, compared with the TNM prognostic model, the prediction probability of the new nomogram prognostic model improved by 3.7% at the fifth year. Therefore, the new prognostic model showed a better prediction of postoperative death for esophageal squamous cell carcinoma.
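The two quantities used above can be sketched for a binary outcome at a fixed time point (for example, death within 5 years). This is a simplification: it assumes every patient's status at that time is known, whereas an analysis of survival data has to account for censoring. The function names and example values are hypothetical.

```python
def net_benefit(pred_risk, died, threshold):
    """Net benefit of treating patients whose predicted risk is at least the threshold."""
    n = len(died)
    tp = sum(1 for p, d in zip(pred_risk, died) if p >= threshold and d == 1)
    fp = sum(1 for p, d in zip(pred_risk, died) if p >= threshold and d == 0)
    # False positives are weighted by the odds of the decision threshold.
    return tp / n - fp / n * threshold / (1 - threshold)


def idi(old_risk, new_risk, died):
    """Integrated discrimination improvement of the new model over the old one."""
    def discrimination_slope(risk):
        in_events = [r for r, d in zip(risk, died) if d == 1]
        in_nonevents = [r for r, d in zip(risk, died) if d == 0]
        return sum(in_events) / len(in_events) - sum(in_nonevents) / len(in_nonevents)

    return discrimination_slope(new_risk) - discrimination_slope(old_risk)


# Invented predictions for five patients (1 = died within the time horizon).
died = [1, 1, 0, 0, 0]
old_model = [0.6, 0.4, 0.5, 0.3, 0.2]
new_model = [0.7, 0.5, 0.4, 0.3, 0.1]
print(round(net_benefit(new_model, died, 0.45), 3), round(net_benefit(old_model, died, 0.45), 3))
print(round(idi(old_model, new_model, died), 3))
```

A decision curve plots the net benefit over a range of thresholds, so a model whose curve lies higher offers more benefit at those thresholds; a positive IDI means the new model separates patients who died from those who survived more clearly than the old one.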

Relationship between preNLR and other baseline characteristics
Finally, this study compared the clinical characteristics of esophageal cancer patients at different preNLR levels (Table 3). The results showed that, compared with the preNLR<=2.01 group, the preNLR>2.01 group had larger tumor diameter and higher T-stage. There were no significant differences in other characteristics (P>0.05).
Figure caption: Randomforest regression. Variable importance rank by Randomforest regression.

Discussion
Although significant progress has been made in the treatment of esophageal squamous cell carcinoma, including radical surgery, chemotherapy and radiotherapy, the 5-year overall survival rate is still very low (11)(12)(13). The TNM staging system commonly used in clinical practice can only partially predict the prognosis of patients. It has been reported that patients with the same TNM stage may have different prognoses after receiving the same treatment. Therefore, it is of great significance to construct an individualized and accurate prognostic model to guide clinical judgment of prognosis and adjuvant treatment.
In this study, the clinical characteristics of patients were comprehensively evaluated by univariate Cox regression, Lasso regression and Randomforest regression. Three different multivariate Cox regression models were constructed to effectively avoid the deviation caused by single screening method. The results showed that the AUC value of the prognostic model constructed by univariate Cox regression was the largest. Therefore, it was selected as our prognostic model. In the newly constructed prognostic model, preNLR, N-stage, p53 and tumor diameter were independent factors affecting the prognosis of esophageal squamous cell carcinoma patients after radical operation.
NLR is one of the main indicators of systemic inflammation. Elevated NLR has been reported to be a valuable prognostic predictor in many cancers, including pancreatic cancer, gastric cancer and breast cancer (14)(15)(16). The relationship between NLR and prognosis has also been reported in esophageal cancer (17,18). However, there is no unified and validated cut-off value for NLR in esophageal cancer. In this study, the cut-off value of NLR was determined to be 2.01 using the log-rank test based on Kaplan-Meier curves. Higher preNLR levels were associated with poorer prognosis (P = 0.038). Patients were divided into a preNLR>2.01 group and a preNLR<=2.01 group, and the clinical characteristics of the two groups were then analyzed. The results showed that preNLR level was positively correlated with tumor diameter (P<0.001) and T-stage (P = 0.034): the preNLR>2.01 group had larger tumor diameter and higher T-stage. There were no significant differences in other clinical characteristics between the two groups.
Much evidence has proved that N-stage significantly affects the prognosis of esophageal cancer (19,20). Survival rate of patients with low expression of p53 is significantly lower than that of patients with high expression of p53, and the recurrence rate of tumor is significantly higher (21,22). It has also been reported that the maximum diameter of tumor is an independent factor affecting OS of esophageal cancer (23,24). The above results are consistent with the results of our study. In this study, N-stage, p53 level and tumor diameter were all independent factors affecting tumor prognosis.
At present, there are few prognostic models for long-term survival of esophageal squamous cell carcinoma patients, and most studies only include a single factor, such as serum inflammatory markers (25), immune genes (26) and nutritional risk index (27). The new nomogram prognostic model designed in this study covered the TNM staging system, inflammatory markers and immunohistochemical information of the tumor, and showed better reliability and accuracy compared with those single-factor prognostic models.
Figure caption: Nomogram. Nomogram prognostic model based on the best multivariate Cox regression model.
To verify the accuracy and reliability of the new model, we generated the C-index through internal cross validation. The results showed that the C-index of the new model (0.785) was higher than that of TNM staging system (0.765), indicating that the predictive accuracy of the new model was higher than that of TNM staging system. At the same time, the DCA and IDI results also showed that the reliability of the prediction results of the new model was higher than that of the TNM staging system, which showed a good prediction of the disease. In the validation cohort, the results also showed that our new prognostic model had sufficient reliability and accuracy. In conclusion, the new prognostic model constructed in this study has good predictive ability and important guiding significance for the prognosis and treatment of esophageal squamous cell carcinoma patients.
Neoadjuvant chemoradiotherapy is currently used in patients with locally advanced esophageal cancer. This study only involved patients with early- and middle-stage esophageal cancer who could undergo surgery; therefore, neoadjuvant chemoradiotherapy was not included as a factor. In addition, because this was a retrospective study with a small number of patients, selection bias could not be avoided. Moreover, because this was a single-center retrospective study, we only performed internal validation of the prognostic model. In the future, we will conduct external validation of the model or verify it through prospective clinical trials. As technology and treatment advance, we will gradually improve our prognostic model.

Conclusion
After screening with three algorithms, preNLR level, N-stage, p53 level and tumor diameter were identified as independent factors affecting the prognosis of esophageal squamous cell carcinoma patients. In this study, the new nomogram prognostic model was more accurate and reliable than the TNM staging system. It could effectively predict overall survival and provide a theoretical basis for clinical decision-making and treatment.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by Shanghai Changhai Hospital Ethics Review Committee. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
BS, HC, and CL performed the operation and provided guidance for the idea of the article. MQ and WX analyzed the data and constructed the model. LX and YC collected the data, analyzed the data, and wrote the article. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by the National Natural Science Foundation of Shanghai (20ZR1456200) and Changhai Hospital Youth Cultivation Fund (2021JCQN16).

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.