A Population-Based Study: How to Identify High-Risk T1–2 Esophageal Cancer Patients?

Background: Because the status of lymph node metastasis (LNM) and distant metastasis (DM) varies among patients with T1–2 esophageal cancer (ESCA), their diagnostic work-up and subsequent treatment strategies differ. A model that identifies risk factors for LNM, DM, and overall survival (OS) in high-risk T1–2 ESCA patients would therefore be of considerable value in clinical practice.
Methods: The clinical data of 1,747 T1–2 ESCA patients selected from the Surveillance, Epidemiology, and End Results (SEER) database were retrospectively analyzed. Univariate and multivariate logistic regression models were used to identify risk factors for LNM and DM, and Cox regression analysis was used to identify prognostic factors for OS. The factors identified for LNM, DM, and OS were then used to construct three nomograms. Calibration curves were used to assess the accuracy of the nomograms, and the clinical impact curve (CIC) and decision curve analysis (DCA) were used to assess their predictive value and clinical utility, respectively.
Results: Age, race, tumor grade, tumor size, and T-stage were significant predictors of LNM in T1–2 ESCA patients (p < 0.05). Age, T-stage, tumor grade, and tumor size were significant predictors of DM (p < 0.05). Age, race, sex, histology, primary tumor site, tumor size, N-stage, M-stage, and surgery were significant predictors of OS (p < 0.05). The C-indexes of the three nomograms built on these factors were 0.737, 0.764, and 0.740, respectively, indicating good discrimination.
Conclusions: The newly constructed nomograms can objectively and accurately predict LNM, DM, and OS in T1–2 ESCA patients and may support individualized decision-making before treatment.


INTRODUCTION
Esophageal cancer (ESCA) is a common malignant tumor of the digestive tract, with about 572,000 new cases and 508,000 deaths worldwide in 2018; globally, it ranks 7th in incidence and 6th in mortality among cancers (1). According to the NCCN Guidelines for Esophageal and Esophagogastric Junction Cancers (Version 3.2021), T1-2 ESCA is defined as a tumor that invades the lamina propria, muscularis mucosae, submucosa, or muscularis propria but not the adventitia (fibrous membrane) (2). For newly diagnosed esophageal space-occupying lesions, the pathological diagnosis is usually made by endoscopic biopsy (3). Most T1-2 ESCA patients have no lymph node metastasis (LNM) or distant metastasis (DM) at initial diagnosis, but some present with LNM and/or DM (4)(5)(6). Treatment strategies for ESCA are tailored to each patient's LNM and DM status. For T1aN0M0 patients, endoscopic resection such as endoscopic submucosal dissection (ESD) is sufficient (7), offering a short hospital stay, fewer complications, and high quality of life (8). However, early-stage ESCA is usually detected during endoscopy, at which point the T-stage can be judged immediately while LNM and DM cannot be clearly determined (9). If LNM appears after endoscopic resection, subsequent surgery is required (10). Esophagectomy is recommended for cT1b-T2N0M0 ESCA patients, and neoadjuvant concurrent chemoradiotherapy followed by esophagectomy is preferred for cT1b-T2N+M0 patients. Because the presence of DM substantially changes clinical decision-making, LNM and DM should be evaluated with the following examinations (2). Lymph node puncture can be performed when cervical LNM is suspected clinically or on ultrasound. Abdominal CT or MRI is used to evaluate abdominal metastasis. Suspected metastases adjacent to the trachea and bronchi can be assessed by endobronchial ultrasound. PET-CT can be used to detect DM (2).
In clinical practice, some gastroenterologists believe that T1-2 ESCA lesions have not broken through the muscle layer and therefore resect them endoscopically without delay. However, if endoscopic resection cannot completely remove the lesion and the patient is transferred to thoracic surgery, the absence of preoperative staging examinations forces empirical tumor resection and lymph node dissection, which can easily lead to adverse consequences. Early determination of LNM and DM status from clinical data would help clinicians make individualized treatment decisions, reduce medical costs, and improve outcomes. In addition, because the prognosis of T1-2 ESCA is largely influenced by LNM and DM, predicting LNM and DM also allows the prognosis to be judged earlier and more accurately.
A nomogram is an intuitive graphical tool for calculating the risk of a clinical event in an individual patient (11). Compared with the widely used TNM staging system, nomograms have shown better predictive ability for many malignant tumors (12). However, no accurate nomogram is available for predicting LNM, DM, and overall survival (OS) in T1-2 ESCA patients. In this study, we aimed to establish nomograms to predict LNM, DM, and OS in T1-2 ESCA patients by analyzing relevant clinical data from the SEER database.

Data Resources and Subjects
In this study, data on T1-2 ESCA patients were extracted from the SEER database, a publicly available database that provides authoritative cancer records covering about 35% of the US population (13). Because the data are publicly available and de-identified, ethical approval was not required; the database also offers a large volume of quality-controlled data. A total of 49,527 ESCA patients diagnosed from 1975 to 2018 were retrieved from the database. Exclusion criteria were as follows: (1) missing clinical data such as race, tumor grade, tumor site, or tumor size; (2) missing survival data such as vital status, survival time, or cause of death; (3) T0, T3-4, or unclear TNM stage (TX, NX, or MX); and (4) two or more primary tumors. Because patients with DM are considered to have advanced disease, in whom lymph node status does not determine treatment (14), patients with DM were excluded from the LNM analysis. The eligible T1-2 ESCA patients were therefore divided into group N (n = 1,290; T1-2M0 patients, used to identify risk factors for LNM) and group M (n = 1,747; all eligible T1-2 patients, used to identify risk factors for DM). The inclusion and exclusion criteria are shown in Figure 1.
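The exclusion steps and the split into groups N and M can be written as a small filter. The sketch below is illustrative only: the `Record` fields and the string codes (for example `"NX"` and `"MX"`) are hypothetical stand-ins rather than actual SEER variable names, and `None` is assumed to mark a missing value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical fields standing in for SEER variables; None marks a missing value.
    race: Optional[str]
    grade: Optional[str]
    site: Optional[str]
    size_mm: Optional[float]
    vital_status: Optional[str]
    survival_months: Optional[int]
    cause_of_death: Optional[str]
    t: str            # e.g. "T1a", "T2", "T3", "TX"
    n: str            # e.g. "N0", "N1", "NX"
    m: str            # "M0", "M1", or "MX"
    n_primaries: int  # number of primary tumors


def eligible(r: Record) -> bool:
    """Return True if a record passes exclusion criteria (1)-(4)."""
    clinical = (r.race, r.grade, r.site, r.size_mm)
    survival = (r.vital_status, r.survival_months, r.cause_of_death)
    if any(v is None for v in clinical + survival):
        return False                   # (1) and (2): missing clinical or survival data
    if not r.t.startswith(("T1", "T2")):
        return False                   # (3): T0, T3-4, or TX
    if r.n == "NX" or r.m == "MX":
        return False                   # (3): unclear N or M stage
    return r.n_primaries == 1          # (4): exclude multiple primary tumors


def split_groups(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Group M: all eligible patients (DM model); group N: the M0 subset (LNM model)."""
    group_m = [r for r in records if eligible(r)]
    group_n = [r for r in group_m if r.m == "M0"]
    return group_n, group_m
```

Deriving group N from group M, rather than filtering twice, guarantees that every LNM-model patient is also in the DM model.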

Variable Declaration
Fifteen clinicopathological variables were obtained from the SEER database: year of diagnosis, age, race, sex, tumor grade, histology, primary site, tumor size, T-stage, N-stage, M-stage, vital status, cause of death, surgery (primary site), and survival months. OS was defined as the time from the date of diagnosis to the date of death from any cause. Cancer-specific survival (CSS) was defined as the time from the date of diagnosis to the date of death from ESCA. For continuous variables, the optimal cutoff values for year of diagnosis, age, and tumor size were determined from Kaplan-Meier curves using the X-tile software (Yale University, New Haven, Connecticut, USA) (15). Specifically, the year of diagnosis was categorized into 2004-2009, 2010-2012, and 2013; age was categorized into ≤67, 68-81, and ≥82 years (Figure 3); and tumor size was categorized into 0-21, 22-47, and 48+ mm (Figure 4). Other variables were also categorized according to the SEER coding scheme and the needs of this study. The pathological subtype was categorized into adenocarcinoma, squamous cell carcinoma, and others according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) histology/behavior codes (malignant). According to the primary site coded in SEER, the tumor site was categorized into cervical esophagus, thoracic esophagus, abdominal esophagus, and overlapping lesion of esophagus. Because different AJCC editions were used for staging over the study period, we compared the 6th, 7th, and 8th editions of the AJCC staging system and adopted the 8th edition, with the following merges: T1a/T1b into T1, T4a/T4b (7th and 8th editions) into T4, and N1-3 into N+.
These modifications are not expected to affect the accuracy of the results.
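As a sketch, the categorizations and AJCC merges described above could be applied as follows. The cutoffs are those reported in the text; the function names and string labels are hypothetical, and tumor size is assumed to be recorded in whole millimetres.

```python
def age_group(age: int) -> str:
    """Age bands reported in the text: <=67, 68-81, >=82 years."""
    if age <= 67:
        return "<=67"
    return "68-81" if age <= 81 else ">=82"


def size_group(size_mm: int) -> str:
    """Tumor-size bands reported in the text: 0-21, 22-47, 48+ mm (whole mm assumed)."""
    if size_mm <= 21:
        return "0-21"
    return "22-47" if size_mm <= 47 else "48+"


def merge_t(t: str) -> str:
    """Collapse AJCC substages, e.g. T1a/T1b -> T1 and T4a/T4b -> T4."""
    t = t.upper()
    for stage in ("T1", "T2", "T3", "T4"):
        if t.startswith(stage):
            return stage
    return t


def merge_n(n: str) -> str:
    """Collapse N1-N3 (including any substage suffix) into N+; other codes pass through."""
    n = n.upper()
    return "N+" if n[:2] in {"N1", "N2", "N3"} else n
```

Keeping each recoding step in its own function makes it easy to check the band boundaries (67/68, 81/82, 21/22, 47/48) independently.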

Nomogram Construction
We used univariate and multivariate logistic regression models (16) to identify risk factors for LNM in group N and for DM in group M, respectively, and a Cox regression model to identify prognostic factors for OS in T1-2 ESCA. The effects of the factors on LNM and DM were quantified by odds ratios (ORs), and their effects on OS by hazard ratios (HRs). The subdistribution hazard ratio (SHR) was used to measure the impact of prognostic variables on CSS. OS curves were drawn by the Kaplan-Meier method, and cumulative incidence was plotted using the cumulative incidence function. Two nomograms were then created from the logistic regression results to predict the risk of LNM and DM in T1-2 ESCA patients, and a third nomogram was built from the Cox proportional hazards model to predict OS. The accuracy of the nomograms was validated with ROC and calibration curves. The C-index was used to quantify discrimination: a value of 1.0 indicates perfect discrimination, 0.5 indicates discrimination no better than chance, and values above 0.7 are generally taken to indicate good predictive accuracy. DCA, a tool for evaluating the clinical value of a nomogram (17), was used to estimate the net benefit, and the CIC was plotted to show the clinical value of the models more intuitively.
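For readers unfamiliar with the C-index, the sketch below computes Harrell's concordance index for right-censored survival data in plain Python. It is a simplified illustration, not the R routine used in the study: pairs with tied survival times are skipped, and tied risk scores count as half-concordant. For a binary outcome such as LNM or DM, the C-index reduces to the area under the ROC curve.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. The pair is concordant when
    subject i also has the higher predicted risk; tied risks count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: higher risk scores for earlier deaths give perfect concordance.
c = harrell_c_index([2, 4, 6, 8], [1, 1, 1, 1], [4, 3, 2, 1])
```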

Statistical Analysis
The optimal cutoff values for age and tumor size were determined from Kaplan-Meier survival curves using the X-tile software. Baseline characteristics were compared between the training and test groups with the chi-square test. The baseline characteristics of T1-2 ESCA patients were analyzed using SPSS 26.0, and p < 0.05 was considered statistically significant. All other analyses were performed with the corresponding functions in R (version 4.0.3).
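The baseline comparison relies on Pearson's chi-square test of independence. The sketch below computes the statistic, degrees of freedom, and p-value for an r x c contingency table using only the standard library; it omits Yates' continuity correction and is an illustration rather than the SPSS procedure used in the study. The counts in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Equals the regularized upper incomplete gamma Q(df/2, x/2), built up from
    Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)) by the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def pearson_chi_square(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2 x 2 table: rows are two patient groups, columns a binary characteristic.
stat, df, p = pearson_chi_square([[10, 20], [20, 10]])
```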

Clinical Features of T1-2 ESCA
After strict screening, 1,747 patients diagnosed with T1-2 ESCA between 2004 and 2015 were included in this study. They were divided into group N (T1-2N0-3M0, n = 1,290) and group M (T1-2N0-3M0-1, n = 1,747). The proportion of patients with LNM in group N was 33.41%, and the proportion with DM in group M was 26.16%. Clinical data of the included patients are listed in Tables 1 and 2.
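The group sizes also give the composition of the whole cohort. Group M minus group N leaves 457 patients with DM; the number of LNM cases in group N (431) is back-calculated from the reported 33.41%, so it is a derived value rather than a count stated in the text. Expressed as shares of all 1,747 patients:

```python
total = 1747                    # group M: all eligible T1-2 patients
m0 = 1290                       # group N: patients without DM (M0)
dm = total - m0                 # patients with DM
lnm_m0 = round(0.3341 * m0)     # back-calculated from the reported 33.41%

shares = {
    "no LNM, no DM": (m0 - lnm_m0) / total,
    "LNM, no DM": lnm_m0 / total,
    "DM": dm / total,
}
for label, share in shares.items():
    print(f"{label}: {share:.2%}")
# no LNM, no DM: 49.17%
# LNM, no DM: 24.67%
# DM: 26.16%
```

The three shares sum to 100%, so the 33.41% LNM rate applies to the M0 patients in group N, not to the whole cohort.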

Risk Factors and Nomogram of LNM
Univariate and multivariate logistic regression showed that LNM was closely related to age at diagnosis, race, tumor grade, tumor size, and T-stage, but not to sex, primary site, or histology (Figure 5). The exact points assigned to each factor in the nomogram are shown in Table 5. Ranked by weight, race was the most influential factor, followed by tumor grade, tumor size, age, and T-stage. The calibration curve showed good agreement between predicted and observed probabilities, and the C-index was 0.737 (Figure 6). DCA and CIC of the LNM nomogram in group N showed that the nomogram provided a net benefit for predicting LNM in T1-2 ESCA patients at threshold probabilities of 0-0.35 (Figures 7, 8).
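DCA compares the net benefit of acting on a model's predictions with treating all or no patients, and the CIC shows how many patients per 1,000 would be classified as high risk at a given threshold. The sketch below implements the standard net-benefit formula, NB = TP/n - (FP/n) x pt/(1 - pt), at a single threshold probability pt; the function names and toy data are hypothetical, and the curves would be obtained by evaluating these functions over a grid of thresholds.

```python
def net_benefit(y_true, p_pred, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy that treats every patient as high risk (treat-none is 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def clinical_impact(y_true, p_pred, threshold, population=1000):
    """CIC counts per `population` patients: (classified high risk, high risk with event)."""
    n = len(y_true)
    high = [y for y, p in zip(y_true, p_pred) if p >= threshold]
    return population * len(high) / n, population * sum(high) / n


# Toy data: observed outcomes (1 = metastasis) and predicted probabilities.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
nb_model = net_benefit(y, p, 0.3)
nb_all = treat_all_net_benefit(y, 0.3)
```

A model is useful at a threshold when its net benefit exceeds both the treat-all curve and zero, which is how threshold ranges such as 0-0.35 are read off a decision curve.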

Risk Factors and Nomogram of DM
Univariate and multivariate logistic regression showed that DM was closely related to age, T-stage, tumor grade, and tumor size, but not to sex, race, primary site, or histology (Table 4). Compared with patients aged ≤67 years, the risk of DM was lower in patients aged 68-81 years (OR = 0.72, 95% CI = 0.55-0.93, p = 0.013) and ≥82 years (OR = 0.41, 95% CI = 0.26-0.62, p < 0.001). Unlike LNM, DM was less frequent in T2 than in T1 patients (OR = 0.44, 95% CI = 0.33-0.57, p < 0.001). Compared with grade I tumors, DM was more likely in grade II (OR = 3.77, 95% CI = 1.79-9.25, p = 0.001), grade III (OR = 6.84, 95% CI = 3.27-16.75, p < 0.001), and grade IV ESCA (OR = 5.07, 95% CI = 1.75-15.63, p < 0.001). In addition, T1-2 ESCA patients with a tumor size of … A nomogram was established to display the risk factors for DM visually (Figure 9), and the exact points assigned to each factor are shown in Table 5. In terms of point weight, tumor size was the most influential factor for DM in T1-2 ESCA patients, followed by tumor grade, age, and T-stage. The calibration curve showed good agreement between predicted and observed probabilities, with a C-index of 0.764 (Figure 10). DCA and CIC of the DM nomogram (Figures 11, 12) showed that it provided a net benefit for predicting DM in T1-2 ESCA patients at threshold probabilities of 0-0.27.
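A nomogram turns each regression coefficient into points. The sketch below applies a common scaling convention: within each variable the lowest log-odds coefficient scores 0, and the variable with the widest coefficient range spans 0-100 points, with all other variables scaled by the same factor. It uses the DM odds ratios reported above; because the tumor-size ORs are not given in this passage, tumor size is left out, so the resulting points illustrate the method only and do not reproduce Figure 9 or Table 5.

```python
import math

# DM odds ratios reported in the text (reference levels have OR = 1).
# Tumor size is omitted because its ORs are not available here.
DM_ODDS_RATIOS = {
    "age": {"<=67": 1.0, "68-81": 0.72, ">=82": 0.41},
    "T-stage": {"T1": 1.0, "T2": 0.44},
    "grade": {"I": 1.0, "II": 3.77, "III": 6.84, "IV": 5.07},
}


def nomogram_points(odds_ratios, max_points=100.0):
    """Convert ORs to log-odds coefficients and rescale them to nomogram points."""
    coefs = {var: {lvl: math.log(ratio) for lvl, ratio in levels.items()}
             for var, levels in odds_ratios.items()}
    # The variable with the widest coefficient range receives max_points.
    widest = max(max(c.values()) - min(c.values()) for c in coefs.values())
    scale = max_points / widest
    points = {}
    for var, c in coefs.items():
        lowest = min(c.values())  # the lowest-risk level of each variable scores 0
        points[var] = {lvl: round((b - lowest) * scale, 1) for lvl, b in c.items()}
    return points


points = nomogram_points(DM_ODDS_RATIOS)
# A patient's total score is the sum of the points for their levels,
# here about 189.1 points for age <=67, T1, and grade III.
total = points["age"]["<=67"] + points["T-stage"]["T1"] + points["grade"]["III"]
```

Because the total score is a linear rescaling of the linear predictor, reading a probability off the "total points" axis is equivalent to applying the logistic function to the summed coefficients plus the model intercept.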

Risk Factors and Nomogram of OS
Prognostic factors for OS in T1-2 ESCA patients were identified with a multivariate Cox proportional hazards regression model, whose results are displayed as a forest plot in Supplementary Figure 5. Nine prognostic factors were identified: age, race, sex, histology, primary site, tumor size, N-stage, M-stage, and surgery; tumor grade and T-stage were not associated with OS. Patients aged 68-81 years (HR = 1.29, 95% CI = 1.15-1.46, p < 0.001) or ≥82 years (HR = 1.72, 95% CI = 1.44-2.05, p < 0.001) had a worse prognosis than those aged ≤67 years. Regarding race, black patients had a worse prognosis than white patients (HR = 1.38, 95% CI = …) … (Supplementary Figure 6). By adding up the points for each factor, the probability of 3-, 5-, and 10-year OS in T1-2 ESCA patients can be calculated. The C-index was 0.740, and the calibration curves showed that the predicted results were consistent with the observed outcomes (Supplementary Figures 7, 8, 9).
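Under the Cox model, the total score corresponds to a linear predictor, and the predicted survival at time t is S(t | x) = S0(t) ** exp(linear predictor), where S0(t) is the baseline survival of the reference profile. The sketch below uses the age and race HRs reported above; the baseline 5-year survival of 0.60 is purely hypothetical, and the other seven factors of the OS model are omitted, so the output illustrates the calculation only.

```python
import math

# Hazard ratios reported in the text (reference: age <=67, white race).
HR_AGE = {"<=67": 1.0, "68-81": 1.29, ">=82": 1.72}
HR_RACE = {"white": 1.0, "black": 1.38}


def predicted_survival(baseline_survival, hazard_ratios):
    """Cox-model survival S(t | x) = S0(t) ** exp(sum of log hazard ratios)."""
    linear_predictor = sum(math.log(hr) for hr in hazard_ratios)
    return baseline_survival ** math.exp(linear_predictor)


S0_5YEAR = 0.60  # hypothetical 5-year survival of the reference profile
s = predicted_survival(S0_5YEAR, [HR_AGE[">=82"], HR_RACE["black"]])
```

Each HR above 1 raises the exponent and therefore lowers the predicted survival, which is why the nomogram assigns more points to older age and to black race.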

DISCUSSION
T1-2 ESCA is defined by invasion of the lamina propria, muscularis mucosae, submucosa, or muscularis propria, but not the adventitia (2). In the present study, about 49% of newly diagnosed T1-2 ESCA patients had neither LNM nor DM, about 25% had LNM without DM (33.41% of the M0 patients in group N), and about 26% had DM. Because LNM and DM status differs between patients, the treatment strategies and prognoses of T1-2 ESCA patients are individualized. At present, pathological biopsy remains the gold standard for diagnosing LNM and DM in ESCA patients. Although non-invasive examinations such as PET-CT can be used to assess LNM and DM, their application is limited by high cost and by false-negative and false-positive results (18). Therefore, a non-invasive and effective method to evaluate LNM and DM in ESCA patients is urgently needed, so that further examinations and treatment strategies can be chosen more rationally according to the model's predictions.
FIGURE 9 | DM nomogram with four factors. After assigning a value to each factor (the "points" scale at the top), the points are summed, and the corresponding DM rate is read from the total score (the "total points" scale at the bottom).
In recent years, a growing number of studies have focused on prediction models for human diseases, although these models have deficiencies and limitations. Previous studies established Cox regression models based on logistic regression analysis, but these models had low predictive ability and could not be applied in clinical practice (19,20). As a graphical form of prediction model, a nomogram directly visualizes the predicted risk of LNM and DM, which provides a reference for further examinations and clinical decision-making. Many nomograms are currently available for predicting the diagnosis and prognosis of cancers, but they have problems such as small sample sizes (21), low C-indexes and limited prediction accuracy (22), insufficient inclusion and exclusion criteria (23), lack of cutoff values (24)(25)(26), and lack of up-to-date evidence (27). To our knowledge, this is the first study to establish nomograms predicting LNM, DM, and OS in T1-2 ESCA patients using recent SEER data (1975 to 2018). The included subjects were divided into group N (T1-2N0-3M0 ESCA patients, for predicting LNM) and group M (T1-2N0-3M0-1 ESCA patients, for predicting DM). Three nomograms were established and validated to predict LNM, DM, and OS in T1-2 ESCA patients. The LNM nomogram included five factors: age, race, grade, tumor size, and T-stage. The DM nomogram included four factors: age, T-stage, grade, and tumor size. The OS nomogram included nine factors: age, race, sex, histology, primary site, tumor size, N-stage, M-stage, and surgery. The C-indexes of the LNM, DM, and OS nomograms were 0.737, 0.764, and 0.740, respectively, indicating good discrimination.
Previous studies have shown that age, depth of tumor invasion, tumor size, and grade are related to the risk of LNM in ESCA patients (4). Our findings likewise showed that older T1-2 ESCA patients had a lower risk of LNM. One possible explanation is that tumors in younger patients are more often poorly differentiated and may more readily escape immune surveillance; however, this speculation lacks conclusive data and needs further exploration. T2 ESCA patients had a higher risk of LNM than T1 patients, and patients with larger tumors had a higher risk of LNM than those with smaller lesions. For grade, the ORs for LNM of moderately differentiated, poorly differentiated, and undifferentiated cancer were 2.79, 4.06, and 3.25, respectively. These results are broadly consistent with conventional understanding: better differentiation generally indicates lower malignancy and a lower likelihood of metastasis. However, the OR for undifferentiated tumors was lower than that for poorly differentiated tumors, which may be explained by the small number of undifferentiated cases (41) and by the early stage (T1-2) of the disease. Similarly, older T1-2 ESCA patients had a lower risk of DM than younger patients. A previous study showed that age is an independent predictor of metastatic sites in cancer patients and that younger patients are more prone to metastasis (28).
Large tumor size was a risk factor for DM in T1-2 ESCA patients, whereas T2 tumors were associated with a lower risk of DM than T1 tumors. For grade, the ORs for DM of moderately differentiated, poorly differentiated, and undifferentiated cancer were 3.77, 6.84, and 5.07, respectively, a pattern similar to that in the LNM nomogram. Surprisingly, LNM and DM were not correlated with primary site, histology, or sex, which is inconsistent with previous findings (29,30). The OS nomogram included nine factors (age, race, sex, histology, primary site, tumor size, N-stage, M-stage, and surgery), whereas T-stage (T1/T2) and grade were not associated with OS.
In addition, LNM and DM in T1-2 ESCA were associated with both cancer-specific and non-cancer-specific death. Because the analysis was based on 1,747 eligible patients with a mean follow-up of 70 months recorded in a public database, the data provide a reasonably robust basis for these results.
This study had some limitations. First, it was a population-based retrospective analysis without prospective data for validation. Second, the database lacks information on high-risk lifestyle factors (e.g., heavy alcohol consumption and consumption of high-temperature or pickled food), tumor markers, imaging examinations, important molecular factors (e.g., PD-1/PD-L1 status), and metastatic sites. These are considered important predictors of LNM, DM, and prognosis in T1-2 ESCA and should be explored further. Third, although sarcomas and gastrointestinal stromal tumors (GISTs) are also coded as malignant tumors in ICD-O-3, they differ from the malignant epithelial tumors analyzed here, so the prediction models established in this study are not applicable to them. Finally, our data came only from the US population, and the sample size was relatively small. Multicenter data with larger samples and more diverse racial populations should be analyzed in the future to validate our conclusions.
In summary, three nomograms for predicting LNM, DM, and OS were established based on independent risk factors identified in T1-2 ESCA patients from the SEER database. Because the factors in these nomograms are easily obtained from clinical records, the nomograms are convenient to apply in clinical practice. Combined with other clinical data, they are expected to help physicians with diagnosis, individualized treatment, and follow-up management of T1-2 ESCA patients.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
YQ, SW, JL, and ZF designed the study. LT, GX, YW, and JC extracted and analyzed the data. YQ and CL wrote and edited the manuscript. The authors were ranked according to their contributions. YQ and SW contributed equally to this work and should be considered co-first authors.

SUPPLEMENTARY FIGURES 6-9 | The OS nomogram (6) and calibration curves (7, 8, 9). Supplementary Figure 6 includes 9 factors. After assigning a value to each factor (the "points" scale at the top), the points are summed, and the corresponding survival prediction is read from the total score (the "total points" scale at the bottom). Supplementary Figures 7, 8, and 9 show the calibration curves for predicting 1-, 3-, and 5-year OS, respectively; the C-index is 0.740. The diagonal indicates perfect agreement between actual and predicted OS probabilities; the closer the solid line lies to the diagonal, the more closely the probabilities predicted by the nomogram match the observed values.