Analysis of risk factors for lymph node metastasis and prognosis study in patients with early gastric cancer: A SEER data-based study

Background Lymph node status is an important determinant of prognosis in patients with early gastric cancer (EGC), and preoperative diagnosis of lymph node metastasis (LNM) has limitations. This study explored the risk factors for LNM and the independent prognostic factors in EGC patients, and constructed a clinical prediction model for LNM. Methods Clinicopathological data of EGC patients were collected from the public Surveillance, Epidemiology, and End Results (SEER) database. Univariate and multivariate logistic regression were used to identify risk factors for LNM in EGC patients. A nomogram was developed from the results of the multivariate regression, and its performance was evaluated by C-index, calibration curve, receiver operating characteristic (ROC) curve, decision curve analysis (DCA) curve, and clinical impact curve (CIC). An independent data set was obtained from China for external validation. The Kaplan-Meier method and Cox regression model were used to identify potential prognostic factors for overall survival (OS) in EGC patients. Results A total of 3993 EGC patients were randomly allocated to a training cohort (n=2797) and a validation cohort (n=1196). An external cohort of 106 patients from the Second Hospital of Lanzhou University was used for external validation. Univariate and multivariate logistic regression showed that age, tumor size, differentiation, and examined lymph node count (ELNC) were independent risk factors for LNM. A nomogram for predicting LNM in EGC patients was developed and validated. The predictive model had good discriminatory performance, with a concordance index (C-index) of 0.702 (95% CI: 0.679-0.725). The calibration plots showed that the predicted LNM probabilities agreed well with the actual observations in both the internal validation cohort and the external validation cohort. The AUC values for the training cohort, internal validation cohort, and external validation cohort were 0.702 (95% CI: 0.679-0.725), 0.709 (95% CI: 0.674-0.744), and 0.750 (95% CI: 0.607-0.892), respectively, and the DCA curves and CIC showed good clinical applicability. The Cox regression model identified age, sex, race, primary site, size, pathological type, LNM, distant metastasis, and ELNC as prognostic factors for OS in EGC patients, while year of diagnosis, grade, marital status, radiotherapy, and chemotherapy were not independent prognostic factors. Conclusion In this study, we identified risk factors for LNM and independent prognostic factors for OS in EGC patients, and developed a relatively accurate model to predict LNM in EGC patients.


Introduction
Gastric cancer remains an important cancer worldwide and was responsible for over one million new cases and an estimated 769,000 deaths in 2020, ranking fifth for incidence and fourth for mortality globally (1). In recent years, the diagnosis of early gastric cancer (EGC) has rapidly increased due to improvements in universal screening and endoscopic techniques. According to the staging manual jointly developed by the American Joint Committee on Cancer (AJCC) and the International Union Against Cancer (UICC), EGC is defined as a superficial gastric lesion confined to the mucosa (T1a) or submucosa (T1b), regardless of the lymph node status (2). EGC accounts for more than 50% of total cases in Japan and Korea, whereas in Western countries it accounts for only about 20% (3).
In general, endoscopic resection, such as endoscopic mucosal resection (EMR) and endoscopic submucosal dissection (ESD), can be performed when the likelihood of lymph node metastasis (LNM) is minimal and the size and location of the lesion allow for en bloc resection (4). Lesions are considered absolute indications for endoscopic therapy if they are presumed to have a <1% risk of LNM (5). EGC progresses slowly, but approximately 20% of patients with EGC develop LNM (6). LNM is an independent risk factor that affects the prognosis of patients with EGC and determines the extent of lymph node dissection (7). The 2018 edition of the Chinese guidelines for the management of gastric cancer states that gastrectomy combined with lymph node dissection remains the primary treatment for patients with EGC with LNM (8). However, in some cases, LNM in EGC patients cannot be identified, and these patients receive inappropriate endoscopic treatment. For patients with EGC who develop distant metastases, the Japanese guidelines for gastric cancer recommend systemic therapy (9). However, due to the limited number of cases, the main risk factors for LNM and the prognostic factors in EGC patients have not been well studied. Therefore, predicting the risk of LNM in EGC and identifying prognostic factors are important prerequisites for guiding the rational clinical selection of treatment modalities and improving survival.
To date, several studies have identified clinicopathological features of EGC, such as age, tumor size, lymphatic invasion, depth of invasion, grade, and intestinal type, as risk factors for LNM (10-13). Corresponding predictive models, including nomograms and scoring systems, have been developed to provide evidence for clinical decision-making, but there is still no consensus on their clinical applicability.
The nomogram is widely used for cancer prognosis, mainly because it reduces a statistical prediction model to a single numerical estimate of the probability of an event, such as death or recurrence (14). Therefore, a nomogram for preoperative assessment of the risk of LNM in EGC can help clinicians choose appropriate treatment modalities. Cox regression analysis can identify risk factors associated with prognosis and may help patients and physicians in various aspects of decision-making.
In this study, using the SEER database, the clinicopathological characteristics of EGC patients with and without LNM were first compared. Logistic regression analysis was then used to identify risk factors associated with LNM. A Cox proportional hazards model was further employed to identify risk factors associated with the prognosis of EGC patients. The results of this population-based study will help improve the management of patients with EGC.

Ethics approval and consent to participate
This was a retrospective study based on the SEER database. The authors obtained authorization from the National Cancer Institute, USA, to extract and analyze the research data stored in the SEER program (reference number 19369-Nov2021). All procedures were in accordance with the Declaration of Helsinki and its subsequent versions, and were approved by the Ethics Committee of the Second Hospital of Lanzhou University (approval number: 2022A-623).

Data sources and population selection
Clinical data of EGC patients in the SEER database were collected using SEER*Stat software (version 8.3.9; www.seer.cancer.gov) with a personal account (account number: 12145-Nov2020). Because the SEER database is public, informed consent was not required, and this study was exempted from review by the ethics committee of our institution (15).
Gastric cancer (C16.0-16.9) patients were identified from the SEER database according to the website recoding classification. In addition, 106 patients who underwent gastric surgery at the Second Hospital of Lanzhou University from January 2015 to December 2017 were included as an external validation cohort. The inclusion criteria were as follows: (1) year of diagnosis from January 2004 to December 2015; (2) histopathologically confirmed gastric cancer as the only primary tumor; (3) age > 18 years; (4) postoperative pathological stage T1N0-3M0-1. The exclusion criteria were: (1) unknown ethnicity; (2) unknown tumor size; (3) unknown degree of differentiation; (4) ELNC not recorded or unclear; (5) survival time not recorded, or survival time after diagnosis of less than 1 month. The process of patient screening is shown in Figure 1 and Figure 2. Patients from the SEER database were randomly divided into a training cohort (2,797 patients) and an internal validation cohort (1,196 patients); the 106 patients from the Second Hospital of Lanzhou University formed the external validation cohort. The primary clinical endpoint was OS.
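As an illustration only, the inclusion and exclusion criteria above can be expressed as a single eligibility check. The sketch below is written in Python; the record layout and every field name are hypothetical, and they correspond neither to SEER*Stat variable names nor to the code actually used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields for illustration; not SEER*Stat variable names.
    year_of_diagnosis: int
    age: int
    t_stage: str                     # e.g. "T1a", "T1b", "T2"
    single_primary: bool             # confirmed gastric cancer as the only primary
    race: Optional[str]              # None means unknown
    tumor_size_mm: Optional[float]   # None means unknown
    grade: Optional[str]             # None means unknown differentiation
    elnc: Optional[int]              # examined lymph node count, None if unclear
    survival_months: Optional[int]   # None if not recorded


def is_eligible(r: Record) -> bool:
    """Apply the inclusion criteria, then reject on any exclusion criterion."""
    included = (
        2004 <= r.year_of_diagnosis <= 2015
        and r.single_primary
        and r.age > 18
        and r.t_stage.startswith("T1")
    )
    excluded = (
        r.race is None
        or r.tumor_size_mm is None
        or r.grade is None
        or r.elnc is None
        or r.survival_months is None
        or r.survival_months < 1
    )
    return included and not excluded
```

Keeping the inclusion and exclusion tests separate mirrors the way the criteria are listed and makes each rule easy to audit against the screening flow chart.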
This study was based on public data from the SEER database without interacting with human subjects or using personal identifying information. This research was therefore exempted from review by the Human Subjects Committee of Institutional Review Board of the Second Hospital of Lanzhou University.

Statistical methods
The patients in the database were randomly divided into a training cohort and a validation cohort in a ratio of 7:3. The training cohort was used for model development, and the validation cohort was used for evaluation and validation. The optimal cutoff value of ELNC associated with LNM was calculated using X-tile software. The basic characteristics of the included patients were described by number and percentage (n, %). The contribution of each variable to predicting LNM of EGC in the training cohort was tested by univariate logistic analysis. Variables that were statistically significant were further analyzed by multivariate logistic regression. Odds ratios (OR) with corresponding 95% confidence intervals (CI) were calculated. Risk factors that were statistically significant in the multivariate analysis were used to construct a nomogram to predict LNM. Nomogram performance was evaluated with respect to discrimination and calibration. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Calibration curves were plotted to verify the accuracy and reliability of the nomogram. Internal and external validations were performed to validate the nomogram. Moreover, decision curve analysis (DCA) was used to measure the applicability of the nomogram to clinical practice. After exploring the risk factors for LNM, the Kaplan-Meier method and Cox regression model were used to identify prognostic factors for OS in EGC patients.

FIGURE 1
Flow chart of the patient screening process in the Surveillance, Epidemiology, and End Results (SEER) database.

Results

Patient baseline characteristics
According to the inclusion and exclusion criteria, a total of 3993 EGC patients were identified, with an overall LNM rate of 20.84% and an overall distant metastasis rate of 1.53%. A random number method in R was used to divide the patients in a ratio of 7:3, giving 2797 cases in the training cohort and 1196 cases in the validation cohort. The optimal cutoff value of ELNC associated with LNM, calculated using X-tile software, was 11. The demographic and clinicopathological characteristics of the training cohort and the validation cohort are shown in Table 1. There were no significant differences between the two groups in year of diagnosis, age, sex, race, pathological type, primary site, tumor size, grade, marital status, ELNC, LNM, radiotherapy, or chemotherapy (P>0.05). The overall division of the two groups conformed to simple randomization, and the groups were comparable.

FIGURE 2
The diagram of the patient screening process in the Second Hospital of Lanzhou University.

Univariate logistic regression analysis
To identify risk factors for LNM in EGC patients, we performed univariate logistic regression, followed by multivariate logistic regression to adjust for confounders. Univariate logistic regression results (Table 3) showed that age, tumor size, grade, histology, and ELNC were related to LNM.
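For a single binary risk factor, the odds ratio estimated by univariate logistic regression equals the cross-product ratio of the 2x2 table, and a Wald 95% CI follows from the standard error of the log odds ratio. The Python sketch below illustrates this relationship and the P<0.1 screening rule used for entry into the multivariate model. The function name and the counts are hypothetical and are not taken from Table 3.

```python
import math


def odds_ratio_wald(a, b, c, d, z_crit=1.96):
    """Odds ratio, Wald 95% CI and two-sided Wald P value from a 2x2 table.

    a: exposed with LNM      b: exposed without LNM
    c: unexposed with LNM    d: unexposed without LNM
    """
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of the log odds ratio
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    # Two-sided P value from the standard normal distribution
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Hypothetical counts, for illustration only
or_, lo, hi, p = odds_ratio_wald(30, 70, 10, 90)
enters_multivariate = p < 0.1   # screening threshold used in this study
print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f}), P = {p:.4f}")
```

For continuous or multi-level predictors, and for the adjusted multivariate model, the estimates must be obtained by fitting the regression itself; the closed form above holds only for one binary predictor.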

Multivariate logistic regression analysis
Factors with P<0.1 in univariate logistic regression were included in multivariate logistic regression, and four significant risk factors for LNM were finally included: age at diagnosis, tumor size, grade, and ELNC (

Establish a nomogram of LNM
Based on the results of multivariate logistic regression, we established a nomogram including age, tumor size, grade, and ELNC to predict the probability of LNM in EGC patients (Figure 3). In this model, tumor size and grade were the strongest predictors of LNM. The resampling method was used for internal validation of the nomogram; the ROC curve and C-index were used to evaluate the discrimination of the model; the calibration curve was used to evaluate the agreement between the predicted probabilities and the observed LNM status; and the DCA curve and the CIC were used to evaluate the net clinical benefit of the model.
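For a binary outcome such as LNM, the C-index and the AUC are the same quantity: the proportion of (LNM-positive, LNM-negative) patient pairs in which the positive patient receives the higher predicted probability, with ties counted as one half. The following Python sketch computes it by direct pair counting. The function name and the toy data are assumptions made for illustration, not the study data or the software used by the authors.

```python
def c_index_binary(scores, labels):
    """Concordance index for a binary outcome (equal to the ROC AUC).

    scores: predicted probabilities of LNM
    labels: 1 for LNM-positive, 0 for LNM-negative
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0      # positive patient ranked higher
            elif p == n:
                concordant += 0.5      # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Toy data: 3 of the 4 positive-negative pairs are correctly ordered
print(c_index_binary([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```

Pair counting is quadratic in the number of patients, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large cohorts.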
The ROC curve of the training cohort (Figure 4A) showed good discrimination of the nomogram, with an AUC value of 0.702 and a model C-index of 0.702 (95% CI: 0.679-0.725). The calibration curve showed high accuracy (Figure 5A). In addition, DCA and CIC showed that the nomogram provided a good net benefit at threshold probabilities of 0.2-0.6 (Figures 6A and 7A). In the internal validation cohort, the AUC value of the ROC curve was 0.709 (Figure 4B) and the C-index was 0.709 (95% CI: 0.674-0.744). The calibration curve showed high accuracy (Figure 5B). DCA and CIC again showed that the nomogram provided a good net benefit at threshold probabilities of 0.2-0.6 (Figures 6B and 7B). The performance in the external validation cohort was essentially the same as in the internal validation cohort (Figures 4C, 5C, 6C, 7C).
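The quantity plotted on a decision curve is the net benefit, which under its standard definition is TP/n - (FP/n) x pt/(1 - pt) at threshold probability pt. The Python sketch below computes it for a model and for the "treat all" reference strategy. The function names and the toy data are hypothetical and serve only to illustrate the calculation.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # harm of one false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data, for illustration only
probs = [0.1, 0.3, 0.5, 0.7]
labels = [0, 0, 1, 1]
for pt in (0.2, 0.4, 0.6):
    print(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" curve and zero, the net benefit of treating no one.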

Survival analysis of EGC patients
After exploring the risk factors for LNM in EGC patients, we used the Kaplan-Meier method and the Cox proportional hazards regression model to analyze the prognosis of EGC patients. Both univariate and multivariate results showed that age, gender, race, primary site, tumor size, pathological type, LNM, ELNC, and distant metastasis were significantly associated with OS, while the year of diagnosis, grade, radiotherapy, and chemotherapy were not (Table 4).
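The Kaplan-Meier estimate of OS is the running product of (1 - deaths / number at risk) over the observed death times, with censored patients leaving the risk set without lowering the curve. A minimal Python sketch follows; the function name and the follow-up data are hypothetical and are not drawn from the SEER cohort.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient
for t, s in kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Patients censored at the same time as a death are kept in the risk set for that death, which is the usual convention.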

Discussion
EGC is defined by the Japanese Society for Gastrointestinal Endoscopy as an invasive gastric cancer that does not invade deeper than the submucosa, regardless of LNM. Currently, only South Korea and Japan have relatively complete gastric cancer prevention and screening systems (17). In recent years, with the gradual popularization of EMR and ESD, the clinical management of patients with EGC has developed from surgery alone to a choice between endoscopic resection and surgical treatment. The advantages of endoscopic resection of EGC are less trauma, quick postoperative recovery, high quality of life, and long-term efficacy comparable to surgery. According to the Japanese gastric cancer treatment guidelines, the absolute indications for endoscopic resection are differentiated cT1a adenocarcinoma without ulceration, or differentiated cT1a adenocarcinoma with ulceration and a diameter ≤ 3 cm. Endoscopic resection of undifferentiated cT1a carcinomas ≤ 2 cm in diameter without ulceration is considered an expanded indication (9). However, even with strict adherence to the indications for endoscopic therapy, at least 1.9% of cases recur after resection of the lesion, with intervals ranging from 4 months to more than 10 years (18). One of the important risk factors for recurrence is LNM, and the main drawback of endoscopic resection is that it cannot achieve perigastric lymph node dissection. For patients with EGC with LNM, surgery remains the primary treatment.

FIGURE 3
Nomogram for predicting the LNM.

FIGURE 4
The ROC curves of the nomogram for predicting LNM in the training cohort (A), the internal validation cohort (B), and the external validation cohort (C).

FIGURE 5
Calibration plots for the nomogram in the training cohort (A), the internal validation cohort (B), and the external validation cohort (C).

Several studies of LNM in EGC have been reported, and they fall into two main categories. The first comprises studies based on the SEER database, which have a large amount of data but lack data from other institutions for external validation (10, 19). The second comprises single-center studies, which included many study variables but had a small overall sample size and lacked external validation (20)(21)(22)(23)(24). The models constructed in these two types of studies were not further validated in terms of clinical generalizability. Our study added data from our center as external validation and confirmed that the model remains applicable in our center's cohort. In addition, compared with previous studies, the use of the clinical impact curve allowed for a more comprehensive assessment of the model.
In our study, the LNM rate of EGC was 20.84%, which was consistent with the results reported by Wang et al. (12), but higher than the 12.3-15.5% reported in other studies and the 2.5-8.6% reported by Japanese scholars (25). This difference may be due to Japan's early cancer screening policy and to the classification of high-grade intraepithelial neoplasia as EGC under Japanese diagnostic criteria. Using the population-based SEER database, we first identified multiple clinicopathological factors associated with an increased risk of LNM by logistic regression: age, tumor size, grade, and ELNC. The study by Lin et al. showed that female gender, tumor diameter >20 mm, submucosal invasion, and undifferentiated tumor histology were independent risk factors, with an area under the curve of 0.694 (95% CI: 0.659-0.730) (26). In addition, studies have shown that age, Lauren classification, and lymphatic and perineural invasion are closely related to LNM, and that T1b is more prone to LNM than T1a (27,28). Yin et al. established a first nomogram to identify EGC patients at high risk for LNM using preoperative indicators. The model incorporated 6 independent predictors, including tumor size, gross features, histological differentiation, P53, CA19-9, and lymph node status reported by computed tomography, and had a C-index of 0.82 (95% CI: 0.78-0.86), indicating high clinical value (22). However, because of the relatively small number of patients included, the persuasiveness of the results is limited.

FIGURE 6
The DCA curves of the nomogram for predicting the LNM in the training cohort (A), the internal validation cohort (B), and the external validation cohort (C).

FIGURE 7
The CIC curves of the nomogram for predicting the LNM in the training cohort (A), the internal validation cohort (B), and the external validation cohort (C).

In this study, as in many previous studies, the logistic regression underlying the nomogram showed that age, tumor size, grade, and ELNC were risk factors for LNM, whereas tumor location was not associated with LNM, consistent with previous findings (22,29). The AUC value of the model was 0.698 (95% CI: 0.679-0.717), the calibration was satisfactory, and the DCA showed strong clinical practicability. Pathological type was not a risk factor for LNM in this study; its role remains controversial across current studies (25,30), which may be related to differences in the included study populations and the number of cases. Lymphovascular invasion and depth of tumor invasion have also been shown to be risk factors for the development of LNM in EGC, possibly due to the abundance of lymphatic vessels in the lamina propria and submucosa (25,31). Unfortunately, due to the limitations of the SEER database, these two indicators were not included in our study. It is worth mentioning that ELNC was associated with LNM in this study, which may be closely related to the presence of lymphatic micrometastases (32), and examining more lymph nodes can improve the detection rate of potentially metastatic-positive lymph nodes (33). Lou et al. explored the significance of lymph node micrometastasis (LNMM) in T1N0 EGC and suggested that LNMM may be a key mechanism of recurrence after surgical treatment in T1N0 EGC patients (34). Therefore, establishing a risk model for predicting LNM can help improve the risk stratification of patients with EGC and improve the accuracy of diagnosis.
In the prognostic analysis, the 3-year and 5-year OS rates of EGC patients were 77.6% and 68.0%, respectively, which are much lower than those in most Japanese studies, where both the 5-year and 10-year survival rates of EGC patients were above 90%. In Western studies, 5-year survival rates ranged from 68.0% to 92.0%. The difference in survival may be due to the higher incidence of the diffuse histotype and less advanced endoscopic procedures in Western countries, where surgical resection with D2 lymphadenectomy is considered the gold standard of care (18).
Multivariate Cox regression results showed that LNM was a significant adverse prognostic factor (HR: 1.786, 95% CI: 1.512-2.111, P<0.001). In addition, age, gender, race, primary site, tumor size, distant metastasis, whether surgery was performed, and ELNC were independent prognostic factors, while radiotherapy, chemotherapy, pathological type, and grade were not associated with prognosis. ELNC is not only a risk factor for LNM but also an independent prognostic factor. The 8th edition of the AJCC guidelines does not clearly define the minimum number of lymph nodes to be dissected in patients with T1 gastric cancer undergoing surgical treatment, but our study shows that patients with more than 11 examined lymph nodes had a better prognosis than those with ≤11 (HR: 1.786, 95% CI: 1.512-2.111, P<0.001). For EGC patients undergoing surgical treatment, we therefore recommend that at least 11 lymph nodes be dissected. Sun et al. fitted a β-binomial model of the number of lymph nodes to be examined for different primary tumor stages. The study concluded that examining 11 lymph nodes could reduce the probability of missing positive lymph nodes to <10%, and that at least six lymph nodes need to be dissected in patients with EGC (35). Our population-based results showed no effect of chemotherapy on prognosis in EGC. A previous analysis of patients with pT1 gastric cancer showed that curative surgery alone is sufficient for patients with pT1N0 and pT1N1 disease, that XELOX offered no survival advantage in pT1N2 patients, for whom S-1 is the best choice when adverse reactions are considered, and that XELOX is recommended for pT1N3 patients (36). A study published in Minerva Chirurgica evaluated the prognostic significance of preoperative serum albumin values and metastatic lymph node ratios in patients with gastric cancer.
The results confirmed that albumin, age, resection type, perineural invasion, ratio of metastatic lymph nodes, and T and TNM stages were significant predictors of cancer-specific survival (CSS) (37). In addition, some studies have reported the prognostic value of CD44 variant 9, Ki-67, and microsatellite instability in EGC, but these conclusions need to be confirmed by future studies (38,39).

This study has several limitations. First, this is a retrospective analysis and may be subject to data selection bias. Second, our study lacked data on serum pepsinogen (PG), serum gastrin-17 (G-17), Helicobacter pylori (Hp) infection testing, tumor markers, preoperative imaging, and ESD or EMR treatment. Finally, the study population is derived from the SEER database, which mainly reflects data from Western countries. Although validation of the nomogram with an external cohort may help avoid overfitting of the model, the number of cases in the external validation cohort may have been insufficient. The clinical practices of Eastern and Western countries in the treatment of EGC are very different, so more data from Eastern populations will be required for external validation in the future.

Conclusions
In summary, we established a nomogram for predicting LNM in EGC patients through logistic regression, and internal validation showed that the model had a good discriminative ability, accuracy, and clinical applicability. Independent prognostic factors were identified by Cox regression, and the results showed that EGC patients had better prognoses when the number of dissected lymph nodes exceeded 11. It is hoped that our results can help clinicians make individualized clinical decisions for EGC patients and facilitate the process of individualized treatment.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics statement
Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
JL, TC, and ZH contributed to the study design and wrote the article. YM and WX completed the data analysis. YY and WX generated and improved the figures and tables. KC and HL proofread the manuscript. XC and WW reviewed the article. All authors contributed to the article and approved the submitted version.