- 1School of Medicine, Nankai University, Tianjin, China
- 2Department of General Surgery, The First Medical Center, Chinese People’s Liberation Army General Hospital, Beijing, China
Background: Although early gastric cancer (EGC) is generally limited to the mucosal and submucosal layers, lymph node metastasis can still occur, which may worsen the prognosis, particularly when the number of examined lymph nodes (ELNs) is inadequate. This study introduces log odds of positive lymph nodes (LODDS) as a prognostic factor and integrates it with machine learning to improve survival predictions in T1N+ gastric cancer (GC).
Methods: This retrospective study used data from the Surveillance, Epidemiology, and End Results (SEER) Program and an independent validation cohort from the Chinese People’s Liberation Army General Hospital First Medical Center. Predictive factors were selected using LASSO regression and multivariate Cox regression. Cox proportional-hazards (CoxPH), random survival forest (RSF), and XGBoost models were developed to predict overall survival (OS). Model interpretability and feature importance were evaluated using the SHapley Additive exPlanations (SHAP) method.
Results: A total of 419 T1N+ GC patients from the SEER database and 193 from our institution were included in the study. LODDS staging was identified as an independent prognostic factor, demonstrating superior discriminatory power compared to N staging (C-index 0.65 vs. 0.57). Based on the Brier score, area under the ROC curve (AUC), and C-index, the RSF model outperformed both the Cox model and XGBoost model. The RSF model achieved a C-index of 0.79 in the training cohort and 0.80 in the validation cohort, indicating favorable discrimination and calibration, with Brier scores below 0.25.
Conclusions: Integrating LODDS staging into the RSF model, alongside other clinical features, provides a highly accurate tool for survival prediction in T1N+ GC patients.
1 Introduction
Gastric cancer (GC) is an aggressive malignancy associated with poor prognosis and remains one of the leading causes of cancer-related death worldwide (1). With the widespread adoption of upper gastrointestinal endoscopic screening and advancements in diagnostic techniques, an increasing number of patients are diagnosed at earlier stages (2, 3). Early gastric cancer (EGC) is pathologically defined as tumor invasion confined to the mucosal or submucosal layers (pT1a, pT1b). Despite being categorized as early-stage disease, approximately 10–20% of EGC patients present with lymph node metastasis (T1N+) (4–6), which significantly worsens their prognosis compared to those without nodal involvement (7). To better reflect the prognostic significance of the number of positive lymph nodes (PLNs), the 8th edition of the American Joint Committee on Cancer (AJCC) TNM staging system reclassified patients with T1N3b disease from stage IIb to stage IIIb, indicating poorer survival outcomes (8). This modification underscores the ongoing debate surrounding the prognostic value of the PLN count. Furthermore, T1N+ GC has received insufficient attention, and the traditional TNM staging method relies solely on the absolute number of PLNs, which may inadequately distinguish between patients with varying prognoses (9). Thus, exploring more precise and effective lymph node evaluation methods is critical for achieving accurate prognostic stratification of T1N+ GC patients.
The log odds of positive lymph nodes (LODDS) is a novel lymph node staging metric, defined as the natural logarithm of the ratio between the probability of being a positive lymph node and the probability of being a negative lymph node when one lymph node is retrieved (10). This prognostic indicator integrates both the number of PLNs and ELNs, providing a more comprehensive and accurate characterization of lymph node involvement in GC patients (11). Recent evidence has demonstrated that LODDS staging exhibits superior predictive ability compared to pathological N (pN) staging, particularly for patients undergoing extensive lymphadenectomy (12–15). Therefore, we incorporated LODDS staging into our prognostic modeling to enhance the accuracy and clinical applicability of predictions.
Currently, due to limited sample sizes, studies assessing prognostic outcomes for patients with T1N+ GC remain insufficient. Machine learning approaches have demonstrated the potential to enhance the accuracy of prognostic models by effectively managing complex clinical datasets. Therefore, we developed a machine learning-based prognostic model for patients with T1N+ GC incorporating LODDS staging, aiming to provide valuable insights for long-term survival prediction and precise risk stratification.
2 Materials and methods
2.1 Patient population
Data for GC patients were extracted from the Surveillance, Epidemiology, and End Results (SEER) database (SEER 17 registries, November 2023 submission, covering the years 2000–2021) of the National Cancer Institute using SEER*Stat software (version 8.4.3) (16). As the SEER database strictly maintains patient confidentiality, informed consent was waived for this retrospective analysis. Patients diagnosed with T1N+ GC between 2010 and 2021 were selected for inclusion, encompassing both those who received upfront surgery with pathological staging (pT1N+) and those who received neoadjuvant chemotherapy (NAC) followed by pathological staging (ypT1N+). Patients diagnosed with neuroendocrine carcinoma (ICD-O-3: 8246/3), stromal sarcoma (8935/3), or gastrointestinal stromal tumors (GIST) (8936/3) were excluded. Additionally, patients with missing data on tumor diameter or those who received radiotherapy or had inconsistent radiotherapy information were also excluded. After applying these inclusion and exclusion criteria, 419 eligible patients remained and were included in the final analysis (Figure 1).
Furthermore, data from the Chinese People’s Liberation Army (PLA) General Hospital First Medical Center, comprising T1N+ GC patients diagnosed between March 2012 and March 2024, were collected as a validation dataset. After applying identical inclusion and exclusion criteria, a total of 193 patients were ultimately enrolled. This study was approved by the ethics committee of the Chinese PLA General Hospital (No: S2025-234-01).
2.2 Study variables
Information on age at diagnosis, sex, tumor diameter, primary site, tumor differentiation, histologic type, N stage, examined lymph nodes (ELNs), positive lymph nodes (PLNs), and treatment of GC patients was collected from the SEER database. Primary site was divided into proximal (ICD-O-3 codes C16.0 and C16.1), middle (ICD-O-3 code C16.2), distal (ICD-O-3 codes C16.4 and C16.5), and others. Histologic type was classified based on the presence or absence of signet-ring cell carcinoma (ICD-O-3 histologic code 8490). The formula of LODDS was: $\mathrm{LODDS} = \ln\left(\frac{\mathrm{PLNs} + 0.5}{\mathrm{ELNs} - \mathrm{PLNs} + 0.5}\right)$. To avoid undefined or infinite values when the number of positive or negative lymph nodes was zero, 0.5 was added to both the numerator and the denominator (10). The X‐tile software (3.6.1) and R package “survminer” were used to determine the optimal cutoff values for the three continuous variables (age, tumor diameter, and LODDS) according to overall survival (OS). Tumor diameter was classified as <10 mm and >10 mm; LODDS stage was classified as LODDS1 (< −0.9), LODDS2 (−0.9 to −0.4), and LODDS3 (≥ −0.4).
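As an illustration only, the LODDS definition and the three-level staging described above can be sketched in a few lines of Python (the study itself used R and X-tile). The function names `lodds` and `lodds_stage` and the three example patients are hypothetical; the 0.5 correction and the cut-offs of −0.9 and −0.4 are taken from the text, and the lower bound of each interval is assumed to be inclusive.

```python
import math


def lodds(plns: int, elns: int) -> float:
    """Natural log of (positive + 0.5) / (negative + 0.5) lymph nodes."""
    if plns < 0 or elns < plns:
        raise ValueError("require 0 <= PLNs <= ELNs")
    negative = elns - plns  # examined nodes that were not positive
    return math.log((plns + 0.5) / (negative + 0.5))


def lodds_stage(value: float) -> str:
    """Map a LODDS value to the stages used in this study."""
    if value < -0.9:
        return "LODDS1"
    if value < -0.4:
        return "LODDS2"
    return "LODDS3"


# Hypothetical patients given as (PLNs, ELNs).
for plns, elns in [(1, 30), (2, 7), (3, 6)]:
    value = lodds(plns, elns)
    print(plns, elns, round(value, 2), lodds_stage(value))
```

In this toy example the first two patients would both be staged N1 (one or two positive nodes), yet they fall into LODDS1 (about −2.98) and LODDS2 (about −0.79), because the second patient had far fewer nodes examined.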
2.3 Follow-up
The primary endpoint was overall survival (OS), defined as the interval from the date of diagnosis to the date of death, the date last known to be alive, or the study cut-off date. For patients from PLA General Hospital, follow-up data were collected by trained assistants after patient discharge and reviewed by a senior attending physician prior to analysis. Patients received postoperative follow-ups every 3–6 months for the first two years, every 6–12 months from the third through the fifth year, and annually thereafter, according to the 2023 Chinese Society of Clinical Oncology (CSCO) guidelines for gastric cancer management (17). The median follow-up time was 80 months (95% confidence interval (CI): 66–111 months) for the SEER dataset and 71 months (95% CI: 65–80 months) for the external validation dataset.
2.4 Model development and evaluation
Multivariate Cox regression analyses and the least absolute shrinkage and selection operator (LASSO) algorithm were conducted using clinical characteristics derived from the SEER database. The LASSO regression was conducted using the R package “glmnet”, which applies L1 regularization to shrink regression coefficients and select relevant variables while avoiding overfitting. Variables identified as statistically significant in Cox analyses or considered clinically relevant, including age at diagnosis, primary site, LODDS staging, N staging, treatment modality, and tumor diameter, were integrated into machine learning models to predict 1-, 3-, and 5-year OS for T1N+ gastric cancer (GC) patients.
Machine learning models were constructed using the mlr3proba framework in R (18), which provides a standardized environment for model training, hyperparameter tuning, and performance evaluation under unified resampling strategies. Two machine learning algorithms, Random Survival Forest (RSF) and eXtreme Gradient Boosting (XGBoost), were implemented to predict OS. Model hyperparameters were optimized through a grid search combined with 10-fold cross-validation on the training cohort to maximize the mean concordance index (C-index). Following grid-search optimization, the RSF model’s optimal configuration consisted of 500 estimators, with both the minimum samples required to split an internal node and the minimum samples per terminal node set to 10. For the XGBoost model, the optimal parameters were 300 boosting rounds, a learning rate of 0.05, a maximum tree depth of 4, a minimum child weight of 2, and a subsample ratio of 0.8. All models were trained with fixed random seeds to ensure reproducibility.
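Because the C-index is the quantity maximized during tuning, a minimal sketch of how Harrell's concordance index can be computed may help. This is a plain-Python illustration, not the mlr3proba implementation used in the study; the function name `harrell_c_index` and the five toy patients are hypothetical, and pairs with tied survival times are simply skipped for brevity.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's recorded time. The pair is
    concordant when patient i also has the higher predicted risk; ties in
    predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event indicator (1 = death), predicted risk.
times = [5, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
print(harrell_c_index(times, events, risk))
```

For these toy values, 7 of the 8 comparable pairs are ranked correctly, giving a C-index of 0.875. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.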
To evaluate predictive performance, receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) values were compared among the CoxPH, RSF and XGBoost models. Additionally, decision curve analysis (DCA) and Brier scores were employed to assess the clinical utility, precision, and accuracy of these predictive models. To further validate the RSF prognostic model, data from an independent cohort of 193 patients diagnosed with T1N+ GC at our institution were collected and analyzed.
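To make the idea behind decision curve analysis concrete, the sketch below computes net benefit at a single threshold probability. It is a simplified binary-outcome version that ignores censoring, so it is not the time-to-event DCA reported in this study; the function names and the five toy patients are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    True positives are credited fully; false positives are penalized by the
    odds of the threshold probability, threshold / (1 - threshold).
    """
    n = len(died)
    pairs = list(zip(predicted_risk, died))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(died, threshold):
    """Reference strategy: act on every patient regardless of prediction."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: predicted risk of death by a fixed horizon and observed status.
risk = [0.8, 0.6, 0.4, 0.3, 0.1]
died = [1, 1, 0, 1, 0]
print(net_benefit(risk, died, 0.5), net_benefit_treat_all(died, 0.5))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference (net benefit of zero). A decision curve repeats this calculation across a range of thresholds.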
Interpretability of the prognostic model was crucial for facilitating clinical decision-making, enabling physicians to transparently comprehend the factors influencing postoperative outcomes. Furthermore, the SHapley Additive exPlanations (SHAP) approach, a game-theoretic method, was utilized to illustrate the contribution of individual variables to model predictions, enhancing clinical interpretability (19). SHAP values were computed and plotted using the R package “shapviz”.
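The game-theoretic idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. The sketch below enumerates every feature ordering, which is feasible only for a handful of features; practical SHAP implementations, including those behind “shapviz”, rely on approximations or model-specific algorithms and average over a background dataset rather than a single baseline. The risk function `toy_risk`, its coefficients, and the feature names are hypothetical and unrelated to the fitted RSF model.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    For every ordering of the features, each feature is switched from its
    baseline value to its instance value in turn, and the resulting change
    in model output is credited to that feature. Averaging over all
    orderings gives the Shapley value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with one interaction term."""
    return (0.5 * x["lodds_stage"]
            + 0.2 * x["age_over_cutoff"]
            + 0.1 * x["lodds_stage"] * x["proximal_site"])


instance = {"lodds_stage": 2, "age_over_cutoff": 1, "proximal_site": 1}
baseline = {"lodds_stage": 0, "age_over_cutoff": 0, "proximal_site": 0}
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline (1.4 in this toy case), and the interaction term is split equally between the two features involved. This additivity is what allows SHAP values to be read as per-feature contributions.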
2.5 Statistical analysis
Differences in demographic and clinical characteristics between the training and validation cohorts were evaluated using the “tableone” R package, with the Wilcoxon test for continuous variables and either the χ² test or Fisher’s exact test for categorical variables. To investigate the associations between clinicopathological factors and OS among T1N+ GC patients, univariate Cox analyses were performed using the “survival” R package. Variables with p < 0.05 in the univariate analyses were subsequently included in multivariate Cox regression analyses to further evaluate mortality risk and identify independent prognostic factors.
To examine the prognostic impact of NAC in patients with T1N+ GC, patients receiving NAC were matched with those who did not receive NAC using a 1:1 propensity score matching (PSM) approach implemented in the “MatchIt” R package (20, 21). Propensity scores were estimated using a multivariable logistic regression model that included relevant covariates from the SEER database. Two-tailed p-values of less than 0.05 were considered statistically significant. All statistical analyses were performed using R software (Version 4.2.3, Vienna, Austria).
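The matching step can be pictured with the following sketch of greedy 1:1 nearest-neighbour matching without replacement on already-estimated propensity scores. It is an assumption that this resembles the configuration used; the text does not report the MatchIt settings, and the function name `greedy_match`, the optional caliper, and the toy scores are hypothetical.

```python
def greedy_match(treated, controls, caliper=None):
    """Pair each treated unit with its closest unused control.

    `treated` and `controls` map patient identifiers to propensity scores.
    Treated units are processed from highest to lowest score (one common
    convention), and each control can be used at most once. If a caliper is
    given, treated units with no control within that distance stay unmatched.
    """
    available = dict(controls)
    pairs = []
    ordered = sorted(treated.items(), key=lambda item: item[1], reverse=True)
    for treated_id, treated_score in ordered:
        if not available:
            break
        control_id = min(
            available, key=lambda c: abs(available[c] - treated_score)
        )
        distance = abs(available[control_id] - treated_score)
        if caliper is not None and distance > caliper:
            continue
        pairs.append((treated_id, control_id))
        del available[control_id]
    return pairs


# Toy propensity scores for patients with NAC (treated) and without (controls).
nac = {"T1": 0.62, "T2": 0.35}
no_nac = {"C1": 0.60, "C2": 0.30, "C3": 0.10}
print(greedy_match(nac, no_nac))
```

After matching, covariate balance between the two groups would normally be checked (for example with standardized mean differences) before comparing survival.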
3 Results
3.1 The characteristics of patients
Overall, 419 patients with T1N+ GC diagnosed between 2010 and 2021 were identified from the SEER database and used as the training dataset. Additionally, 193 patients from Chinese PLA General Hospital First Medical Center were included as a validation dataset. Table 1 summarizes the demographic and clinical characteristics of these two patient cohorts. Patients in the validation dataset had a lower mean age (58.3 vs. 67.9 years, p<0.001) and a higher proportion of patients with N2 stage (45.1%), whereas patients in the training cohort predominantly presented with stage N1 (70.9%). Furthermore, the proportion of patients who received NAC was lower in the validation dataset compared to the training dataset (5.2% vs. 27.0%, p<0.001).
Table 1. Demographic and clinicopathological characteristics of training dataset and validation dataset.
3.2 Comparison between LODDS and N staging for prognosis
In terms of the N staging system, a significant difference in prognosis was observed between patients classified as N1 compared to those classified as N2 and N3 combined (p<0.001, hazard ratio (HR)=1.80, 95% CI: 1.31–2.78). However, there was no statistically significant prognostic difference between N2 and N3 stages (p=0.6, HR = 1.16, 95% CI: 0.66–2.05). Conversely, the LODDS staging system demonstrated superior prognostic discrimination among patient groups (Figure 2). Furthermore, LODDS staging demonstrated superior predictive accuracy compared to N staging, as indicated by higher concordance indices, with C-indices of 0.65 versus 0.57 in the training dataset and 0.67 versus 0.60 in the validation dataset, respectively. ROC analyses for predicting 1-, 3-, and 5-year OS further confirmed these findings (Supplementary Figure S1).
Figure 2. Kaplan–Meier survival analysis comparing N staging and log odds of positive lymph nodes (LODDS) staging in training and validation datasets. (A, B) N staging and LODDS staging for overall survival (OS) in the training dataset. (C, D) N staging and LODDS staging for overall survival (OS) in the validation dataset.
Multivariate Cox regression analyses revealed that LODDS staging was significantly associated with OS (p<0.05), while N3 stage was not identified as an independent prognostic factor (p>0.05). Additionally, older age and tumor diameter emerged as independent prognostic factors significantly influencing OS (Table 2).
3.3 Feature selection
The LASSO algorithm and Cox regression analyses were employed to select the variables in the study. Two variables, tumor differentiation and signet-ring cell carcinoma, were excluded, and the remaining six variables were included in the construction of the RSF model: age, tumor diameter, primary site, treatment modality, N staging, and LODDS staging. The procedures for selecting variables are shown in Figure 3.
Figure 3. Results of the least absolute shrinkage and selection operator (LASSO) regression analysis for the prediction models: (A) coefficient profiles plotted against log(λ); (B) ten-fold cross-validation demonstrating the optimal λ value.
After variable selection, we used Spearman correlation analysis to check for collinearity among the variables. The analysis revealed no strong correlations among the included variables (|r| < 0.5 for all pairs, and most p > 0.05). This indicates the absence of significant multicollinearity, suggesting that these variables can be simultaneously included in subsequent model construction (Supplementary Figure S2).
3.4 Model development and performance comparison
The developed prognostic models were evaluated in the training dataset and in the independent validation dataset to assess predictive performance. Among these models, the RSF algorithm demonstrated superior predictive accuracy compared to the CoxPH and XGBoost models. Specifically, the RSF model achieved the highest C-index of 0.785 and demonstrated the highest AUC for predicting 1-, 3-, and 5-year OS in both the training and validation datasets (Table 3; Figures 4A, B). All evaluated models exhibited Brier scores below 0.25, indicating robust calibration, with the RSF model notably displaying the lowest scores. Calibration curves further validated the excellent predictive accuracy of the RSF model (Figures 4C, D). Additionally, DCA showed substantial clinical net benefits of the RSF model for survival prediction at 1-, 3-, and 5-year OS (Figure 5).
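The 0.25 reference value for the Brier score follows from the fact that a non-informative prediction of 0.5 for every patient yields a squared error of 0.25 regardless of outcome. The sketch below illustrates this with a simplified Brier score at a single time horizon that ignores censoring; the scores reported in this study would instead come from a censoring-aware estimator, and the function name and toy values are hypothetical.

```python
def brier_score(predicted_survival, alive_at_horizon):
    """Mean squared difference between predicted survival probability and
    observed status (1 = alive at the horizon, 0 = died before it)."""
    pairs = list(zip(predicted_survival, alive_at_horizon))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Toy cohort of six patients with known status at the horizon.
alive = [1, 1, 0, 1, 0, 0]
uninformative = [0.5] * len(alive)           # same guess for everyone
model = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4]       # hypothetical model output

print(brier_score(uninformative, alive))     # reference value
print(brier_score(model, alive))             # lower is better
```

Lower values indicate predictions that are both better calibrated and more discriminating; a score of 0 would mean every patient was assigned probability 1 or 0 correctly.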
Figure 4. Evaluation of the Random Survival Forest (RSF) model in the training and validation datasets. (A, B) Time‐dependent AUC and receiver operating characteristic (ROC) curves in the training and validation datasets. (C, D) Calibration curves of RSF model in the training and validation dataset.
Figure 5. The decision curve analysis (DCA) of the Random Survival Forest (RSF) model. (A–C) The 1-, 3-, 5-year DCA of the RSF model on training dataset. (D–F) The 1-, 3-, 5-year DCA of the RSF model on validation dataset.
3.5 Model explanation
The RSF model demonstrated optimal performance on both the training and validation datasets. To further clarify the contribution of each predictor variable and enhance the interpretability of the RSF model, we assessed feature importance using SHAP. The SHAP summary plots display the predictors ranked in descending order according to their average SHAP values, reflecting their relative contributions to the model predictions (Figure 6). Among these variables, LODDS staging emerged as the most influential factor (0.0683), substantially surpassing the importance of N staging (0.0167).
Figure 6. Global model explanation by the SHapley Additive exPlanations (SHAP) method. (A) SHAP summary dot plot. (B) SHAP summary bar plot.
3.6 Impact of NAC on ypT1N+ GC prognosis
Additionally, we compared the prognosis of ypT1N+ GC patients who underwent perioperative chemotherapy with pT1N+ patients who did not receive NAC using data from the SEER database. Considering the imbalance in baseline characteristics between these two groups, we conducted a 1:1 propensity score matching (PSM) analysis to ensure comparability (Supplementary Table S1). After matching, our results indicated no significant difference in OS between pT1N+ patients who underwent upfront surgery alone and ypT1N+ patients treated with NAC (p = 0.159, HR = 1.48, 95% CI: 0.86–2.55) (Figure 7).
Figure 7. Kaplan-Meier plot of pT1N+ patients who underwent upfront surgery and ypT1N+ patients treated with neoadjuvant chemotherapy (NAC).
4 Discussion
Early-stage GC generally exhibits a favorable prognosis, and satisfactory survival outcomes can typically be achieved with surgery alone (22). However, T1 GC patients with lymph node metastasis (T1N+) tend to have significantly worse prognoses and have not received sufficient attention (23). To the best of our knowledge, this is the first study to develop an artificial intelligence (AI)-based predictive model specifically designed to evaluate long-term outcomes in patients with T1N+ GC. Furthermore, the RSF model maintained good discrimination and calibration in the independent validation cohort, supporting its robustness and potential clinical utility.
Although the AJCC TNM staging system is widely used in clinical practice and provides valuable prognostic information, it has certain limitations, particularly in T1N+ GC patients. One key limitation is that N staging only considers PLNs and does not account for ELNs, which have also been shown to influence prognosis (24). Our results further demonstrate that N staging does not adequately distinguish between patient prognoses, a limitation also highlighted in the study by Que et al. (25). To overcome this limitation, we developed a predictive model based on LODDS staging, which incorporates both PLNs and ELNs into the calculation. The results indicated that LODDS staging outperforms N staging in predicting survival outcomes for T1N+ GC patients and identifying high-risk populations. A systematic review by Li et al. (12) confirmed that LODDS is strongly correlated with GC patient prognosis and provides a more accurate prediction of survival than earlier methods. Additionally, the utility of LODDS staging has been validated in studies across various tumor types (26–29).
In the SEER cohort, discrepancies were observed between N staging and LODDS staging, likely attributable to inaccuracies in staging caused by an insufficient number of ELNs. Previous studies and AJCC guidelines recommend examining at least 16 lymph nodes to ensure accurate staging (8, 30). However, there remains ongoing debate regarding whether collecting a greater number of lymph nodes translates into survival benefits. While extensive lymph node dissection can enhance staging precision and guide adjuvant treatment decisions, it may also increase postoperative complications and morbidity. Therefore, determining an optimal range of ELNs is essential for balancing accurate staging with patient safety (31).
This prognostic model, based on the RSF algorithm, was constructed to predict both short-term and long-term survival outcomes in patients with T1N+ GC. The RSF algorithm, first introduced in 2008, has since become a widely accepted method for survival analysis and prognosis prediction (32). Compared with conventional Cox regression analysis, the RSF algorithm demonstrates superior performance, particularly when handling high-dimensional data (33). In the current study, our RSF model exhibited enhanced calibration and discrimination in predicting 1-, 3-, and 5-year OS for T1N+ GC patients in both the SEER-based training dataset and the validation dataset from our institution, outperforming the Cox regression and XGBoost models. Thus, the RSF model shows considerable promise in enhancing the accuracy and reliability of individualized prognostic predictions.
To overcome the “black-box” challenge of machine learning models, we employed the SHAP method to interpret our RSF model and visualize the contribution of individual predictors (19). The results indicated that LODDS staging was the most important factor, reinforcing its critical prognostic value. Primary tumor site and age at diagnosis ranked second and third, respectively, aligning with findings from previous research. In addition, multivariate Cox analysis confirmed that age and proximal GC serve as independent prognostic factors. Our findings indicate a poorer prognosis in older patients, consistent with observations by Choi et al. (34), which may be due to increased comorbidities associated with treatment and nutritional complications in this population (35, 36). Furthermore, the poor prognosis associated with proximal GC has been widely reported and may be attributed to its distinct morphology, clinical behavior, and therapeutic responses, suggesting that proximal GC constitutes a relatively independent disease subset (37, 38).
NAC is increasingly utilized in the treatment of GC. However, controversy persists regarding prognostic differences between ypT1 GC patients who underwent NAC and pT1 GC patients who received upfront surgery (39, 40). In this study, we compared the prognoses of these two patient populations and found that the prognosis of patients classified as ypT1 following NAC was comparable to that of patients initially diagnosed at pathological stage pT1. These findings align with previous studies and further support the perspective that ypT1 GC patients generally have favorable clinical outcomes. This result supports an important role for NAC in the treatment of GC patients.
Our study has several limitations that should be acknowledged. First, as a retrospective analysis, it is inherently subject to missing data and selection bias. Second, due to the limited information available in the SEER database, potentially important prognostic variables such as tumor biomarkers (e.g., CEA and CA72-4) were not included in the analysis (41–43). Moreover, detailed treatment-related information, including NAC regimens and treatment durations, was not provided in the SEER database. In addition, some patients had relatively short follow-up durations, although the median follow-up time was adequate for survival estimation. Continued follow-up and data collection from additional centers are warranted to further strengthen the model’s robustness and generalizability. Lastly, given the retrospective nature of this study, prospective multicenter clinical studies are needed to further validate and confirm the clinical applicability of our prognostic model.
In conclusion, this study analyzed the clinical characteristics of patients with T1N+ GC and developed three prognostic models to predict survival outcomes. Among these, the RSF model demonstrated the best predictive performance and was validated in an external cohort. Additionally, we identified key prognostic factors for T1N+ GC. Our results indicate that patients diagnosed as ypT1 following neoadjuvant chemotherapy have comparable survival outcomes to those initially staged as pT1.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by the ethics committee of the Chinese PLA General Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin because the study involved retrospective analysis of de-identified data.
Author contributions
YL: Writing – original draft, Data curation, Conceptualization. HC: Data curation, Conceptualization, Writing – review & editing. ZY: Formal Analysis, Writing – original draft, Software. JW: Data curation, Writing – original draft. RA: Investigation, Writing – original draft, Visualization. RL: Writing – review & editing. JC: Investigation, Supervision, Funding acquisition, Writing – review & editing. BW: Resources, Writing – review & editing, Supervision, Funding acquisition.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China (82073192, 82273231, and 62133010) and the Beijing Science and Technology Program (Z221100007422125).
Acknowledgments
We sincerely thank Ranchu Cheng from the Department of Oncology, University of Oxford, for kindly reviewing and providing valuable suggestions on the statistical analyses of this study. The authors thank all the staff of the National Cancer Institute for their contributions to the SEER program.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1642302/full#supplementary-material
Supplementary Figure 1 | The correlation heat map of the parameters. Red indicates a positive correlation and blue indicates a negative correlation.
Supplementary Figure 2 | Receiver operating characteristic (ROC) curves of N staging and log odds of positive lymph nodes (LODDS) staging. “ns” indicates not significant; p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***).
References
1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2024) 74:229–63. doi: 10.3322/caac.21834
2. Conti CB, Agnesi S, Scaravaglio M, Masseria P, Dinelli ME, Oldani M, et al. Early gastric cancer: update on prevention, diagnosis and treatment. Int J Environ Res Public Health. (2023) 20:2149. doi: 10.3390/ijerph20032149
3. Xia JY and Aadam AA. Advances in screening and detection of gastric cancer. J Surg Oncol. (2022) 125:1104–9. doi: 10.1002/jso.26844
4. Tian H, Ning Z, Zong Z, Liu J, Hu C, Ying H, et al. Application of machine learning algorithms to predict lymph node metastasis in early gastric cancer. Front Med. (2022) 8. doi: 10.3389/fmed.2021.759013
5. Chen J, Zhao G, and Wang Y. Analysis of lymph node metastasis in early gastric cancer: a single institutional experience from China. World J Surg Oncol. (2020) 18:57. doi: 10.1186/s12957-020-01834-7
6. Vos EL, Nakauchi M, Gönen M, Castellanos JA, Biondi A, Coit DG, et al. Risk of lymph node metastasis in T1b gastric cancer: an international comprehensive analysis from the global gastric group (G3) alliance. Ann Surgery. (2023) 277:e339–e45. doi: 10.1097/SLA.0000000000005332
7. Wei J, Zhang Y, Liu Y, Wang A, Fan B, Fu T, et al. Construction and validation of a risk-scoring model that preoperatively predicts lymph node metastasis in early gastric cancer patients. Ann Surg Oncol. (2021) 28:6665–72. doi: 10.1245/s10434-021-09867-2
8. Amin MB, Greene FL, Edge SB, Compton CC, Gershenwald JE, Brookland RK, et al. The Eighth Edition AJCC Cancer Staging Manual: Continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA: A Cancer J Clin. (2017) 67:93–9. doi: 10.3322/caac.21388
9. Liu J-Y, Peng C-W, Yang X-J, Huang C-Q, and Li Y. The prognosis role of AJCC/UICC 8th edition staging system in gastric cancer, a retrospective analysis. Am J Trans Res. (2018) 10:292.
10. Sun Z, Xu Y, Li DM, Wang ZN, Zhu GL, Huang BJ, et al. Log odds of positive lymph nodes. Cancer. (2010) 116:2571–80. doi: 10.1002/cncr.24989
11. Deng J, Liu J, Wang W, Sun Z, Wang Z, Zhou Z, et al. Validation of clinical significance of examined lymph node count for accurate prognostic evaluation of gastric cancer for the eighth edition of the American Joint Committee on Cancer (AJCC) TNM staging system. Chin J Cancer Res. (2018) 30:477–91. doi: 10.21147/j.issn.1000-9604.2018.05.01
12. Li Y, Wu G, Liu J, Zhang Y, Yang W, Wang X, et al. Log odds of positive lymph nodes as a novel prognostic predictor for gastric cancer: a systematic review and meta-analysis. BMC Cancer. (2023) 23:523. doi: 10.1186/s12885-023-10805-6
13. Gu P, Deng J, Sun Z, Wang Z, Wang W, Liang H, et al. Superiority of log odds of positive lymph nodes (LODDS) for prognostic prediction after gastric cancer surgery: a multi-institutional analysis of 7620 patients in China. Surg Today. (2021) 51:101–10. doi: 10.1007/s00595-020-02091-7
14. Díaz del Arco C, Estrada Muñoz L, Sánchez Pernaute A, Ortega Medina L, García Gómez de las Heras S, García Martínez R, et al. Prognostic role of the log odds of positive lymph nodes in Western patients with resected gastric cancer: A comparison with the 8th edition of the TNM staging system. Am J Clin Pathology. (2023) 161:186–96. doi: 10.1093/ajcp/aqad119
15. Che K, Wang Y, Wu N, Liu Q, Yang J, Liu B, et al. Prognostic nomograms based on three lymph node classification systems for resected gastric adenocarcinoma: A large population-based cohort study and external validation. Ann Surg Oncol. (2021) 28:8937–49. doi: 10.1245/s10434-021-10299-1
16. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat database: Incidence - SEER Research Data, 17 Registries, Nov 2023 Sub (2000–2021). Bethesda, MD: NCI, DCCPS, Surveillance Research Program (2024). Available online at: https://seer.cancer.gov (Accessed March 1, 2025).
17. Wang FH, Zhang XT, Tang L, Wu Q, Cai MY, Li YF, et al. The Chinese Society of Clinical Oncology (CSCO): Clinical guidelines for the diagnosis and treatment of gastric cancer, 2023. Cancer Commun (Lond). (2024) 44:127–72. doi: 10.1002/cac2.12516
18. Sonabend R, Király FJ, Bender A, Bischl B, and Lang M. mlr3proba: an R package for machine learning in survival analysis. Bioinformatics. (2021) 37:2789–91. doi: 10.1093/bioinformatics/btab039
19. Lundberg SM and Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. (2017) 30:4768–77. doi: 10.48550/arXiv.1705.07874
20. Lee J and Little TD. A practical guide to propensity score analysis for applied clinical research. Behav Res Ther. (2017) 98:76–90. doi: 10.1016/j.brat.2017.01.005
21. Ho D, Imai K, King G, and Stuart EA. MatchIt: nonparametric preprocessing for parametric causal inference. J Stat Software. (2011) 42:1–28. doi: 10.18637/jss.v042.i08
22. Yagi S, Nunobe S, Makuuchi R, Ida S, Kumagai K, Ohashi M, et al. Oncological outcomes in patients with pT1N0–3 or pT2–3N0 gastric cancer after curative resection without adjuvant chemotherapy. Langenbeck’s Arch Surgery. (2021) 406:419–26. doi: 10.1007/s00423-021-02084-1
23. Yura M, Yoshikawa T, Otsuki S, Yamagata Y, Morita S, Katai H, et al. Is surgery alone sufficient for treating T1 gastric cancer with extensive lymph node metastases? Gastric Cancer. (2020) 23:349–55. doi: 10.1007/s10120-019-01006-x
24. MingHua Z, KeCheng Z, ZhenYu C, Lin C, ChunXi W, and ZeLong Y. Impact of lymph nodes examined on survival in ypN0 gastric cancer patients: a population-based study. J Gastrointestinal Surgery. (2021) 25:919–25. doi: 10.1007/s11605-020-04579-6
25. Que S-J, Zhong Q, Chen Q-Y, Truty MJ, Yan S, Ma Y-B, et al. A novel ypTLM staging system based on LODDS for gastric cancer after neoadjuvant therapy: multicenter and large-sample retrospective study. World J Surgery. (2023) 47:1. doi: 10.1007/s00268-023-06994-7
26. Sun Z, Xu Y, Li DM, Wang ZN, Zhu GL, Huang BJ, et al. Log odds of positive lymph nodes: a novel prognostic indicator superior to the number-based and the ratio-based N category for gastric cancer patients with R0 resection. Cancer. (2010) 116:2571–80. doi: 10.1002/cncr.24989
27. Meng X, Hao F, Wang N, Qin P, Ju Z, and Sun D. Log odds of positive lymph nodes (LODDS)-based novel nomogram for survival estimation in patients with invasive micropapillary carcinoma of the breast. BMC Med Res Methodol. (2024) 24:90. doi: 10.1186/s12874-024-02218-1
28. He C, Ni M, Liu J, Teng X, Ke L, Matsuura Y, et al. A survival nomogram model for patients with resectable non-small cell lung cancer and lymph node metastasis (N1 or N2) based on the Surveillance, Epidemiology, and End Results Database and single-center data. Transl Lung Cancer Res. (2024) 13:573–86. doi: 10.21037/tlcr-24-119
29. Ogawa S, Itabashi M, Bamba Y, Yamamoto M, and Sugihara K. Superior prognosis stratification for stage III colon cancer using log odds of positive lymph nodes (LODDS) compared to TNM stage classification: the Japanese study group for postoperative follow-up of colorectal cancer. Oncotarget. (2020) 11:3144–52. doi: 10.18632/oncotarget.27692
30. Ghukasyan R, Banerjee S, Childers C, Labora A, McClintick D, Girgis M, et al. Higher numbers of examined lymph nodes are associated with increased survival in resected, treatment-naïve, node-positive esophageal, gastric, pancreatic, and colon cancers. J Gastrointestinal Surgery. (2023) 27:1197–207. doi: 10.1007/s11605-023-05617-9
31. Lin G-T, Chen Q-Y, Zhong Q, Zheng C-H, Li P, Xie J-W, et al. Intraoperative surrogate indicators of gastric cancer patients’ long-term prognosis: the number of lymph nodes examined relates to the lymph node noncompliance rate. Ann Surg Oncol. (2020) 27:3281–93. doi: 10.1245/s10434-020-08387-9
32. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, and Lau BM. Random survival forests for competing risks. Biostatistics. (2014) 15:757–73. doi: 10.1093/biostatistics/kxu010
33. Taylor JM. Random survival forests. J Thorac Oncol. (2011) 6:1974–5. doi: 10.1097/JTO.0b013e318233d835
34. Choi Y, Kim N, Kim KW, Jo HH, Park J, Yoon H, et al. Gastric cancer in older patients: A retrospective study and literature review. Ann Geriatr Med Res. (2022) 26:33–41. doi: 10.4235/agmr.21.0144
35. Kawaguchi Y, Akaike H, Shoda K, Furuya S, Hosomura N, Amemiya H, et al. Is surgery the best treatment for elderly gastric cancer patients? World J Gastrointest Surg. (2021) 13:1351–60. doi: 10.4240/wjgs.v13.i11.1351
36. Yamamoto K, Nagatsuma Y, Fukuda Y, Hirao M, Nishikawa K, Miyamoto A, et al. Effectiveness of a preoperative exercise and nutritional support program for elderly sarcopenic patients with gastric cancer. Gastric Cancer. (2017) 20:913–8. doi: 10.1007/s10120-016-0683-4
37. Bornschein J, Dingwerth A, Selgrad M, Venerito M, Stuebs P, Frauenschlaeger K, et al. Adenocarcinomas at different positions at the gastro-oesophageal junction show distinct association with gastritis and gastric preneoplastic conditions. Eur J Gastroenterol Hepatol. (2015) 27:492–500. doi: 10.1097/MEG.0000000000000299
38. Imamura Y, Watanabe M, Oki E, Morita M, and Baba H. Esophagogastric junction adenocarcinoma shares characteristics with gastric adenocarcinoma: Literature review and retrospective multicenter cohort study. Ann Gastroenterol Surg. (2021) 5:46–59. doi: 10.1002/ags3.12406
39. Li Z, Wang Y, Ying X, Shan F, Wu Z, Zhang L, et al. Different prognostic implication of ypTNM stage and pTNM stage for gastric cancer: a propensity score-matched analysis. BMC Cancer. (2019) 19:80. doi: 10.1186/s12885-019-5283-3
40. Prasad P, Sivaharan A, Navidi M, Fergie BH, Griffin SM, and Phillips AW. Significance of neoadjuvant downstaging in gastric adenocarcinoma. Surgery. (2022) 172:593–601. doi: 10.1016/j.surg.2022.03.005
41. Kochi M, Fujii M, Kanamori N, Kaiga T, Kawakami T, Aizaki K, et al. Evaluation of serum CEA and CA19–9 levels as prognostic factors in patients with gastric cancer. Gastric Cancer. (2000) 3:177–86. doi: 10.1007/PL00011715
42. Lin J-P, Lin J-X, Ma Y-B, Xie J-W, Yan S, Wang J-B, et al. Prognostic significance of pre- and post-operative tumour markers for patients with gastric cancer. Br J Cancer. (2020) 123:418–25. doi: 10.1038/s41416-020-0901-z
Keywords: early gastric cancer, lymph node metastasis, LODDS staging, machine learning, prognosis
Citation: Liu Y, Cui H, Yuan Z, Wang J, An R, Li R, Cui J and Wei B (2026) Machine learning models based on log odds of positive lymph nodes for predicting survival in T1N+ gastric cancer. Front. Oncol. 15:1642302. doi: 10.3389/fonc.2025.1642302
Received: 06 June 2025; Revised: 11 November 2025; Accepted: 17 December 2025;
Published: 09 January 2026.
Edited by:
Zhen Liu, Zhejiang University, China
Reviewed by:
Petar Ozretić, Rudjer Boskovic Institute, Croatia
Kangping Yang, Second Affiliated Hospital of Nanchang University, China
Copyright © 2026 Liu, Cui, Yuan, Wang, An, Li, Cui and Wei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bo Wei; Jianxin Cui
†These authors have contributed equally to this work and share first authorship