The development and validation of automated machine learning models for predicting lymph node metastasis in Siewert type II T1 adenocarcinoma of the esophagogastric junction

Background: Lymph node metastasis (LNM) is considered an essential prognostic factor for adenocarcinoma of the esophagogastric junction (AEG), and it also affects the treatment strategy for AEG. We aimed to evaluate automated machine learning (AutoML) algorithms for predicting LNM in Siewert type II T1 AEG. Methods: A total of 878 patients with Siewert type II T1 AEG were selected from the Surveillance, Epidemiology, and End Results (SEER) database to develop the LNM predictive models. Patients from two hospitals in Suzhou formed the test set. We applied five machine learning algorithms to develop the LNM prediction models. The performance of the predictive models was assessed using accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Results: Patients with LNM exhibited significantly higher proportions of male sex, poor differentiation, and submucosal infiltration. The deep learning (DL) model demonstrated relatively good accuracy (0.713) and sensitivity (0.868) among the five models. Moreover, the DL model achieved the highest AUC (0.781) and sensitivity (1.000) in the test set. Conclusion: The DL model showed good predictive performance among the five AutoML models, indicating the advantage of AutoML in modeling LNM prediction in patients with Siewert type II T1 AEG.

AEG is commonly considered a separate digestive tract tumor (9-12). The Siewert classification categorizes AEG into three types based on the location of the tumor epicenter relative to the gastroesophageal junction (GEJ) (13-15). In Siewert type I AEG, the epicenter of the tumor is located 1 to 5 cm above the GEJ. For type II, the epicenter of the tumor is located 1 cm above to 2 cm below the GEJ. For type III, the epicenter of the tumor is located 2 to 5 cm below the GEJ. Among the three subtypes, Siewert type II is generally considered the true cardia carcinoma (13).
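The distance rule above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the signed-offset convention (positive means above the GEJ), and the assignment of the shared boundary values (exactly 1 cm above and exactly 2 cm below) to type II are choices made here, not part of the classification as described in the text.

```python
from typing import Optional


def siewert_type(offset_cm: float) -> Optional[str]:
    """Return the Siewert type for a tumor epicenter offset from the GEJ.

    offset_cm > 0: epicenter above (proximal to) the GEJ.
    offset_cm < 0: epicenter below (distal to) the GEJ.
    Assumption: the shared boundaries (+1 cm and -2 cm) are assigned to type II.
    """
    if 1.0 < offset_cm <= 5.0:
        return "I"      # 1 to 5 cm above the GEJ
    if -2.0 <= offset_cm <= 1.0:
        return "II"     # 1 cm above to 2 cm below the GEJ
    if -5.0 <= offset_cm < -2.0:
        return "III"    # 2 to 5 cm below the GEJ
    return None         # outside the range covered by the classification


if __name__ == "__main__":
    for offset in (3.0, 0.0, -1.5, -4.0, 6.0):
        print(offset, siewert_type(offset))
```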
Due to its particular anatomical location, the treatment of Siewert type II AEG has been historically complicated. For locally advanced tumors, radical surgical resection is still the primary treatment for AEG (5,11). However, with gastrointestinal endoscopy screening, patients with digestive tract cancer are diagnosed at an early stage, making it possible to treat early AEG without lymphatic and organ metastasis by endoscopy. The endoscopic resection of superficial AEG, such as endoscopic mucosal resection (EMR) and endoscopic submucosal dissection (ESD), is considered safe and effective (16-21). Endoscopic resection techniques are increasingly being employed for early AEG, leading to a reduction in the morbidity and mortality associated with gastrectomy or esophagectomy and an improvement in the quality of life (16).
Previous studies have shown that lymph node metastasis (LNM) is an independent prognostic factor for AEG (9,22,23). In addition, some studies have constructed prediction models for LNM of AEG using the traditional logistic regression method (24-27). However, machine learning-based models are increasingly used in the diagnosis, prediction, and prognosis evaluation of gastrointestinal diseases, such as inflammatory bowel disease and gastrointestinal tumors (28-31). In this study, we aimed to establish predictive models for LNM in Siewert type II T1 AEG using automated machine learning (AutoML) methods to help clinicians assess the feasibility of endoscopic treatment and individualize treatment for patients.

Data source
Relevant data from the SEER database were retrieved in our study. The SEER database of the National Cancer Institute (https://seer.cancer.gov/), an authoritative source of information on cancer incidence and survival, contains data on various tumor sites from sources throughout the United States. Currently, the SEER program collects and releases cancer data from 17 population-based registries, covering approximately one-third of the U.S. population, which can be used to conduct population-based case-control studies that clarify the etiology of cancers, especially some uncommon ones (32,33). Using SEER*Stat 8.4.0.1 software, we obtained demographic information and cancer incidence data collected from the SEER 17 Registries, November 2021 Sub (2000-2019 varying). To identify Siewert type II AEG, we used two parameters in the SEER database. Cancers simultaneously satisfying two conditions ["TNM 7/CS v0204 + Schema" encoded 28 (Esophagus GE Junction) and "Primary Site-Labeled" encoded 160 (Cardia, NOS)] were extracted and classified as Siewert type II AEG (4,34).
In addition, the patients with Siewert type II T1 AEG diagnosed in the First Affiliated Hospital of Soochow University and the Second Affiliated Hospital of Soochow University from April 2003 to October 2022 were retrospectively selected as the research subjects.

Inclusion criteria
The criteria for patient inclusion were as follows: (1) patients with available TNM stage information; (2) patients aged 18 years or above at diagnosis (in consideration of the tiny proportion of patients under 18 years); (3) patients pathologically diagnosed as T1M0 Siewert Type II AEG; (4) patients with the first or only primary malignancy; and (5) patients with available information on differentiation, extension, and size.
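The five criteria can be read as a record-level filter. The sketch below is a minimal illustration only: the `PatientRecord` layout, its field names, and the string encodings of stage and Siewert type are hypothetical and do not correspond to actual SEER variable names or to the hospital data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical record layout; all field names are illustrative."""
    age_at_diagnosis: int
    t_stage: Optional[str]          # e.g. "T1a" or "T1b"; None if unavailable
    n_stage: Optional[str]
    m_stage: Optional[str]
    siewert_type: Optional[str]     # "I", "II", or "III"
    first_or_only_primary: bool
    differentiation: Optional[int]  # 1, 2, or 3
    extension: Optional[str]        # depth of tumor invasion
    size_mm: Optional[float]


def meets_inclusion_criteria(p: PatientRecord) -> bool:
    # (1) TNM stage information must be available.
    if None in (p.t_stage, p.n_stage, p.m_stage):
        return False
    # (2) Aged 18 years or above at diagnosis.
    if p.age_at_diagnosis < 18:
        return False
    # (3) T1M0 Siewert type II AEG.
    if not (p.t_stage.startswith("T1") and p.m_stage == "M0"
            and p.siewert_type == "II"):
        return False
    # (4) First or only primary malignancy.
    if not p.first_or_only_primary:
        return False
    # (5) Differentiation, extension, and size must all be recorded.
    return None not in (p.differentiation, p.extension, p.size_mm)
```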

Modeling methods and evaluation
In this study, five algorithms provided by H2O (https://h2o.ai/) were used to construct prediction models on the training set: generalized linear model (GLM), gradient boosting machine (GBM), deep learning (DL), distributed random forest (DRF), and stacked ensemble (SE). Using the h2o package (version 3.38.0.4) in R, we set the response column and the predictor columns for the training, validation, and test sets. H2O AutoML performs a hyperparameter search using a random grid search method over the five algorithms to deliver the best model automatically. Five predictive models were finally developed with this approach. We used the validation and test sets to score and rank the models.
The models' accuracy, misclassification rate, specificity, sensitivity, and precision (also called positive predictive value) were obtained from the confusion matrix. To select the best model, the difference between the predicted and actual results was analyzed. The predictive ability of the models was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The procedure of patient selection and modeling is shown in Figure 1.
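For readers less familiar with these indicators, the following sketch shows how each one is derived from the four cells of a binary confusion matrix. It is a generic illustration, not the authors' code; the function name is hypothetical and the counts in the usage example are made up, not taken from the study.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the reported indicators from confusion-matrix counts.

    tp/fn: patients with LNM predicted positive/negative.
    tn/fp: patients without LNM predicted negative/positive.
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1.0 - accuracy,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in confusion_metrics(tp=8, fp=5, tn=35, fn=2).items():
        print(f"{name}: {value:.3f}")
```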
We selected a model with the best performance using the above indicators and further evaluated it with the calibration curve. The Brier score, a statistical metric to measure the accuracy of probabilistic forecasts, was used to assess the calibration degree of the models. The score ranges from 0 to 1; a model with perfect skill has a score of 0, and the poorest model has a score of 1 (36). The unreliability index and the p-value of the calibration curve were also used to evaluate the reliability of the model.
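The Brier score for a binary outcome is the mean squared difference between the predicted probability and the observed outcome (0 or 1). A minimal sketch, with a hypothetical function name and toy inputs rather than study data:

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or len(probabilities) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    print(brier_score([1.0, 0.0], [1, 0]))            # perfect forecasts
    print(brier_score([0.0, 1.0], [1, 0]))            # maximally wrong forecasts
    print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # uninformative forecasts
```

As a point of reference when reading reported scores, a forecast that always outputs 0.5 scores 0.25 regardless of the outcomes, since each squared error is then 0.25.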
Finally, the results of the model are presented visually for better understanding. A variable importance plot was constructed to show the importance of different variables. A Local Interpretable Model-Agnostic Explanations (LIME) Feature Importance Visualization plot was constructed using the lime package (version 0.5.3) in R to show the contributions of variables of samples to the outcome.

Statistical analysis
The statistical analysis and the modeling process were performed using R software. The tableone package (version 0.13.2) in R was used for data analysis. We compared the baseline information and characteristics between different groups, including demographic and clinicopathological data. The normality of the quantitative data was evaluated using the Kolmogorov-Smirnov test. Normally distributed quantitative variables were represented by the mean and standard deviation (SD); otherwise, they were represented by the median and interquartile range (IQR). Student's t-test was used for intergroup comparison of normally distributed quantitative variables, and the Mann-Whitney U-test was employed to compare non-normally distributed quantitative variables. Categorical data were expressed as frequency and percentage, and the chi-squared (χ2) test was used for intergroup comparison.

Characteristics
A total of 867 patients from the SEER dataset and 141 patients from Suzhou were screened for our study. After balancing, the SEER dataset comprised 1,329 samples. The patients' demographics and clinicopathological baseline information in the SEER dataset after balancing and in the test set are summarized in Table 1. The baseline characteristic information based on LNM in the SEER dataset after balancing and in the test set is summarized in Tables 2, 3, respectively. In the SEER dataset, the median sizes of the LNM and non-LNM groups were 25.00 and 15.00 mm, respectively, with statistical significance (p < 0.001). Patients with LNM had a higher proportion of poor degree of differentiation and submucosal infiltration, with statistical differences. In the test set, the average ages of the LNM and non-LNM groups were 65.00 and 67.15 years, respectively, with no statistical significance (p = 0.344). Patients with LNM had a higher proportion of poor degree of differentiation, with a p-value of 0.051. Patients in the two groups showed statistical differences in the depth of tumor invasion.
In the tables, the differentiation variable is coded as 1 for well-differentiated, 2 for moderately differentiated, and 3 for poorly differentiated or undifferentiated tumors; the variable extension refers to the depth of tumor invasion; and the LNM variable is coded as 1 for patients with lymph node metastases and 0 for the rest.
The clinicopathological features of the training set and the validation set obtained from the SEER dataset are presented in Table 4. In the balanced training set and validation set, the rate of positive events (that is, the LNM rate) was 49.7% and 50%, respectively. There were no significant differences in gender, degree of differentiation, depth of tumor invasion, or tumor size between the two sets.

Performance of the models
The confusion matrices used to calculate the accuracy, sensitivity, specificity, and other indicators of the models in the validation set and the test set are shown in Figures 2, 3, respectively. These indicators and the AUC of the five different models in the validation and test sets are shown in Table 5. The DL model has good sensitivity (0.868) and accuracy (0.713) in the validation set, which means that the model can accurately identify patients with positive lymph node metastases. The DL model exhibited a sensitivity of 100% in the test set, indicating that the model was able to screen out node-positive patients well and reduce missed diagnoses. The GBM model achieved an accuracy of 0.763, a sensitivity of 0.821, and a specificity of 0.705 in the validation set. The sensitivity (0.700) of the GBM model in the test set is lower than that of the DL model. The confusion matrix revealed that the GBM model failed to correctly predict three patients with positive lymph nodes in the test set. Although the GLM model exhibited the highest sensitivity in the validation set (0.916) and the test set (1.000), its specificity was lower than that of the other models. This suggests that the GLM model was less capable of predicting negative LNM.
The ROC curve and AUC can evaluate the predictive ability of the models. Figures 4A,B show the ROC curves of the five models in the validation and test sets, respectively. The DL model achieved a good AUC (0.769) in the validation set, and it exhibited the highest AUC in the test set compared to other models. The Matthews correlation coefficient (MCC) score is a commonly used metric for evaluating binary classification models. The DL model achieved an MCC score of 0.448. The MCC scores of the five predictive models on the test set are not high, which may be related to the small size of the test set.
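Both quantities have compact definitions. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half), and the MCC is computed from the four confusion-matrix counts. The sketch below uses hypothetical function names and toy inputs; it is not how H2O computes these values internally, and returning 0 for an undefined MCC is a convention chosen here.

```python
import math
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC in [-1, 1]; returns 0.0 when a marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
    print(matthews_corrcoef(tp=5, fp=0, tn=5, fn=0))
```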
Considering that the predictive model is a preoperative screening model, sensitivity should have a high weight on the selection of models. Hence, we believe that the DL model is the best model for predicting LNM in patients with AEG, with high sensitivity and reasonable specificity. The DL model consists of an input layer, two hidden layers, and an output layer. Dropout is applied in both hidden layers at a rate of 30%, which helps to prevent overfitting by randomly dropping out a percentage of units during modeling. The regularization terms are set to 0 for all layers. Rectifier (ReLU) activation functions are used in the hidden layers, while Softmax activation is used in the output layer for classification.
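The architecture described above (two ReLU hidden layers with 30% dropout, followed by a softmax output) can be sketched as a plain forward pass. This is a conceptual illustration only, not H2O's implementation: the layer widths and weights are placeholders supplied by the caller, and the use of "inverted" dropout (rescaling the kept units during training) is an assumption made here for simplicity.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    shift = max(values)                      # subtract max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs: Sequence[float], weights, biases) -> List[float]:
    # One weight row per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dropout(values, rate: float, rng: random.Random, training: bool) -> List[float]:
    if not training:
        return list(values)                  # dropout is disabled at prediction time
    keep = 1.0 - rate
    # Assumption: inverted dropout, so kept units are rescaled by 1 / keep.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(inputs: Sequence[float], layers: Sequence[Layer], rate: float = 0.3,
            rng: Optional[random.Random] = None, training: bool = False) -> List[float]:
    """Hidden layers use ReLU + dropout; the last layer uses softmax."""
    rng = rng or random.Random(0)
    hidden = list(inputs)
    for weights, biases in layers[:-1]:
        hidden = dropout(relu(dense(hidden, weights, biases)), rate, rng, training)
    out_weights, out_biases = layers[-1]
    return softmax(dense(hidden, out_weights, out_biases))
```

With two output units, the second softmax probability can be read as the predicted probability of LNM.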

The performance of the deep learning model

Calibration curve in the datasets
The calibration curves, which offer another way to evaluate the model, are shown in Figures 5A,B. The calibration curve of the DL model in the validation set shows a high degree of fit. The Brier scores of the DL model in the validation and test sets were 0.213 and 0.228, respectively, indicating that the prediction results of the model were in good agreement with the actual outcome. The unreliability index of the model in the validation set was 0.070, which suggests that the DL model is reliable for predicting the LNM in T1 Siewert type II AEG patients.

Model visualization
The variable importance in the DL model is shown in Figure 6. According to this figure, tumor size is the most important predictor of LNM in T1 Siewert type II patients. Furthermore, we randomly selected four cases to plot the LIME feature importance visualization, as shown in Figure 7. Taking the first case as an example, moderate tumor differentiation and the patient being a white, married man support the absence of LNM. In the third case, being a married man with a moderately differentiated tumor argues against LNM. However, other parameters, such as tumor infiltration into the submucosa and Asian or Pacific Islander race, support LNM. Under the comprehensive prediction, the probability of LNM in this patient was 61%; that is, the prediction agrees with the actual outcome.

Frontiers in Medicine, doi: 10.3389/fmed.2024.1266278 (frontiersin.org)

Discussion
In the study, we found that differentiation, the depth of invasion, the size of the AEG, and gender were related to LNM. Five predictive models were developed using AutoML. Among these models, the DL model is the most suitable for predicting and screening LNM in early AEG, with the highest sensitivity and AUC in the test set.
With incidence rates rising, problems in the treatment and prognosis of AEG have gradually become global concerns. With the application of endoscopic screening technology, patients with digestive tract cancer (including AEG) are diagnosed at an early stage, making endoscopic treatment of superficial AEG possible. Because of the inherent differences in the anatomy of AEG, there are certain technical difficulties in treating AEG with ESD (37). Chen et al. (38) found that the procedure speed of ESD for early AEG is slower than that for early gastric carcinoma, possibly due to AEG extending beyond the cardia, including the angle of His. However, endoscopic treatment (including ESD) remains an effective alternative to surgery for the treatment of early AEG based on comparable long-term outcomes (18,20). With the advancements in endoscopic treatment, early AEG can be effectively resected by EMR/ESD with fewer complications, better preservation of gastric function, a shorter duration of hospital stay, and a lower cost compared with traditional gastrectomy or esophagectomy (16,17). Endoscopic treatment is gaining acceptance because it is more tolerable, especially in elderly patients. Chen et al. (20) revealed that endoscopic treatment may be considered in patients aged 65 years or those with submucosal (T1b stage) cancer of the AEG. It has been widely accepted that LNM is an important prognostic factor for patients with Siewert type II AEG. In the study of Wang et al. (22), a prognostic model for the outcome of patients with AEG based on a traditional algorithm was established. The positive lymph nodes and the ratio of metastatic lymph nodes were identified as two of the prognostic factors according to the univariate analysis. Naoki et al. (23) found that LNM was the only independent prognostic factor for AEG in their study. To achieve better endoscopic treatment effects, early AEG should meet certain standards (including no lymph node and distant organ metastasis) (12,16,18). In some studies, the location of the LNM has been found to have a significant impact on the surgical method and the scope of lymph node dissection (39-41).
Preoperative diagnosis of LNM mainly relies on computed tomography (CT), endoscopic ultrasound, and magnetic resonance imaging (MRI), which are primarily based on the size of the lymph nodes. The preoperative prediction of LNM using the CT criteria has high specificity (23). However, the diagnostic accuracy of LNM prediction using these methods is not particularly high, as the evaluation of the lymph node size is greatly affected by other factors and thus heavily relies on the physician's evaluation (26). In addition, detecting LNM in a narrow space (such as the diaphragm, aorta, and pericardium) by contrast-enhanced CT before surgery is more complex than in lymph nodes around the stomach or colon (23). Moreover, not all patients have access to contrast-enhanced CT for diagnosing clinical LNM.
Several studies to date have constructed LNM predictive models for AEG. However, most studies only used traditional logistic regression analysis for risk factors and did not perform independent external validation. Chen et al. (25) used the logistic regression method to predict the LNM risk in early AEG patients, and the AUC of the prediction model was 0.742. Feng et al. (26) provided a detailed explanation of the correlation between tumor size and LNM in AEG and used logistic regression to plot a nomogram, which can predict the LNM risk. Zheng et al. (24) used small samples to explore the risk factors for LNM in AEG while showing the specific groups of lymph nodes. All of these studies are consistent with our findings, but the predictive performance of their models is weaker than that of ours, suggesting that machine learning offers advantages in the establishment of LNM predictive models.
In the present study, a SEER-based case-control analysis was conducted. We found that most AEG patients (approximately 71.2%) with LNM had a submucosal invasion. Approximately 53.9% of patients without LNM had submucosal infiltration. Given the key role of LNM in the selection of endoscopic or surgical resection, we built predictive models of LNM using AutoML methods on data from the SEER database and validated the models with independent data. Among the five models, the DL model is highly sensitive in predicting LNM in early AEG patients. A consistent performance of our new DL model across the datasets with different baseline characteristics provides evidence of its robustness and generalizability.
In our study, we found that the degree of tumor differentiation, the depth of tumor invasion, and gender were related to LNM. Less differentiated tumors have a higher incidence of LNM, owing to their higher heterogeneity and more aggressive biological characteristics compared to other histological types. The risk of LNM is higher for AEG that invades the submucosa. This observation may be due to the presence of substantial lymphatic capillaries in the submucosa and the large gap between adjacent endothelial cells. If the tumor infiltrates the submucosa or deeper, cancer cells could invade the lymphatic capillaries, resulting in LNM (24). In terms of gender, the exact reason remains unknown. However, several studies have shown more prolonged survival in female individuals than in male counterparts with esophageal cancer (42-45), which is attributed to both sex itself (sex hormones and reproductive factors) and other extrinsic risk factors (43).
The predictive model we have established can help clinicians predict the LNM risk of early AEG while combining imaging findings, thus helping us make better clinical decisions and personalized treatment plans for early AEG patients. Of note, the prediction model was developed using postoperative pathological data, which can also be obtained from endoscopically resected pathological specimens. Hence, we investigated LNM predictive models for T1 AEG. However, certain limitations still exist in our study. First, the inherent limitations of retrospective and non-randomized studies may lead to unavoidable bias. Second, the prediction model was based on postoperative pathological data, and therefore, further studies combined with preoperative data are needed to validate our model. Third, data from patients with AEG were collected from two hospitals in Suzhou to validate our prediction models. Due to the low rate of LNM in the population, the test set is highly imbalanced, with positive cases representing less than 10% of cases, which makes it less credible to validate the models' predictive performance in the test set. That is why the precision of the model in the test set is not ideal, which means data from different hospitals in different regions need to be further collected to expand the sample size. Lymphovascular invasion has been repeatedly demonstrated as the most crucial risk factor for LNM (46). The esophageal invasion length is thought to be associated with mediastinal LNM (39, 47, 48). However, due to the limited data available in the SEER database, certain tumor characteristics (such as lymphovascular invasion, esophageal invasion length, and the groups of LNM), blood index, and imaging data were missing. Therefore, we cannot further improve the performance of the LNM predictive model in a multimodal way.

FIGURE 6
Variable importance in the DL model. In this figure, "Race.AI" means patients' race is American Indian; "Race.W" means patients' race is white; "Race.B" means patients' race is Black; "Differentiation.1" means well-differentiated tumors; "Differentiation.3" means poorly differentiated tumors.
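The third limitation in the Discussion, that precision suffers when positive cases are rare, follows directly from Bayes' rule: at fixed sensitivity and specificity, the positive predictive value falls as prevalence falls. A minimal sketch, with a hypothetical function name and illustrative sensitivity and specificity values that are not the study's results:

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Precision (PPV) implied by sensitivity, specificity, and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0:
        raise ValueError("no positive predictions are possible with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


if __name__ == "__main__":
    # Illustrative values only: the same classifier at two prevalence levels.
    for prevalence in (0.5, 0.1):
        ppv = positive_predictive_value(0.9, 0.8, prevalence)
        print(f"prevalence={prevalence:.2f} -> PPV={ppv:.3f}")
```

In this illustration, the same classifier yields a markedly lower precision in a 10% prevalence population than in a balanced one, even though its sensitivity and specificity are unchanged.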

Conclusion
In summary, in this multicenter-based case-control study, we report that the degree of tumor differentiation, tumor size, gender, and depth of tumor invasion are correlated with the LNM of Siewert type II T1 AEG. Using AutoML algorithms, we built five models to predict LNM in early AEG. The DL model is the best model for predicting LNM in patients with AEG, with high sensitivity and reasonable specificity. This model should be further applied in clinical practice, and the predictive performance of this model should be prospectively explored in further clinical follow-up.

FIGURE 1
Flowchart of patient selection and modeling procedure.

FIGURE 2
Confusion matrices of five models in the validation set. In this figure, 1 of target represents lymph node metastasis in the population, while 1 of prediction represents the positive prediction of lymph node metastasis by the model. (A) Confusion matrix of the DL model in the validation set. (B) Confusion matrix of the GBM model in the validation set. (C) Confusion matrix of the SE model in the validation set. (D) Confusion matrix of the DRF model in the validation set. (E) Confusion matrix of the GLM model in the validation set.

FIGURE 3
Confusion matrices of five models in the test set. In this figure, 1 of target represents lymph node metastasis in the population, while 1 of prediction represents the positive prediction of lymph node metastasis by the model. (A) Confusion matrix of the DL model in the test set. (B) Confusion matrix of the GBM model in the test set. (C) Confusion matrix of the SE model in the test set. (D) Confusion matrix of the DRF model in the test set. (E) Confusion matrix of the GLM model in the test set.

FIGURE 4
ROC of predictive models in the sets. (A) ROC of predictive models in the validation set. (B) ROC of predictive models in the test set.

FIGURE 5
Calibration curves of the DL model in the sets. (A) The calibration curve of the DL model in the validation set. (B) The calibration curve of the DL model in the test set.

FIGURE 7
LIME feature importance visualization. In this figure, "Race = W" means patients' race is white; "Race = API" means patients' race is Asian or Pacific Islander; "Differentiation = 2" means tumor is moderately differentiated.

TABLE 1
Baseline characteristics of patients from the SEER dataset after balancing and the test set.

TABLE 2
Clinicopathological characteristics of patients from the SEER dataset after balancing.

TABLE 3
Clinicopathological characteristics of patients from the test set.

TABLE 4
Clinicopathological characteristics of patients from the training set and the validation set after balancing.

TABLE 5
Performance of models in the dataset.