An artificial intelligence prediction model outperforms conventional guidelines in predicting lymph node metastasis of T1 colorectal cancer

Background According to guidelines, many patients with T1 colorectal cancers (CRCs) undergo additional surgery with lymph node dissection after endoscopic resection (ER), despite the low incidence of lymph node metastasis (LNM). Aim The aim of this study was to develop an artificial intelligence (AI) model to more effectively identify T1 CRCs at risk for LNM and reduce the rate of unnecessary additional surgery. Methods We retrospectively analyzed 651 patients with T1 CRCs. The patient cohort was randomly divided into a training set (546 patients) and a test set (105 patients) (ratio 5:1), and a classification and regression tree (CART) algorithm was trained on the training set to develop a predictive AI model for LNM. The model used 12 clinicopathological factors to predict positivity or negativity for LNM. The performance of the AI model on the test set was then compared with that of the Japanese Society for Cancer of the Colon and Rectum (JSCCR) and National Comprehensive Cancer Network (NCCN) guidelines applied to the same patients. Results The AI model had better predictive performance (AUC=0.960) than the JSCCR (AUC=0.588) and NCCN (AUC=0.850) guidelines. The specificity (85.8% vs. 17.5%, p<0.001), balanced accuracy (92.9% vs. 58.7%, p=0.001), and positive predictive value (36.3% vs. 9.0%, p=0.001) of the AI model were significantly better than those of the JSCCR guidelines, and the AI model reduced the proportion of patients in the high-risk group for LNM from 83.8% (JSCCR) to 20.9%. The specificity of the AI model was higher than that of the NCCN guidelines (85.8% vs. 82.4%, p=0.557), but the difference was not statistically significant. The sensitivity of the NCCN guidelines was lower than that of the AI model (87.5% vs. 100%, p=0.301), and according to the NCCN guidelines, 1.2% of patients would have had a missed diagnosis. Conclusion The AI model performs better than conventional guidelines in predicting LNM in T1 CRCs and could therefore significantly reduce unnecessary additional surgery.


Introduction
T1 colorectal cancers (CRCs) with superficial submucosal invasion can be cured by endoscopic resection (ER); therefore, effective screening for lymph node metastasis (LNM) after ER is very important. High-grade histology, lymphovascular invasion (LV), depth of submucosal invasion (DSI) ≥1000 μm, and tumor budding are considered risk factors for LNM in most guidelines (1-4). According to those guidelines, approximately 57% to 91% of patients with T1 CRCs are classified into a high-risk group for LNM, whereas the rate of LNM is only 3.3% to 4.7% (5-8); thus, more effective identification of high-risk patients could substantially reduce the number of patients who undergo unnecessary surgery (5, 8).
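To see why these ranges imply so much unnecessary surgery, note that a rule can at best flag every LNM-positive patient, so the share of flagged patients who truly have LNM (the positive predictive value, PPV) cannot exceed the LNM rate π divided by the fraction f of patients flagged, out of n patients in total. The symbols π, f, and n are notation introduced here, and pairing the most favorable endpoints of the quoted ranges (π = 4.7%, f = 57%) is purely illustrative, since the ranges come from different studies:

```latex
\mathrm{PPV} = \frac{TP}{TP + FP} \le \frac{\pi n}{f n} = \frac{\pi}{f}
  = \frac{0.047}{0.57} \approx 0.082
```

Even in this best case, more than 90% of patients in the high-risk group would be LNM-negative.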
According to the guidelines, patients with any one histologic risk factor on pathologic examination are classified into the high-risk group for LNM and are recommended to undergo additional surgery (1-3). However, even patients in the high-risk group have varying degrees of LNM risk, because each patient has a different type and combination of risk factors (8-12). Therefore, a more effective method is needed to predict the probability of LNM and to choose the best treatment plan based on the degree of risk. Artificial intelligence (AI) models can consider multiple risk factors simultaneously and predict the overall risk of LNM. In this study, we aimed to develop an AI prediction model to identify patients at high risk of LNM more accurately.
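The guideline logic described above reduces every patient to a single yes/no decision. The following is a minimal sketch that uses only the four risk factors named in the previous paragraph as a simplification; the actual JSCCR and NCCN criteria (Table 1) differ in detail and are not reproduced here, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass

DSI_CUTOFF_UM = 1000.0  # DSI >= 1000 um is treated as a risk factor


@dataclass
class PathologyFindings:
    """Hypothetical record of the histologic risk factors after ER."""
    high_grade_histology: bool
    lymphovascular_invasion: bool
    dsi_um: float               # depth of submucosal invasion, micrometres
    high_grade_budding: bool    # budding above the guideline cut-off, reduced to a boolean


def _factor_flags(p: PathologyFindings) -> list:
    return [
        p.high_grade_histology,
        p.lymphovascular_invasion,
        p.dsi_um >= DSI_CUTOFF_UM,
        p.high_grade_budding,
    ]


def guideline_high_risk(p: PathologyFindings) -> bool:
    """Any single risk factor places the patient in the high-risk group."""
    return any(_factor_flags(p))


def count_risk_factors(p: PathologyFindings) -> int:
    """Number of factors present; the binary rule above discards this information."""
    return sum(_factor_flags(p))
```

For example, a patient whose only finding is a DSI of 1,500 μm and a patient with both LV and a DSI of 2,500 μm are both sent for surgery, although one carries one factor and the other two; a multivariable model can use that difference, while the binary rule cannot.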

Study cohort
This retrospective study collected data from patients who were diagnosed with T1 CRCs and underwent either ER or radical surgery at six large general hospitals in Ningbo, China, from 2016 to 2022. The exclusion criteria were as follows: 1) any history of malignant gastrointestinal tumors; 2) any history of advanced malignant tumors; 3) ER alone with loss to follow-up or less than four years of follow-up after ER; 4) any adjuvant therapy; 5) familial adenomatous polyposis; and 6) pathological specimens of insufficient quality for evaluation. Ultimately, a total of 651 cases were enrolled in this study, including ER alone (77 cases), additional surgery after ER (221 cases), and initial radical surgery (353 cases). Operative specimens were used as the gold standard for the presence of LNM, and patients who underwent ER alone with no evidence of recurrence during the follow-up period (≥4 years) were regarded as negative for LNM. Clinical decisions and management followed the "Guidelines of Chinese Society of Clinical Oncology, Colorectal Cancer" in all six hospitals. Follow-up in these patients was performed according to routine clinical care protocols. The overall workflow is illustrated in Figure 1.
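The ground-truth rule above can be written as a small labeling function. This is a sketch under stated assumptions: the `Case` fields and treatment strings are hypothetical, and the text does not say how an ER-alone patient with recurrence would be labeled, so the sketch leaves such a case unlabeled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_FOLLOW_UP_YEARS = 4.0  # follow-up required for ER-alone cases in this study


@dataclass
class Case:
    """Hypothetical per-patient record; field names are illustrative."""
    treatment: str                   # "ER alone", "ER + surgery", or "initial surgery"
    pathologic_lnm: Optional[bool]   # node status from the operative specimen, if any
    follow_up_years: float = 0.0
    recurrence: bool = False


def lnm_label(case: Case) -> Optional[bool]:
    """Return True/False for LNM, or None if the case cannot be labeled."""
    if case.treatment in ("ER + surgery", "initial surgery"):
        # Operative specimens are the gold standard.
        return case.pathologic_lnm
    if case.treatment == "ER alone":
        if case.follow_up_years < MIN_FOLLOW_UP_YEARS:
            return None  # exclusion criterion 3: lost to follow-up or < 4 years
        if not case.recurrence:
            return False  # recurrence-free for >= 4 years: regarded as LNM-negative
        return None  # not specified in the text; left unlabeled here (assumption)
    raise ValueError(f"unknown treatment: {case.treatment!r}")
```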

Assessment of clinicopathological factors
We collected each patient's age, sex, tumor size, tumor location, tumor morphology, and operation records from the hospital records. Lymph node status was obtained directly from examination of the radical surgical specimens or from follow-up visits. All factors used to develop the model were re-evaluated on all slides by one experienced gastrointestinal pathologist, who consulted a professor of pathology whenever the re-evaluation disagreed with the original pathological diagnosis. The evaluated histologic factors included histologic differentiation, lymphovascular invasion (LV), tumor budding, poorly differentiated clusters (PDCs), width of submucosal invasion (WSI), depth of submucosal invasion (DSI), and area of submucosal invasion (ASI). WSI was measured at the widest extent of submucosal invasion. DSI was measured as follows, regardless of the morphology: when the muscularis mucosae could be identified or estimated, DSI was measured from the lower border of the muscularis mucosae; when it could not be identified or estimated, DSI was measured from the surface layer of the lesion, excluding any residual adenoma components on the surface. In this study, we defined the product of DSI and WSI as the area of submucosal invasion (ASI) (Figure 2).
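The DSI reference-point rule and the ASI definition are simple enough to state as code. In this sketch, positions are hypothetical depths in micrometres measured downward from the top of the section, and "surface" is read as the top of the carcinoma beneath any residual adenoma; both conventions are assumptions, as is reporting ASI in square micrometres (the text does not give its unit).

```python
from typing import Optional


def depth_of_submucosal_invasion(
    deepest_invasion_um: float,
    muscularis_lower_border_um: Optional[float],
    carcinoma_surface_um: float,
) -> float:
    """DSI: measured from the lower border of the muscularis mucosae when it can be
    identified or estimated (pass its depth), otherwise from the carcinoma surface
    (pass None for the muscularis border)."""
    reference = (muscularis_lower_border_um
                 if muscularis_lower_border_um is not None
                 else carcinoma_surface_um)
    if deepest_invasion_um < reference:
        raise ValueError("deepest invasion lies above the reference point")
    return deepest_invasion_um - reference


def area_of_submucosal_invasion(dsi_um: float, wsi_um: float) -> float:
    """ASI as defined in this study: the product DSI x WSI (square micrometres)."""
    return dsi_um * wsi_um
```

For example, invasion reaching 2,500 μm below the top of the section, with the lower border of the muscularis mucosae at 1,000 μm, gives a DSI of 1,500 μm; with a WSI of 4,000 μm, the ASI is 6,000,000 μm² (6 mm²).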
Histologic subtype and grade were evaluated based on the World Health Organization Classification of Tumors. Histological differentiation was divided into well- to moderately differentiated adenocarcinoma, poorly differentiated adenocarcinoma (POR), and well- to moderately differentiated adenocarcinoma with an obvious mucinous adenocarcinoma component (≥30%). Tumor budding was defined as a single cancer cell or a cancer cell nest of fewer than five cells infiltrating the stroma at the invasive margin of the cancer, and PDCs were defined as clusters of ≥5 cancer cells infiltrating the stroma and lacking glandular formation. Tumor budding and PDCs were evaluated according to the Japanese Society for Cancer of the Colon and Rectum (JSCCR) guidelines and Ueno's method, respectively (13, 14). When LV and DSI were difficult to judge, cases were evaluated by immunostaining with antibodies against CD34, CD31, D2-40, and desmin, and cases that remained difficult to assess after immunohistochemistry because of poor specimen quality were excluded.
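The cell-count cut-offs separating tumor buds from PDCs, and the three-way histology grouping, can be expressed as two small functions. This is an illustrative sketch: it classifies one cluster at a time and does not reproduce the JSCCR budding grade or Ueno's PDC grade, which count clusters within a microscopic field; the "neither" label for gland-forming clusters and giving POR precedence over the mucinous category are assumptions.

```python
def classify_stromal_cluster(n_cells: int, forms_glands: bool) -> str:
    """Label one infiltrating cluster at the invasive margin by its cell count."""
    if n_cells < 1:
        raise ValueError("a cluster must contain at least one cell")
    if n_cells < 5:
        return "tumor bud"      # single cell or nest of fewer than five cells
    if not forms_glands:
        return "PDC"            # >= 5 cells lacking glandular formation
    return "neither"            # gland-forming clusters are counted as neither (assumption)


def histology_group(poorly_differentiated: bool, mucinous_fraction: float) -> str:
    """Three-way grouping of histological differentiation used in this study."""
    if poorly_differentiated:
        return "POR"
    if mucinous_fraction >= 0.30:
        return "W/M with mucinous component"
    return "W/M"  # well- to moderately differentiated adenocarcinoma
```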

Development of the AI model
The cohort of 651 patients was randomly divided into a training set and a test set (ratio 5:1), and the LightGBM algorithm was trained on the 546 patients in the training set to develop a predictive model for LNM. We performed 5-fold cross-validation as follows: for each set of parameters, we fitted the model to 4/5 of the data, used the remaining 1/5 for internal validation, and rotated the validation fold until every data point had been in the validation set once. The AI model was developed using the following 12 factors: age, sex, location, tumor size, tumor morphology, LV, histologic features, tumor budding, PDCs, WSI, DSI, and ASI. Age, tumor size, WSI, DSI, and ASI were recorded as continuous variables. Missing values were imputed by predictive mean matching using the R package mice. Permutation feature importance was used during feature selection: each feature was randomly permuted, the model was evaluated on the perturbed data, and the performance on the original data was compared with that on the perturbed data. The ratio of LNM+ to LNM− cases in the training set was 59:487. We did not apply any class balancing and did not weight the model's predictions. We used LightGBM's built-in function to obtain the feature importance scores and normalized them so that the relative importance of each feature could be compared more easily. Hyperparameters were tuned on the validation folds, guided by the experience of the model developers, and the performance of the final model was then evaluated on the test set. The classification threshold was searched on the validation data in steps of 0.01 starting from 0 and was then fine-tuned within the interval in which the model performed best.
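The permutation procedure described above can be sketched with the Python standard library alone. The model here is a stand-in: `predict` is any callable that maps a feature row to 0 or 1, and the score is plain accuracy. The text does not say which performance metric was used, so accuracy is an assumption. scikit-learn provides an equivalent routine, `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(mean(drops))
    return importances
```

A feature that the model never uses receives an importance of exactly zero, because shuffling it cannot change any prediction.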

Assessment of the AI model
To compare the performance of the AI model with the conventional guidelines, the test set was evaluated according to the Japanese Society for Cancer of the Colon and Rectum (JSCCR) and National Comprehensive Cancer Network (NCCN) guidelines. Under these guidelines, patients with any one of the risk factors on pathological examination are classified into the high-risk group for LNM. Table 1 shows the criteria used by the guidelines to identify patients at high risk of LNM. The performance of the AI model on the test set was measured by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, balanced accuracy, positive predictive value (PPV), negative predictive value (NPV), and decision curve analysis (DCA), and was compared with that of the JSCCR and NCCN guidelines.
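The threshold-based measures listed above all derive from the 2×2 table of predicted risk group against LNM status, and the AUC derives from the ranking of predicted scores. The following is a minimal sketch; `Confusion`, `metrics`, and `auc` are illustrative names. As a usage example, the counts TP=8, FP=80, TN=17, FN=0 are back-calculated for the JSCCR rule on the test set, assuming 8 LNM-positive and 97 LNM-negative patients (7.6% of 105); these counts are an assumption, not figures reported in the text. They give a specificity of 17.5% and 83.8% of patients flagged as high risk, and a balanced accuracy and PPV within about 0.1 percentage point of the reported JSCCR values.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def metrics(c: Confusion) -> Dict[str, float]:
    """Threshold-based performance measures from a 2x2 confusion table."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
        "ppv": c.tp / (c.tp + c.fp) if c.tp + c.fp else float("nan"),
        "npv": c.tn / (c.tn + c.fn) if c.tn + c.fn else float("nan"),
        "flagged_high_risk": (c.tp + c.fp) / total,
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a random LNM-positive case scores higher
    than a random LNM-negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Back-calculated JSCCR counts on the test set (assumption, see text above).
jsccr = metrics(Confusion(tp=8, fp=80, tn=17, fn=0))
```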

Statistical analysis
The ROC curves and decision curves were plotted using Python (version 3.8). All statistical analyses were performed using R software (version 4.3.0), and p<0.05 was considered statistically significant. The chi-square test and the Wilcoxon test were used to assess the significance of differences between groups for dichotomous and continuous variables, respectively. Confidence intervals (CIs) were calculated using the binomial test.
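Decision curves plot net benefit against the threshold probability p_t at which a patient would be sent for additional surgery. The sketch below uses the standard net-benefit definition from decision curve analysis, NB = TP/n − (FP/n) · p_t/(1 − p_t), with "treat all" and "treat none" as reference strategies. The text does not state which Python package produced the curves, so this is an independent re-implementation, not the authors' code; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def net_benefit(probs: Sequence[float], labels: Sequence[int], pt: float) -> float:
    """Net benefit of operating on every patient whose predicted risk is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Reference strategy: additional surgery for every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def decision_curve(
    probs: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """(pt, model, treat-all, treat-none) net benefit at each threshold."""
    return [(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt), 0.0)
            for pt in thresholds]
```

The "treat none" strategy has a net benefit of zero at every threshold, which is why it appears as a constant in each tuple.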

Baseline characteristics of the patients
Table 2 shows the baseline characteristics of the study population. A total of 651 patients were enrolled, comprising 546 cases (83.9%) in the training set and 105 cases (16.1%) in the test set for model development and validation. Among the 221 patients who underwent additional radical surgery after ER, 219 underwent it within one month, and two underwent it at 10-12 months of follow-up because of suspected LNM. There were no significant differences between the training and test sets in any of the factors or in the rate of LNM, either overall (10.8% vs. 7.6%, p=0.325) or in any subgroup (ER alone: 0% vs. 0%, p>0.999; ER + surgery: 8.6% vs. 2.8%, p=0.242; initial surgery: 14.7% vs. 11.4%, p=0.508). The mean number of lymph nodes per patient in the training and test sets was 11 ± 5 (median, 10) and 12 ± 5 (median, 12), respectively. Figure 3 shows the importance of each factor used in the development of the AI model. Age and tumor size were the most important factors for predicting LNM.
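The overall comparison of LNM rates between the two sets is a 2×2 chi-square test. The sketch below uses the training-set counts stated in the Methods (59 LNM-positive and 487 LNM-negative) and assumes 8 LNM-positive and 97 LNM-negative test patients, back-calculated from 7.6% of 105. Without a continuity correction, it gives a statistic of about 0.969 and p ≈ 0.325, in line with the value reported above. Note that R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default (`correct = TRUE`), which would give a larger p-value.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom, using the
    identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: training set, test set; columns: LNM-positive, LNM-negative.
stat, p_value = pearson_chi2_2x2(59, 487, 8, 97)
```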

Discussion
Effectively screening for patients who are at risk for LNM after ER is the key to ensuring a good prognosis and reducing overtreatment.In this study, we developed a prediction model for LNM in patients with T1 CRC using a decision tree algorithm; the model achieved excellent predictive performance, outperforming the JSCCR and NCCN guidelines.
At present, LV, high-grade histology, DSI, and tumor budding are recognized as risk factors for LNM (1-3). However, each factor carries a different weight, and different combinations of factors are associated with different risks of LNM. For example, LV has been shown to be an independent risk factor for LNM in most studies, whereas DSI often shows only a weak correlation (8, 15-17). Because patients often have different types and combinations of risk factors, their risks of LNM also differ. In addition, some studies have reported that the gross type (18, 19), location (20), and size of the tumor (21), as well as the age (21) and sex (11, 22) of the patient, are related to LNM; however, the guidelines cannot fully reflect the role of these potential factors. A few studies have attempted to grade the risk of LNM based on combinations of risk factors (8-11). Miyachi et al. (11) used 5 factors to grade LNM risk, and 64% of patients were classified into the high- or ultrahigh-risk group. Our previous study with a small cohort showed that in patients with T1 CRCs treated by ER, the risk of LNM can be divided into low, moderate, and high grades based on different combinations of LV, histological differentiation, and other risk factors; the rates of LNM in these groups were 0.8%, 25.0-28.8%, and 66.6%, respectively (8).
Recently, predictive models using AI technology have gradually been developed to predict medical outcomes, and those models [...] high-risk group. The specificity of the AI model was significantly higher than that of the JSCCR guidelines (85.8% vs. 17.5%, p<0.001), and according to the AI model, the proportion of high-risk patients was reduced from 83.8% to 20.9%. Although the specificity of the NCCN guidelines also reached 82.4%, 1.2% of patients would have had a missed diagnosis under the NCCN guidelines. Predictive models for LNM in T1 CRCs have been developed using machine learning algorithms in a small number of studies, and all of them showed better predictive performance than the guidelines. It was reported that an AI model using the least absolute shrinkage and selection operator (LASSO) method (AUC=0.76-0.83) outperformed the American Society for Gastrointestinal Endoscopy (ASGE)/European Society of Gastrointestinal Endoscopy (ESGE) guidelines (AUC=0.67) and the JSCCR guidelines (AUC=0.518-0.65) (6, 25). Ichimasa et al. (7) developed an AI model using a support vector machine (SVM), and the model achieved significantly higher specificity (66% for the model vs. 0% to 44% for the guidelines) and accuracy (69% for the model vs. 9% to 49% for the guidelines) than the NCCN, European Society for Medical Oncology (ESMO), and JSCCR guidelines. The artificial neural network (ANN) model (AUC=0.83) developed by Kudo et al. (23) outperformed the US (AUC=0.73) and Japanese (AUC=0.57) guidelines. In this study, the model built with the decision tree algorithm achieved an AUC of 0.960, higher than those reported for the models above. Ahn et al. (21) compared T1 CRC LNM prediction models developed using five machine learning algorithms and found that the random forest (AUC=0.991) and CART (AUC=0.944) models performed better than the others; both models performed similarly to ours.
This study has several limitations. First, approximately 10% of patients in this study did not receive additional surgery because no risk factors were detected after ER, and their LNM status was assessed only by endoscopic and imaging examinations, although most of these patients were followed up for more than 5 years. Second, although our AI model exhibited excellent performance in the test set, sufficient external validation is required to confirm its performance before it can be applied in clinical practice.
In conclusion, our AI model performs better than conventional guidelines in predicting LNM in T1 CRCs and can effectively reduce unnecessary additional surgery. We propose that AI prediction models be routinely applied to inform treatment decisions in patients who have undergone ER of T1 CRCs.

FIGURE 1 Flowchart of patients included in this study for the training and test sets, and design of this study.

FIGURE 2 Measurement method of depth and width of submucosal invasion in this study. The width and the depth of submucosal invasion are measured at the widest and deepest parts of the submucosal infiltration, respectively. (A) Pedunculated. (B) Sessile.

Decision curve analysis for the artificial intelligence model. The black dashed line represents the assumption that no patient has lymph node metastasis, and the solid line represents the assumption that all patients have lymph node metastasis.

Figure 4 shows the ROC curves of the AI model and the guidelines on the test set. As quantified by the AUC, the AI model had better predictive performance (AUC=0.960) than the JSCCR (AUC=0.588) and NCCN (AUC=0.850) guidelines.

FIGURE 4 Receiver operating characteristic curves of the artificial intelligence model, Japanese Society for Cancer of the Colon and Rectum (JSCCR), and National Comprehensive Cancer Network (NCCN) guidelines.

FIGURE 3 Factor importance of the developed model.

TABLE 1
Indications for additional treatment after endoscopic resection of T1 colorectal cancer.
LV, lymphovascular invasion; DSI, depth of submucosal invasion; JSCCR, Japanese Society for Cancer of the Colon and Rectum; NCCN, National Comprehensive Cancer Network.

The specificity (85.8% vs. 17.5%, p<0.001), balanced accuracy (92.9% vs. 58.7%, p=0.001), and PPV (36.3% vs. 9.0%, p=0.001) of the AI model were significantly better than those of the JSCCR guidelines. According to the AI model, the proportion of patients deemed to be at high risk for LNM was reduced from 83.8% (JSCCR) to 20.9%. The specificity (85.8% vs. 82.4%, p=0.557) and sensitivity (100% vs. 87.5%, p=0.301) of the AI model were numerically higher than those of the NCCN guidelines, although neither difference was statistically significant; however, according to the NCCN guidelines, 1.2% of patients would have had a missed diagnosis.
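The Statistical analysis section states that CIs were obtained with a binomial test; for a single proportion such as a sensitivity, R's `binom.test` reports the exact Clopper-Pearson interval. The following standard-library Python sketch finds each bound by bisection on the binomial CDF. As an illustration only, if the AI model's 100% test-set sensitivity corresponds to 8 of 8 LNM-positive patients (an assumption back-calculated from the 7.6% LNM rate in 105 patients), the 95% interval is about 0.631 to 1.000.

```python
import math
from typing import Callable, Tuple


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float, hi: float, iters: int = 100) -> float:
    """Root of a monotone function f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> Tuple[float, float]:
    """Exact two-sided CI for a binomial proportion x/n."""
    alpha = 1 - conf
    # Lower bound: P(X >= x | p) = alpha / 2, increasing in p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - _binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: P(X <= x | p) = alpha / 2, decreasing in p.
    upper = 1.0 if x == n else _bisect(
        lambda p: _binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper
```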

TABLE 2
Baseline characteristics of the training and test sets.
LV, lymphovascular invasion; PDC, poorly differentiated cluster; WSI, width of submucosal invasion; DSI, depth of submucosal invasion; ASI, area of submucosal invasion; LNM, lymph node metastasis; ER, endoscopic resection. The chi-square test and the Wilcoxon test were used to determine the significance of differences between groups for dichotomous and continuous variables, respectively.

TABLE 3
Predicted results of the artificial intelligence model, JSCCR, and NCCN in the test set.
AI, artificial intelligence; JSCCR, Japanese Society for Cancer of the Colon and Rectum; NCCN, National Comprehensive Cancer Network; LNM, lymph node metastasis; *High/low risk: with or without any risk factor.