AUTHOR=Tian HuaKai, Ning ZhiKun, Zong Zhen, Liu Jiang, Hu CeGui, Ying HouQun, Li Hui TITLE=Application of Machine Learning Algorithms to Predict Lymph Node Metastasis in Early Gastric Cancer JOURNAL=Frontiers in Medicine VOLUME=Volume 8 - 2021 YEAR=2022 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2021.759013 DOI=10.3389/fmed.2021.759013 ISSN=2296-858X ABSTRACT=Objective: This study aims to establish the best prediction model for lymph node metastasis (LNM) in early gastric cancer using machine learning (ML), to better guide clinical diagnosis and treatment decisions. Methods: Gastric cancer patients with stage T1a and T1b disease from 2010 to 2015 were screened from the Surveillance, Epidemiology, and End Results (SEER) database. In addition, clinicopathological data were collected from patients with early gastric cancer who underwent surgery at the Second Affiliated Hospital of Nanchang University from January 2014 to December 2016. We applied seven ML algorithms, namely generalized linear model (GLM), RPART, random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), Regularized Dual Averaging (RDA), and neural network (NNET), to patient pathological information to develop the best prediction model for early gastric cancer LNM. The SEER data were randomly split into a training set (80%) and a test set (20%), and the data from the Second Affiliated Hospital served as the external validation set. Model performance was evaluated using AUCROC, F1-score, sensitivity, and specificity. Results: Tumor size, tumor grade, and depth of tumor invasion were independent risk factors for LNM in early gastric cancer. In a comprehensive comparison of model performance on the training and test sets, the RDA model had the best predictive performance (F1-score=0.773, AUCROC=0.742). On the external validation set, AUCROC=0.73.
Conclusions: Tumor size, tumor grade, and depth of tumor invasion are independent risk factors for early gastric cancer LNM. ML can predict LNM risk more accurately, and the RDA model has the best predictive performance and can better guide clinical diagnosis and treatment decisions.
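
The evaluation protocol named in the abstract (a random 80%/20% split, then AUCROC, F1-score, sensitivity, and specificity) can be illustrated with a minimal Python sketch. The abstract does not describe the authors' implementation, so every function name here is hypothetical, and the code assumes binary labels (1 = LNM present, 0 = absent) with both classes represented.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def auroc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, a fitted model would supply a predicted probability per patient; thresholding those probabilities (for example at 0.5, an arbitrary choice here) yields the class predictions passed to `sensitivity`, `specificity`, and `f1_score`, while `auroc` takes the raw probabilities.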