AUTHOR=Tian HuaKai, Ning ZhiKun, Zong Zhen, Liu Jiang, Hu CeGui, Ying HouQun, Li Hui

TITLE=Application of Machine Learning Algorithms to Predict Lymph Node Metastasis in Early Gastric Cancer

JOURNAL=Frontiers in Medicine

VOLUME=Volume 8 - 2021

YEAR=2022

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2021.759013

DOI=10.3389/fmed.2021.759013

ISSN=2296-858X

ABSTRACT=Objective 
This study aims to establish the best model for predicting lymph node metastasis (LNM) in early gastric cancer using machine learning (ML), in order to better guide clinical diagnosis and treatment decisions.
Methods
Patients with stage T1a and T1b gastric cancer diagnosed from 2010 to 2015 were screened from the Surveillance, Epidemiology, and End Results (SEER) database. In addition, clinicopathological data were collected from patients with early gastric cancer who underwent surgery at the Second Affiliated Hospital of Nanchang University from January 2014 to December 2016. We applied seven ML algorithms, including generalized linear model (GLM), RPART, Random Forest (RF), Gradient Boosting Machine (GBM), Support Vector Machine (SVM), Regularized Dual Averaging (RDA), and Neural Network (NNET), to patient pathological information to develop the best prediction model for LNM in early gastric cancer. The SEER cohort was randomly split into a training set (80%) and a test set (20%), and the data from the Second Affiliated Hospital served as the external validation set. Model performance was evaluated using AUCROC, F1-score, sensitivity, and specificity.
Results
Tumor size, tumor grade, and depth of tumor invasion were independent risk factors for LNM in early gastric cancer. In a comprehensive comparison of model performance on the training and test sets, the RDA model had the best predictive performance (F1-score = 0.773, AUCROC = 0.742). The external validation set AUCROC was 0.73.
Conclusions
Tumor size, tumor grade, and depth of tumor invasion are independent risk factors for LNM in early gastric cancer. ML can predict LNM risk more accurately, and the RDA model has the best predictive performance and can better guide clinical diagnosis and treatment decisions.
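
The 80%/20% split and the evaluation metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names (`split_80_20`, `confusion_metrics`, `auc_roc`), the fixed seed, and the toy labels are assumptions made for illustration, and the study's own software and settings are not stated in the abstract.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of `records` and return (train, test) as an 80/20 split.

    The seed is an illustrative assumption; the abstract does not report one.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1-score for binary labels (1 = LNM present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def auc_roc(y_true, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely to exercise the functions.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_metrics = confusion_metrics(toy_true, toy_pred)
toy_auc = auc_roc(toy_true, toy_scores)
toy_train, toy_test = split_80_20(range(10))
```

The pairwise AUC above is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation would be the usual choice for a cohort the size of a registry extract.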