AUTHOR=Ding Li, Wang Kun, Zhang Chi, Zhang Yang, Wang Kanlirong, Li Wang, Wang Junqi TITLE=A Machine Learning Algorithm for Predicting the Risk of Developing to M1b Stage of Patients With Germ Cell Testicular Cancer JOURNAL=Frontiers in Public Health VOLUME=Volume 10 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2022.916513 DOI=10.3389/fpubh.2022.916513 ISSN=2296-2565 ABSTRACT=Objective: Distant metastasis to sites other than the non-regional lymph nodes and lung (i.e., M1b stage) contributes significantly to the poor survival prognosis of patients with germ cell testicular cancer (GCTC). The aim of this study was to develop a machine learning (ML) model to predict the risk of GCTC patients developing M1b stage disease, which could assist early intervention. Methods: Clinical and pathological data of GCTC patients were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Using the patients' characteristic variables, we applied six ML algorithms to develop the predictive models: logistic regression (LR), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forest (RF), Multilayer Perceptron (MLP), and k-Nearest Neighbor (kNN). Model performance was evaluated with ten-fold cross-validated receiver operating characteristic (ROC) curves, using the area under the curve (AUC) as the measure of predictive accuracy. Fifty-four patients from our own center (October 2006 to June 2021) were collected as the external validation cohort. Results: Of the 4,323 eligible patients enrolled from the SEER database, 178 (4.12%) developed M1b stage disease. Multivariate logistic regression showed that lymph node dissection (LND), T stage, N stage, lung metastases, and distant lymph node metastases were independent predictors of the risk of developing M1b stage disease.
Models based on the XGBoost and RF algorithms both showed stable and efficient predictive performance in the training and external validation cohorts. Conclusion: S stage is not an independent predictor of the risk of developing M1b stage disease in patients with GCTC. The ML models based on XGBoost and RF both have high predictive effectiveness and may be used to predict this risk, which is of promising value for clinical decision-making. The models still need to be tested on a larger sample of real-world data.
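The Methods evaluate each model by ten-fold cross-validated ROC AUC. The sketch below illustrates that evaluation loop in standard-library Python; it is not the authors' pipeline, which compared LR, XGBoost, LightGBM, RF, MLP, and kNN. All function names are hypothetical, `fit_mean_difference` is a toy stand-in for a real classifier, and stratifying folds by outcome is an assumption (it keeps positives in every fold when only about 4% of patients are M1b).

```python
import random


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, dealing each class round-robin so the positive rate stays similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_mean_difference(X_train, y_train):
    """Hypothetical toy model: weight each feature by its positive-minus-negative class mean."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    n_feat = len(X_train[0])
    w = [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
         for j in range(n_feat)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def cross_validated_auc(X, y, fit, k=10, seed=0):
    """Mean held-out AUC over k stratified folds; fit(X_train, y_train) returns a row -> score function."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc([y[i] for i in test_idx], [predict(X[i]) for i in test_idx]))
    return sum(aucs) / len(aucs)
```

In use, one would call `cross_validated_auc(X, y, fit)` once per candidate model on the same folds (same `seed`) so the mean AUCs are comparable. The AUC here is computed from score rankings, which is why it needs no classification threshold.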