AUTHOR=Zhang Cheng , Zhang Yi , Yang Ya-Hui , Xu Hui , Zhang Xiao-Peng , Wu Zhi-Jun , Xie Min-Min , Feng Ying , Feng Chong , Ma Tai TITLE=Machine learning models for predicting one-year survival in patients with metastatic gastric cancer who experienced upfront radical gastrectomy JOURNAL=Frontiers in Molecular Biosciences VOLUME=Volume 9 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2022.937242 DOI=10.3389/fmolb.2022.937242 ISSN=2296-889X ABSTRACT=Tumor metastasis is a common event in patients with gastric cancer (GC) who previously received gastrectomy with curative intent. It is meaningful to employ high-volume clinical data for predicting survival of metastatic GC. We aimed to establish an improved machine learning (ML) classifier for predicting whether a patient with metastatic GC would die within 12 months. Eligible patients were enrolled from a Chinese GC cohort, and detailed information from their medical records was extracted to generate a high-dimensional dataset. Feature engineering and feature filtering were conducted before modelling with 8 algorithms. A 10-fold cross validation (CV) nested in a holdout CV (8:2) was employed for hyperparameter tuning and model evaluation. Model selection was based on the area under the receiver operating characteristic curve (AUROC), recall, and precision. The selected models were globally explained by interpretable surrogate models. Of the 399 cases (median survival of 8.2 months), 242 patients survived less than 12 months. The linear discriminant analysis (LDA), support vector machine (SVM), and random forest (RF) models had the highest AUROC (0.78 ± 0.021), recall (0.93 ± 0.031), and precision (0.80 ± 0.026), respectively. The LDA model created a new function that generally separated the 2 classes. The predicted probability of the SVM model was interpreted by a linear regression model and visualized as a nomogram.
The predicted class of the RF model was explained by a decision tree model. In summary, analyzing high-volume medical data with ML can help produce an improved model for predicting survival in patients with metastatic GC. The algorithm should be selected carefully according to the practical scenario.
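
The evaluation scheme described in the abstract (a 10-fold CV nested in an 8:2 holdout split, scored by AUROC, recall, and precision) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the cohort (only the sample size of 399 and the class ratio of 242/399 come from the abstract), and the feature count, preprocessing step, LDA solver, and hyperparameter grid are hypothetical choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 399 cases, roughly 61%
# in the positive class (death within 12 months), 30 made-up features.
X, y = make_classification(
    n_samples=399,
    n_features=30,
    n_informative=8,
    weights=[0.39, 0.61],
    random_state=0,
)

# Outer holdout split (8:2), stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical pipeline: scaling followed by LDA with a shrinkage estimator.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("lda", LinearDiscriminantAnalysis(solver="lsqr")),
    ]
)

# Hypothetical grid; the abstract does not state which hyperparameters were tuned.
param_grid = {"lda__shrinkage": [None, "auto", 0.1, 0.5]}

# Inner 10-fold CV on the training part only, used for hyperparameter tuning.
inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=inner_cv)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the untouched holdout part.
proba = search.predict_proba(X_test)[:, 1]
pred = search.predict(X_test)
metrics = {
    "auroc": roc_auc_score(y_test, proba),
    "recall": recall_score(y_test, pred),
    "precision": precision_score(y_test, pred),
}
print(search.best_params_)
print(metrics)
```

Keeping the holdout part out of the tuning loop is what makes the reported metrics an estimate of performance on unseen patients. Swapping the `lda` step for another estimator (for example an SVM or a random forest) with its own grid would follow the same pattern.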