ORIGINAL RESEARCH article

Front. Oncol.

Sec. Gastrointestinal Cancers: Gastric and Esophageal Cancers

From Complex Algorithms to Clinical Practice: A Multicenter Machine Learning Model and Simplified Decision Tree for Predicting Cachexia Risk in Gastric Cancer

Provisionally accepted
Jian Zhao1, Yu Deng2, Yajie Guo3, Yihuan Qiao3, Yaoyao Wu1, Huadong Zhao4, Tengyu Zeng1, Huadong Zhao4, Jiawei Song3, Beilei Hou2, Qianyong Yang2*
  • 1The 958th Army Hospital of the Chinese People's Liberation Army, Chongqing, China
  • 2958 Hospital of the People's Liberation Army, Chongqing, China
  • 3Xijing Digestive Disease Hospital Fourth Military Medical University, Xi'an, China
  • 4Air Force Medical University Tangdu Hospital, Xi'an, China

The final, formatted version of the article will be published soon.

Background: Cachexia is a frequent, specific metabolic syndrome that severely compromises survival in gastric cancer (GC). While early diagnosis is paramount, existing screening methods are limited by complexity and suboptimal accuracy. There is an urgent need for an efficient, data-driven tool derived from routine clinical parameters.

Methods: In this multicenter retrospective study, we analyzed data from three independent hospitals. Variable selection was performed using univariable and multivariable analyses. We constructed and compared multiple machine learning (ML) models to predict cachexia risk. The models' discriminative ability, calibration, and clinical net benefit were evaluated using the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis (DCA), respectively.

Results: The study included 1,570 GC patients (cachexia prevalence: 30.3%). Patients were divided into training (n=920), internal testing (n=350), and external validation (n=300) cohorts. Cachexia was significantly associated with poor nutritional status, elevated inflammation, and inferior overall survival (P < 0.01). The Random Forest (RF) model yielded the best performance and remained stable across the internal test set (AUC=0.898) and the external validation set (AUC=0.913). To enhance clinical utility, we further derived a simplified decision tree model based on three accessible markers: CA19-9, CEA, and albumin. This simplified tool retained high diagnostic accuracy (AUC > 0.783) and demonstrated positive net benefit in DCA.

Conclusion: We established and externally validated a high-performance ML model for predicting GC-associated cachexia. The derived simplified decision tree offers clinicians a convenient, highly generalizable tool for identifying high-risk patients using routine laboratory tests, enabling earlier precision nutritional management.
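For readers unfamiliar with the net benefit reported in DCA, the following is a minimal illustrative sketch of the standard calculation: at a threshold probability p_t, net benefit = TP/n - (FP/n) × p_t/(1 - p_t). It is not the authors' code. The function names and the toy labels and probabilities are hypothetical and are not drawn from the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    y_true: 0/1 outcome labels (1 = cachexia in this illustration).
    y_prob: predicted probabilities from any risk model.
    threshold: threshold probability p_t, strictly between 0 and 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

for pt in (0.2, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"p_t={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A decision curve plots these values over a range of thresholds. A model is generally considered clinically useful at a given threshold when its net benefit exceeds both the treat-all and treat-none (zero) strategies.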

Keywords: Cachexia, decision tree, External validation, gastric cancer, machine learning, Nutritional assessment, Prediction model

Received: 14 Dec 2025; Accepted: 16 Feb 2026.

Copyright: © 2026 Zhao, Deng, Guo, Qiao, Wu, Zhao, Zeng, Zhao, Song, Hou and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Qianyong Yang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.