ORIGINAL RESEARCH article

Front. Aging

Sec. Musculoskeletal Aging

Development and Validation of a Machine Learning-Based Risk Prediction Model for Sarcopenia in Community Hospital Patients: A Retrospective Cohort Study

Provisionally accepted
Xue Zhao1, Wang Yao2, Jiawei Shen1, Xinyu Tang3, Jue Zheng1, Chang Guo1, Sun Ye1, Miqiong Li1*, Chao Wang2,4*, Peihao Yin5,6,7*
  • 1Department of General Practice, Changshou Community Health Service Center of Putuo District, Shanghai, China
  • 2Department of Oncology, Shanghai Jiaotong University School of Medicine Affiliated Ruijin Hospital, Shanghai, China
  • 3Department of General Practice, Changshou Community Health Service Center of Putuo District, Shanghai, China
  • 4Shanghai Jiao Tong University, Shanghai, China
  • 5Department of General Surgery, Putuo People’s Hospital, School of Medicine, Tongji University, Shanghai, China
  • 6Department of Clinical Nutrition, Putuo People’s Hospital, School of Medicine, Tongji University, Shanghai, China
  • 7Department of Hospital Infection Control and Prevention, Putuo People’s Hospital, School of Medicine, Tongji University, Shanghai, China

The final, formatted version of the article will be published soon.

Sarcopenia, a progressive age-related loss of skeletal muscle mass and strength, represents a growing public health challenge amid global population aging. Early detection remains difficult with conventional diagnostic approaches. This study aimed to develop and validate reliable machine learning (ML) models to identify key risk factors for sarcopenia in community hospital settings. Using retrospective data from 1,650 patients at a community health center, we collected comprehensive demographic, clinical, and lifestyle variables. Twelve ML models, including Random Forest, Support Vector Machine, XGBoost, and Logistic Regression, were constructed and evaluated using 5-fold cross-validation. The CatBoost, LightGBM, and Gradient Boosting Decision Tree models demonstrated superior predictive performance, with area under the receiver operating characteristic curve (AUROC) values of 0.999, 0.996, and 0.995, respectively. SHapley Additive exPlanations (SHAP) analysis revealed that SARC_Cal_score, body mass index (BMI), and age were among the most influential predictors, and that a greater chronic disease burden was positively associated with sarcopenia risk. In conclusion, ML models show substantial potential for identifying sarcopenia risk in clinical practice, thereby supporting early intervention. This approach enhances detection capabilities and provides a practical tool for individualized treatment planning in community-based elderly care. Future research should integrate additional biomarkers and environmental factors to further improve model accuracy and facilitate integration into clinical workflows.
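The abstract does not describe the implementation, so the following is only a minimal, dependency-free sketch of the three evaluation ideas it names, assuming binary labels coded 0/1 and a model that outputs a continuous risk score. It shows 5-fold splitting (stratified by label here, which is an assumption; the abstract says only "5-fold cross-validation"), AUROC computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and exact Shapley values for a small model in which features outside a coalition are fixed at a baseline. The helper names (`stratified_kfold`, `auroc`, `shapley_values`), the toy risk score, and the synthetic data are hypothetical illustrations, not the authors' pipeline; libraries such as SHAP use faster algorithms or approximations rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal shuffled indices of this class round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def auroc(labels, scores):
    """AUROC = P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all coalitions (only feasible for few features).

    Features not in a coalition are replaced by their baseline value (an assumption;
    SHAP-style explainers typically average over a background dataset instead).
    """
    m = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return model(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical synthetic data: one noisy score per patient; NOT study data.
    rng = random.Random(42)
    labels = [0] * 50 + [1] * 50
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    fold_aucs = []
    for train, test in stratified_kfold(labels, k=5):
        # A real pipeline would fit a model on `train` here; this toy score needs no fitting.
        fold_aucs.append(auroc([labels[i] for i in test], [scores[i] for i in test]))
    print("mean 5-fold AUROC:", sum(fold_aucs) / len(fold_aucs))

    def toy_risk(z):
        # Hypothetical linear risk score over three standardized features.
        return 0.8 * z[0] - 0.5 * z[1] + 0.3 * z[2]

    # For a linear score, each Shapley value is weight * (x_i - baseline_i):
    # approximately [0.8, -1.0, 0.9] here.
    print(shapley_values(toy_risk, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

Exhaustive enumeration needs on the order of 2^m model evaluations, so it only suits a handful of features; it is shown here because its properties are easy to check by hand. For a linear score each feature's value is its weight times its deviation from the baseline, and the values always sum to f(x) minus f(baseline).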

Keywords: machine learning, nomogram, risk prediction, sarcopenia, SHAP

Received: 24 Dec 2025; Accepted: 16 Feb 2026.

Copyright: © 2026 Zhao, Yao, Shen, Tang, Zheng, Guo, Ye, Li, Wang and Yin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Miqiong Li
Chao Wang
Peihao Yin

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.