ORIGINAL RESEARCH article

Front. Surg.

Sec. Vascular Surgery

Volume 12 - 2025 | doi: 10.3389/fsurg.2025.1648645

Development and Comparative Evaluation of Machine Learning Models for Predicting Lower Extremity Deep Vein Thrombosis in Gastrointestinal Cancer Patients Using Multicenter Longitudinal Clinical Data

Provisionally accepted
Jing Xu1, Jue Xia1, Yuan Liu2,3, Zhiyang Jiang2, Songyun Zhao4,5*, Yanfei Zhu2*
  • 1Department of Ultrasound Medicine, Wuxi People's Hospital, Wuxi, China
  • 2Department of General Surgery, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi, China
  • 3Department of General Surgery, Tengzhou Central People's Hospital, Shandong, China
  • 4Department of Plastic Surgery, The Affiliated Friendship Plastic Surgery Hospital of Nanjing Medical University, Nanjing, China
  • 5Department of Plastic Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China

The final, formatted version of the article will be published soon.

Background: Lower extremity deep vein thrombosis (DVT) is a common and serious complication in patients with gastrointestinal malignancies and adversely affects both prognosis and quality of life. Because its pathogenesis is complex, an accurate risk prediction model is needed to guide prevention and treatment.

Methods: This retrospective study enrolled patients with gastrointestinal malignancies using multicenter, longitudinal clinical data collected at three tertiary medical centers between 2020 and 2024. A total of 34 variables were extracted, covering demographic profiles, clinical parameters, tumor-specific characteristics, and laboratory indices. Univariate and multivariate analyses were first performed to identify independent predictors of DVT. Four machine learning models were then built to predict DVT risk: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), and k-Nearest Neighbors (KNN). Model performance was assessed with receiver operating characteristic (ROC) curves, calibration plots, Brier scores, and decision curve analysis (DCA). Internal validation used ten-fold cross-validation, and an independent external cohort was used to evaluate generalizability. SHapley Additive exPlanations (SHAP) analysis was used to interpret the models' predictions.

Results: Across the univariate and multivariate analyses and the four machine learning models, surgery, prolonged immobilization, central venous catheterization, radiotherapy, distant metastasis, and chemotherapy emerged as significant high-risk factors for DVT. All four models performed well, with the XGBoost model showing the best discrimination, calibration, and clinical utility. Results in the external validation cohort further supported its stability and generalizability. SHAP analysis showed the relative contribution and direction of effect of the key variables in the model.

Conclusion: Machine learning models built from multicenter, longitudinal clinical data can predict DVT risk in patients with gastrointestinal malignancies. These models give clinicians a tool for individualized risk stratification, which can support preventive strategies, clinical decision-making, and patient management.
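For readers less familiar with the evaluation metrics named in the Methods, the following is a minimal illustrative sketch of how discrimination (ROC AUC), overall accuracy of predicted probabilities (Brier score), and clinical utility (net benefit, the quantity plotted in decision curve analysis) are conventionally computed. It is not the authors' code; the function names and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case is scored above a
    random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at a single threshold probability:
    TP/n - (FP/n) * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = DVT, 0 = no DVT; probabilities are made up.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
print("ROC AUC:", round(roc_auc(y_true, y_prob), 3))
print("Net benefit at 0.5:", round(net_benefit(y_true, y_prob, 0.5), 3))
```

A lower Brier score and a higher AUC indicate better performance. A decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.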

Keywords: deep vein thrombosis, gastrointestinal neoplasm, machine learning, XGBoost, risk factor

Received: 17 Jun 2025; Accepted: 24 Sep 2025.

Copyright: © 2025 Xu, Xia, Liu, Jiang, Zhao and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Songyun Zhao, 2021122183@stu.njmu.edu.cn
Yanfei Zhu, wxsrmyy@outlook.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.