AUTHOR=Xu Jing, Xia Jue, Liu Yuan, Jiang Zhiyang, Zhao Songyun, Zhu Yanfei
TITLE=Development and comparative evaluation of machine learning models for predicting lower extremity deep vein thrombosis in gastrointestinal cancer patients using multicenter longitudinal clinical data
JOURNAL=Frontiers in Surgery
VOLUME=Volume 12 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2025.1648645
DOI=10.3389/fsurg.2025.1648645
ISSN=2296-875X
ABSTRACT=Background: Lower extremity deep vein thrombosis (DVT) represents a prevalent and formidable complication among patients with gastrointestinal malignancies, exerting a profound impact on both prognosis and quality of life. Owing to its intricate pathogenesis, the development of a precise risk prediction model is imperative for advancing clinical strategies in prevention and therapeutic intervention.
Methods: This retrospective study enrolled patients with gastrointestinal malignancies using multicenter, longitudinal clinical data obtained from three tertiary medical centers between 2020 and 2024. A total of 34 variables were extracted, encompassing demographic profiles, clinical parameters, tumor-specific characteristics, and laboratory indices. To identify independent predictors of DVT, both univariate and multivariate analyses were initially performed. Four machine learning algorithms, namely Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), and k-Nearest Neighbors (KNN), were subsequently constructed to predict DVT risk. Model performance was rigorously assessed through receiver operating characteristic (ROC) curves, calibration plots, Brier scores, and decision curve analysis (DCA). Internal validation was conducted via ten-fold cross-validation, while an independent external cohort was employed to evaluate model generalizability. To elucidate the underlying predictive mechanisms, SHapley Additive exPlanations (SHAP) analysis was carried out.
Results: Through a combination of univariate and multivariate analyses alongside four machine learning algorithms, surgery, prolonged immobilization, central venous catheterization, radiotherapy, distant metastasis, and chemotherapy emerged as significant high-risk factors for DVT. All four predictive models exhibited robust performance, with the XGBoost model demonstrating superior discrimination, calibration, and clinical utility. Findings from the external validation cohort further substantiated its stability and generalizability. SHAP analysis illuminated the relative contributions and directional influences of pivotal variables within the predictive framework.
Conclusion: Machine learning models derived from multicenter, longitudinal clinical datasets offer robust predictive capabilities for assessing DVT risk in patients with gastrointestinal malignancies. These models furnish clinicians with individualized risk stratification tools, facilitating the refinement of preventive strategies and the enhancement of clinical decision-making, ultimately contributing to improved patient management.
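
The abstract names three of its evaluation measures (ROC discrimination, Brier score, and decision curve analysis) without defining them. The following is a minimal sketch of how each is conventionally computed from predicted risks, using only the Python standard library. It is not the authors' code: the function names are hypothetical, and the labels and probabilities are made-up toy values, not study data.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher risk than a randomly chosen
    negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at one risk threshold pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy example (assumed values for illustration only): 1 = DVT, 0 = no DVT.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

print(f"AUC         = {roc_auc(y_true, y_prob):.3f}")
print(f"Brier score = {brier_score(y_true, y_prob):.4f}")
print(f"Net benefit = {net_benefit(y_true, y_prob, 0.5):.3f} at threshold 0.5")
```

In a model comparison of the kind described, these quantities would typically be computed per cross-validation fold and again on the external cohort, with net benefit evaluated over a range of thresholds to draw the decision curve.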