ORIGINAL RESEARCH article

Front. Cardiovasc. Med.

Sec. Thrombosis and Haemostasis

This article is part of the Research Topic: Innovative Modeling and Simulation in Thrombosis and Hemostasis: Enhancing Diagnosis and Treatment

Application of machine learning to predict the occurrence of venous thromboembolism in patients hospitalized for coronary artery disease: a single-center retrospective study

Provisionally accepted
Yuan-Jiao Yang, Han-Bing Yan, Wen-Tao Liu, Zhi-Chao Yang, Xiao-Hui Wang, Chen Liu, Ya-Nan Zhang, Jun Wang, Jin-Peng Yao, Hui He*
  • Benxi Central Hospital, Benxi, China

The final, formatted version of the article will be published soon.

Background: This study aimed to construct a model for predicting the occurrence of venous thromboembolism (VTE) in patients hospitalized with coronary heart disease (CHD) using machine learning algorithms.

Methods: Clinical data were obtained from the medical records of CHD patients admitted to tertiary hospitals in eastern Liaoning Province between 2019 and 2024. Five machine learning algorithms were used to construct predictive models: random forest (RF), classification and regression tree (CART), logistic regression (LR), logistic regression with least absolute shrinkage and selection operator (LR+LASSO), and extreme gradient boosting (XGBoost). The models were compared using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy.

Results: A total of 3113 CHD inpatients were included in the study. In the internal validation set, XGBoost had the highest AUC (0.704), sensitivity (0.708), and accuracy (0.692), and RF had the highest specificity (0.706). In the temporal external validation set, LR+LASSO had the highest AUC (0.649), RF had the highest specificity (0.683), and XGBoost had the highest sensitivity (0.682) and accuracy (0.656). D-dimer, age, and neutrophil count (NEUT) were the three most important predictors.

Conclusion: The machine learning prediction models for the occurrence of VTE in CHD inpatients have some diagnostic value. The models constructed with LR+LASSO and XGBoost performed better than those constructed with the other methods. These results may offer research directions for the clinical prevention and treatment of VTE events in CHD inpatients.
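For readers less familiar with the four comparison metrics named in the Methods, the following is a minimal sketch of how AUC, sensitivity, specificity, and accuracy can be computed from predicted risk scores. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = VTE and 0 = no VTE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict[str, float]:
    """Compute the four metrics; the threshold is an illustrative assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "auc": auc_score(y_true, scores),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1, 0.8]
metrics = evaluate(y_true, scores)
print(metrics)
```

AUC does not depend on the threshold, whereas sensitivity, specificity, and accuracy do, which is one reason different models can rank first on different metrics.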

Keywords: coronary heart disease, venous thromboembolism, machine learning, prediction models, risk factors

Received: 14 Apr 2025; Accepted: 17 Nov 2025.

Copyright: © 2025 Yang, Yan, Liu, Yang, Wang, Liu, Zhang, Wang, Yao and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Hui He, lnbxwxf@yeah.net

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.