ORIGINAL RESEARCH article
Front. Pharmacol.
Sec. Drugs Outcomes Research and Policies
Volume 16 - 2025 | doi: 10.3389/fphar.2025.1691271
This article is part of the Research Topic "Pharmacist and patient safety: Focus on drug safety".
A Study on a Real-World Data-Based VTE Risk Prediction Model for Lymphoma Patients
Provisionally accepted
1 School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
2 Traditional Chinese Medicine Hospital of Xinjiang Uyghur Autonomous Region, Urumqi, China
3 Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China
Background: Patients with malignant tumors have a markedly elevated risk of venous thromboembolism (VTE), which worsens their prognosis. No reliable model currently exists for predicting thrombosis risk specifically in lymphoma patients. This study aimed to develop and validate a machine learning model based on real-world data, providing a dependable risk assessment tool for the early identification of VTE in lymphoma patients.
Methods: We retrospectively analyzed 605 patients with lymphoma hospitalized between January 2019 and June 2024. Candidate predictors included demographic characteristics, comorbidities and medical history, tumor-related factors, treatment-related factors, and laboratory parameters. The primary endpoint was the occurrence of VTE within six months after hospitalization for confirmed lymphoma. Model development combined 3 imputation methods, 3 sampling strategies, 3 feature selection approaches, and 9 machine learning algorithms, and predictive performance was compared across all resulting models.
Results: Combining the imputation, sampling, and feature selection strategies yielded 27 (3 × 3 × 3) datasets, each of which was trained with the 9 algorithms to produce 243 models. The optimal model, Simp-SMOTE_rf_GBM, built with random forest imputation, SMOTE oversampling, and a gradient boosting machine (GBM), achieved the highest predictive performance (AUC = 0.954). SHAP (SHapley Additive exPlanations) analysis identified 9 key predictors, in descending order of importance: anticoagulant use, D-dimer, lactate dehydrogenase, central venous catheterization, carcinoembryonic antigen (CEA), Eastern Cooperative Oncology Group (ECOG) score, serum total protein (TP), total cholesterol (TC), and infectious disease.
Conclusion: This study developed and validated a machine learning model for predicting VTE risk in lymphoma patients; the optimal model showed excellent discrimination (AUC = 0.954). The model provides evidence to guide the timing and strategy of anticoagulation and supports early VTE screening and risk stratification in clinical practice. Its implementation has important implications for improving patient outcomes and public health.
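The combinatorial design described in the Methods can be made concrete with a short Python sketch. It only enumerates the candidate grid (3 imputation × 3 sampling × 3 feature selection = 27 datasets, each paired with 9 algorithms for 243 models) and shows how a rank-based AUC could be used to compare candidates; it does not train any model. Apart from random forest imputation, SMOTE, and GBM, which the abstract names, every method label below is a hypothetical placeholder, and the toy labels, scores, and AUC values are invented for illustration, not study data.

```python
from itertools import product

# Placeholder method labels. Only "random_forest" imputation, "SMOTE"
# sampling and "GBM" are named in the abstract; all other labels are
# hypothetical stand-ins for methods the abstract does not list.
IMPUTATION = ["random_forest", "imputation_2", "imputation_3"]
SAMPLING = ["SMOTE", "sampling_2", "sampling_3"]
FEATURE_SELECTION = ["selection_1", "selection_2", "selection_3"]
ALGORITHMS = ["GBM"] + [f"algorithm_{i}" for i in range(2, 10)]


def build_grid():
    """Return (datasets, models).

    Each dataset is one (imputation, sampling, feature selection) combination;
    each model pairs one dataset with one learning algorithm.
    """
    datasets = list(product(IMPUTATION, SAMPLING, FEATURE_SELECTION))
    models = list(product(datasets, ALGORITHMS))
    return datasets, models


def roc_auc(labels, scores):
    """Rank-based AUC.

    The fraction of (positive, negative) pairs in which the positive (VTE)
    case receives the higher score, counting ties as one half. This equals
    the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(auc_by_model):
    """Return the candidate key with the highest AUC."""
    return max(auc_by_model, key=auc_by_model.get)


if __name__ == "__main__":
    datasets, models = build_grid()
    print(len(datasets), len(models))  # 27 243

    # Toy outcome labels (1 = VTE) and predicted probabilities, not study data.
    y = [1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.5, 0.2, 0.1]
    print(roc_auc(y, p))  # 5 of 6 positive-negative pairs ranked correctly

    # Hypothetical AUCs for two candidates, not study results.
    toy_aucs = {models[0]: 0.90, models[1]: 0.85}
    print(best_model(toy_aucs))
```

In a real pipeline, each of the 243 entries would be fitted and scored on held-out data before its AUC is passed to `best_model`; this abstract does not state how the study partitioned its data for validation.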
Keywords: lymphoma, venous thromboembolism, machine learning, predictive factors, predictive model
Received: 23 Aug 2025; Accepted: 03 Oct 2025.
Copyright: © 2025 He, Wang, Zhang, Li, Kang, Cai, Han, Yin, Li, Song and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Gang Li, ligang7498@126.com
Xuewu Song, xue_wu_song@163.com
Bian Yuan, bianyuan567@126.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.