AUTHOR=Su Jie , Tang Yuechao , Wang Yanan , Chen Chao , Song Biao TITLE=Predicting deep vein thrombosis using machine learning and blood routine analysis JOURNAL=Frontiers in Big Data VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2025.1605258 DOI=10.3389/fdata.2025.1605258 ISSN=2624-909X ABSTRACT=ObjectiveLower limb deep vein thrombosis (DVT) is a serious health problem, causing local discomfort and hindering walking. It can lead to severe complications, including pulmonary embolism, chronic post-thrombotic syndrome, and limb amputation, posing risks of death or severe disability. This study aims to develop a diagnostic model for DVT using routine blood analysis and evaluate its effectiveness in early diagnosis.MethodsThis study retrospectively analyzed patient medical records from January 2022 to June 2023, including 658 DVT patients (case group) and 1,418 healthy subjects (control group). SHAP (SHapley Additive exPlanations) analysis was employed for feature selection to identify key blood indices significantly impacting DVT risk prediction. Based on the selected features, six machine learning models were constructed: k-Nearest Neighbors (kNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed using the area under the curve (AUC).ResultsSHAP analysis identified ten key blood routine indices. The six models constructed using these indices demonstrated strong predictive performance, with AUC values exceeding 0.8, accuracy above 70%, and sensitivity and specificity over 70%. Notably, the RF model exhibited superior performance in assessing the risk of DVT.ConclusionsOur study successfully developed machine learning models for predicting DVT risk using routine blood tests. These models achieved high predictive performance, suggesting their potential for early DVT diagnosis without additional medical burden on patients. Future research will focus on further validation and refinement of these models to enhance their clinical applicability.