ORIGINAL RESEARCH article
Front. Endocrinol.
Sec. Clinical Diabetes
Volume 16 - 2025 | doi: 10.3389/fendo.2025.1587932
This article is part of the Research Topic "World Diabetes Day 2024: Exploring Mechanisms, Innovations, and Holistic Approaches in Diabetes Care"
A Machine Learning Model for Predicting the Risk of Diabetic Nephropathy in Individuals with Type 2 Diabetes Mellitus
Provisionally accepted
- 1Department of Biochemistry and Molecular Biology, Key Laboratory of Neural and Vascular Biology, Hebei Medical University, Shijiazhuang, China
- 2School of Public Health, North China University of Science and Technology, Tangshan, Hebei Province, China
- 3Department of General Medicine, Shijiazhuang Second Hospital, Shijiazhuang, China
- 4School of Medicine, Hebei University of Engineering, Handan, Hebei Province, China
- 5College of Medicine, Zhengzhou University, Zhengzhou, Henan Province, China
We retrospectively collected clinical data from 10,057 patients diagnosed with type 2 diabetes mellitus (T2DM) at Shijiazhuang Second Hospital. A random selection of 15% of these patients (n=1,508) was used for external validation. The remaining 8,549 patients were divided into a training set (n=5,985) and a validation set (n=2,564) by simple random sampling in a 7:3 ratio. We then used LASSO regression to identify variables significantly associated with diabetic kidney disease (DKD) in T2DM patients. These variables were incorporated into eight predictive models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models' predictive performance was assessed using the area under the curve (AUC), accuracy, F1 score, and Brier score, among other metrics. Finally, we developed an online calculator to estimate DKD risk in T2DM patients.
LASSO regression selected fifteen features associated with DKD: gender, age, systolic blood pressure (SBP), blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr ratio, uric acid (UA), hemoglobin A1c (HbA1c), microalbuminuria, presence of diabetic retinopathy (DR), hypertension, coronary heart disease (CHD), history of cerebral infarction, family history of diabetes, and family history of CHD. Among the eight evaluated models, the XGBoost algorithm performed best on both the training and validation datasets, with an AUC of 0.932 (95% CI: 0.926-0.938) and 0.930 (95% CI: 0.920-0.939), respectively. The model achieved an accuracy of 0.845 and 0.844, sensitivity of 0.834 and 0.850, specificity of 0.857 and 0.837, F1 score of 0.847 and 0.848, and a Brier score of 0.167 and 0.166, respectively.
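As a reminder of how the reported metrics relate to a model's predicted probabilities, the following is a minimal sketch in plain Python. It is not the authors' code: the function names (`auc`, `classification_metrics`), the 0.5 threshold, and the eight toy labels and probabilities are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy labels and probabilities, not study data.

def auc(y_true, y_prob):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case receives the higher probability; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus the Brier score (mean squared error
    between predicted probability and the 0/1 outcome)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }


# Hypothetical toy data: 1 = DKD, 0 = no DKD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

if __name__ == "__main__":
    print(auc(y_true, y_prob))
    print(classification_metrics(y_true, y_prob, threshold=0.5))
```

AUC and the Brier score are computed from the probabilities themselves, whereas accuracy, sensitivity, specificity, and F1 depend on the chosen classification threshold.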
Decision curve analysis (DCA) further showed that the XGBoost model yielded the highest net benefit among the models across a range of clinically relevant risk thresholds. Finally, an online predictive calculator for the occurrence of DKD was developed based on the XGBoost model, using a cut-off value of 50.7%.
The XGBoost model showed the best predictive performance among the eight models for the occurrence of DKD in patients with T2DM. It served as the basis for an online prediction calculator, offering an accessible and practical tool for both patients and clinicians.
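For readers unfamiliar with decision curve analysis, net benefit at a threshold probability pt is conventionally defined as TP/n - (FP/n) * pt / (1 - pt) and is compared against a "treat all" strategy. The sketch below illustrates that standard definition in plain Python. It is not taken from the article: the function names and the toy data are hypothetical, and the thresholds shown are arbitrary examples rather than the calculator's 50.7% cut-off.

```python
# Illustrative sketch of net benefit as used in decision curve analysis.
# Toy data only; not the study cohort.

def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is >= pt.
    False positives are weighted by the odds of the threshold, pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of treating everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy data: 1 = DKD, 0 = no DKD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

if __name__ == "__main__":
    for pt in (0.25, 0.5):
        print(pt, net_benefit(y_true, y_prob, pt),
              net_benefit_treat_all(y_true, pt))
```

A decision curve plots these quantities over a range of thresholds; a model is preferred where its curve lies above both the "treat all" and "treat none" (net benefit of zero) strategies.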
Keywords: type 2 diabetes mellitus, diabetic kidney disease, machine learning, prediction model, predictive value
Received: 05 Mar 2025; Accepted: 19 Sep 2025.
Copyright: © 2025 LI, CHEN, ZHANG, WANG, ZHAO, Cao, XU, WANG, SU, HE, YANG, MA and LIANG. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Tingting LI, 244517348@qq.com
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.