ORIGINAL RESEARCH article

Front. Endocrinol.

Sec. Clinical Diabetes

Volume 16 - 2025 | doi: 10.3389/fendo.2025.1587932

This article is part of the Research Topic: World Diabetes Day 2024: Exploring Mechanisms, Innovations, and Holistic Approaches in Diabetes Care

A Machine Learning Model for Predicting the Risk of Diabetic Nephropathy in Individuals with Type 2 Diabetes Mellitus

Provisionally accepted
Tingting LI1,2*, Jinbo CHEN3, Xin ZHANG1,2, Kaiwen WANG1, Xuesen ZHAO1, Yi Cao1,2, Zhen XU4, Shiyue WANG5, Peng SU2, Xioayan HE3, Yang YANG3, Dong MA1,2, Xiaohua LIANG3
  • 1Department of Biochemistry and Molecular Biology, Key Laboratory of Neural and Vascular Biology, Hebei Medical University, Shijiazhuang, China
  • 2School of Public Health, North China University of Science and Technology, Tangshan, Hebei Province, China
  • 3Department of General Medicine, Shijiazhuang Second Hospital, Shijiazhuang, China
  • 4School of Medicine, Hebei University of Engineering, Handan, Hebei Province, China
  • 5College of Medicine, Zhengzhou University, Zhengzhou, Henan Province, China

The final, formatted version of the article will be published soon.

We retrospectively collected clinical data from 10,057 patients diagnosed with type 2 diabetes mellitus (T2DM) at Shijiazhuang Second Hospital. A random selection of 15% of these patients (n=1,508) was used for external validation. The remaining 8,549 patients were divided into a training set (n=5,985) and a validation set (n=2,564) by simple random sampling in a 7:3 ratio. We then used LASSO regression to identify variables significantly associated with diabetic kidney disease (DKD) in T2DM patients. These variables were incorporated into eight predictive models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models' predictive performance was assessed using metrics such as the area under the curve (AUC), accuracy, F1 score, and Brier score. Finally, we developed an online calculator to estimate DKD risk in T2DM patients.

LASSO regression selected fifteen features associated with DKD: gender, age, systolic blood pressure (SBP), blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr ratio, uric acid (UA), hemoglobin A1c (HbA1c), microalbuminuria, presence of diabetic retinopathy (DR), hypertension, coronary heart disease (CHD), history of cerebral infarction, family history of diabetes, and family history of CHD. Among the eight evaluated models, the XGBoost algorithm demonstrated superior performance on both the training and validation datasets, with AUCs of 0.932 (95% CI: 0.926-0.938) and 0.930 (95% CI: 0.920-0.939), respectively. The model achieved an accuracy of 0.845 and 0.844, sensitivity of 0.834 and 0.850, specificity of 0.857 and 0.837, F1 score of 0.847 and 0.848, and a Brier score of 0.167 and 0.166, respectively.
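
The evaluation metrics named above (AUC, accuracy, sensitivity, specificity, F1 score, and Brier score) can all be derived from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' code: the function names, the toy data, and the 0.5 decision threshold are illustrative assumptions.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case outranks a random negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics at a threshold, plus the Brier score."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0  # assumption: ">=" marks high risk
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }


# Toy data for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
print(roc_auc(y_true, y_prob))
print(classification_metrics(y_true, y_prob))
```

Lower Brier scores indicate better-calibrated probabilities, whereas the other metrics are better when higher.
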
Decision curve analysis (DCA) further supported the superiority of the XGBoost model, which yielded the highest net benefit among the models across a range of clinically relevant risk thresholds. Finally, an online predictive calculator for the occurrence of DKD was developed based on the XGBoost model, using a cut-off value of 50.7%.

The XGBoost model demonstrated the best predictive accuracy among the evaluated models for the occurrence of DKD in patients with T2DM. It also enabled the construction of an online prediction calculator, offering an accessible and practical tool for both patients and clinicians.
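
Decision curve analysis compares models by their net benefit at a given risk threshold, using the standard definition NB = TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability. The sketch below is a hedged illustration of that formula, not the authors' implementation: the function names and toy data are hypothetical, and only the 50.7% cut-off is taken from the text.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on predictions at or above the risk threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Each false positive is weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: classify every patient as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

cutoff = 0.507  # cut-off value reported for the online calculator
print(net_benefit(y_true, y_prob, cutoff))
print(net_benefit_treat_all(y_true, cutoff))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero. Plotting these values over a range of thresholds gives the decision curve.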

Keywords: type 2 diabetes mellitus, diabetic kidney disease, machine learning, prediction model, predictive value

Received: 05 Mar 2025; Accepted: 19 Sep 2025.

Copyright: © 2025 LI, CHEN, ZHANG, WANG, ZHAO, Cao, XU, WANG, SU, HE, YANG, MA and LIANG. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Tingting LI, 244517348@qq.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.