
ORIGINAL RESEARCH article

Front. Endocrinol., 15 October 2025

Sec. Clinical Diabetes

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1587932

This article is part of the Research Topic: World Diabetes Day 2024: Exploring Mechanisms, Innovations, and Holistic Approaches in Diabetes Care

A machine learning model for predicting the risk of diabetic nephropathy in individuals with type 2 diabetes mellitus

Tingting Li1,2,3, Jinbo Chen4, Xin Zhang1,2,3, Kaiwen Wang1,2, Xuesen Zhao1,2, Yi Cao1,2,3, Zhen Xu5, Shiyue Wang6, Peng Su3, Xiaoyan He4, Yang Yang4, Xiaolu Cao7, Xiaohua Liang4*, Dong Ma1,2,3*
  • 1Department of Biochemistry and Molecular Biology, Key Laboratory of Neural and Vascular Biology, Ministry of Education, Shijiazhuang, Hebei, China
  • 2Hebei Key Laboratory of Cardiovascular Homeostasis and Aging, Hebei Medical University, Shijiazhuang, Hebei, China
  • 3School of Public Health, North China University of Science and Technology, Tangshan, China
  • 4Department of General Medicine, Shijiazhuang Second Hospital, Shijiazhuang, China
  • 5School of Medicine, Hebei University of Engineering, Handan, China
  • 6College of Public Health, Zhengzhou University, Zhengzhou, China
  • 7Diabetic Ophthalmology Department, Hebei Eye Hospital, Xingtai, China

Introduction: Diabetic kidney disease (DKD) represents the predominant form of chronic kidney disease (CKD) linked with diabetes mellitus. The application of artificial intelligence holds promise for delaying renal deterioration and decreasing treatment expenses by facilitating early detection and intervention. This is contingent upon the development of an efficient and user-friendly model for predicting DKD risk in diabetic individuals. In this study, leveraging extensive clinical datasets, we sought to develop and validate a predictive model employing machine learning techniques to assess the risk of DKD in patients with type 2 diabetes mellitus (T2DM).

Research design and methods: We conducted a retrospective collection of clinical data from 10,057 patients diagnosed with T2DM at Shijiazhuang Second Hospital. A random selection of 15% of these patients (n=1,508) was utilized for external validation. The remaining 8,549 patients were divided into a training set (n = 5,985) and a validation set (n = 2,564) using a simple random sampling method in a 7:3 ratio. Subsequently, we employed LASSO regression to identify variables significantly associated with DKD in T2DM patients. These variables were incorporated into eight distinct predictive models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models’ predictive performance was assessed using metrics such as the area under the curve (AUC), accuracy, F1 score, and Brier score. Finally, we developed an online calculator to estimate DKD risk in T2DM patients.

Results: Fifteen features associated with DKD were selected using LASSO regression: gender, age, systolic blood pressure (SBP), blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr ratio, uric acid (UA), hemoglobin A1c (HbA1c), microalbuminuria, presence of diabetic retinopathy (DR), hypertension, coronary heart disease (CHD), history of cerebral infarction, family history of diabetes, and family history of CHD. Among the eight evaluated models, the XGBoost algorithm demonstrated superior performance on both the training and validation datasets, with an AUC of 0.932 (95%CI: 0.926-0.938) and 0.930 (95%CI: 0.920-0.939), respectively. The model achieved an accuracy of 0.845 and 0.844, sensitivity of 0.834 and 0.850, specificity of 0.857 and 0.837, F1 score of 0.847 and 0.848, and a Brier score of 0.167 and 0.166, respectively. Decision curve analysis (DCA) further validated the superiority of the XGBoost model over the other models across a range of clinically relevant risk thresholds, yielding the highest net benefits. Finally, an online predictive calculator for the occurrence of DKD was developed based on the XGBoost model, utilizing a cut-off value of 50.7%.

Conclusions: The developed XGBoost model demonstrated optimal predictive accuracy for the occurrence of DKD in patients with T2DM. This model facilitated the construction of an online prediction calculator, offering an accessible and practical tool for both patients and clinicians.

Introduction

Type 2 diabetes mellitus (T2DM) is the predominant form of diabetes, accounting for over 90% of diabetes cases. Diabetic kidney disease (DKD) is the most prevalent form of chronic kidney disease (CKD) associated with diabetes mellitus. In China, approximately 170 million individuals have diabetes mellitus (1), with 30% to 40% of these patients expected to develop DKD (2). Globally, DKD affects 8% to 16% of the population (3), and is characterized by a prolonged disease course, poor prognosis, and high treatment costs, imposing a significant burden on patients, families, and society. DKD is also a leading cause of end stage kidney disease (ESKD) (4, 5), and patients with DKD have a higher prevalence of cardiovascular diseases than other CKD patients (59.26% vs. 29.60%) (6). An international systematic review examining the prevalence and risk factors of DKD worldwide reported that the prevalence of DKD among T2DM patients ranges from 30% to 50% (7). Pan et al. (8) analyzed the burden of DKD in China from 1990 to 2019 and found that the increase in CKD cases is primarily attributed to the rising incidence of both T1DM and T2DM, with the number of prevalent T2DM cases with concomitant CKD being notably higher [57.4 (95%CI: 49.5-66.5) vs. 3,107.6 (95%CI: 2,815.2-3,390.9) million cases]. Consequently, a significant public health challenge lies in the precise and convenient identification of patients with diabetes who are at high risk of DKD. Early identification and intervention are anticipated to delay renal impairment and effectively reduce treatment costs.

There is a critical need for prognostic tools that are both easily interpretable and accurate, and that can be seamlessly integrated into clinical workflows. While certain blood-based biomarkers, such as plasma KIM-1 (9) and TNF-α receptors (10), have shown correlation with the progression of DKD, the development of precise predictive models that incorporate patients’ electronic health records (EHR), including these blood biomarkers and other relevant factors, remains limited. Machine learning, a vital component of artificial intelligence, is characterized by its ability to handle nonlinearity, complex interactions, and a greater number of variables influencing outcomes. This presents significant potential for enhancing the predictive capabilities of disease models in clinical applications. A growing body of literature indicates that several established predictive models, utilizing multifactor logistic regression, BP neural networks, and LASSO regression, have been applied to screen risk factors for DKD in patients with T2DM (11, 12). However, a comparative analysis of the performance of multiple machine learning-based predictive models remains unexplored. Consequently, this study aims to evaluate eight DKD prediction models and identify the most effective one for predicting the risk of DKD development in T2DM patients. To enhance the accessibility and utility of this model, we have developed an online calculator designed to assist clinicians in accurately stratifying risk and advising patients on the initial and progressive stages of DKD. Additionally, this tool aims to increase awareness of preventive measures in patients’ daily lives.

Research design and methods

Study participants

This retrospective study collected data from 10,057 patients diagnosed with T2DM at the Second Hospital of Shijiazhuang City between December 2017 and December 2023. T2DM was defined according to the Guidelines for the Prevention and Treatment of T2DM in China (13) as follows: 1) T2DM was recorded in the medical billing; 2) the HbA1c level was equal to or above 6.5% (NGSP); 3) the fasting plasma glucose level was equal to or above 126 mg/dL, except in an emergency room; 4) the postprandial plasma glucose level was equal to or above 200 mg/dL, except in an emergency room; 5) anti-diabetic medication was prescribed. In addition, the age of the diabetic patients was above 18 years. The exclusion criteria were as follows: 1) presence of concurrent chronic kidney disease (CKD) unrelated to diabetes; 2) coexistence of severe systemic diseases; 3) acute metabolic disorders; 4) incomplete demographic information or relevant laboratory indicators. This research was approved by the Ethics Committee of the Second Hospital of Shijiazhuang City (ethical approval number: No. 191128). All private personal information was protected and removed during the analysis and publication process. Due to the retrospective nature of this study, written informed consent was not required.

Definition of DKD

Patients with T2DM were categorized according to the presence of concurrent DKD into a DKD group (n = 5,162) and a non-DKD group (n = 4,895). The diagnostic criteria of DKD were as follows (14): 1) diabetes was confirmed as the cause of renal damage and other causes of chronic kidney disease (CKD) were excluded; 2) urinary albumin-to-creatinine ratio (UACR) ≥30 mg/g, urinary albumin excretion rate (UAER) ≥30 mg/24 h (or ≥20 μg/min), and estimated glomerular filtration rate (eGFR) persistently <60 mL·min⁻¹·(1.73 m²)⁻¹, based on three tests conducted within a period of 3 to 6 months; 3) renal biopsy results consistent with pathological changes in DKD.

Clinical data

First, we randomly selected 15% of the patients for external validation (n = 1,508) and used a simple random sampling method to divide the remaining 8,549 patients into a training set (n = 5,985) and a validation set (n = 2,564) in a ratio of 7:3. Clinical data of patients with T2DM, collected through review of medical records, covered the following categories: 1) general information: gender, age, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), smoking history, alcohol consumption history, history of coronary heart disease, history of cerebral infarction, family history of hypertension, family history of diabetes, family history of coronary heart disease (CHD); 2) laboratory examination indicators: triglycerides (TG), total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL), fasting blood glucose (FBG), glycated hemoglobin (HbA1c), high-sensitivity C-reactive protein (hs-CRP), albumin (Alb), white blood cell count (WBC), lymphocyte count (LYM), neutrophil count (NEUT), monocyte count (MONO), platelet count (PLT), platelet distribution width (PDW), large platelet ratio (P-LCR), D-dimer, blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr, glucose (GLU), apolipoprotein-A1/apolipoprotein-B (APOA1/APOB), direct bilirubin (DBIL), indirect bilirubin (IBIL), microalbuminuria, α1-microglobulin (α1-MG), β2-microglobulin (β2-MG), uric acid (UA), aspartate transaminase (AST), alanine transaminase (ALT); 3) comorbidity status: diabetic retinopathy (DR), presence of hypertension, CHD, cerebral infarction, hypokalemia, hyperlipidemia.
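The two-stage split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `split_cohort`, the seed, and the rounding rules (floor for the 15% hold-out, ceiling for the 70% training share) are assumptions, chosen only because they reproduce the reported counts.

```python
import math
import random


def split_cohort(patient_ids, external_frac=0.15, train_frac=0.70, seed=0):
    """Two-stage simple random split: external hold-out first, then train/validation.

    The rounding rules are an assumption made to match the counts in the text.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # simple random sampling without replacement

    # Stage 1: set aside the external validation set.
    n_external = math.floor(external_frac * len(ids))
    external, development = ids[:n_external], ids[n_external:]

    # Stage 2: split the remaining patients 7:3 into training and validation.
    n_train = math.ceil(train_frac * len(development))
    train, validation = development[:n_train], development[n_train:]
    return external, train, validation


external, train, validation = split_cohort(range(10_057))
print(len(external), len(train), len(validation))
```

Because the hold-out is drawn from the same hospital cohort, this sketch separates patients by random assignment only; it does not model a split by site or time period.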

Statistical analysis

Continuous variables are presented as median (interquartile range), and categorical variables are expressed as the number of patients (%). The t-test or chi-square test was used to compare differences between the two groups. DKD occurrence in the training set was used as the dependent variable. Feature selection related to DKD was performed using least absolute shrinkage and selection operator (LASSO) regression. Based on the selected variables, eight prediction models were developed: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). Their predictive performance was assessed by comparing the area under the receiver operating characteristic curve (AUC), accuracy, F1 score, and Brier score. Clinical utility was evaluated using decision curve analysis (DCA). After determining the best-performing model, the significant variables were visualized using xgb.plot, and the XGBoost model was further interpreted in RStudio. Using the established XGBoost model, we calculated the AUC, accuracy, sensitivity, and specificity for predicting the occurrence of DKD in the external validation set. Lastly, the XGBoost model was deployed online with the Shiny package, hosted on shinyapps.io, as a web-based predictor that conveniently estimates the risk of DKD in patients with T2DM from the variables found to drive the outcome. Statistical significance was set at p < 0.05. Analyses were performed using R version 4.4.2 and Python 3.13.2.
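For readers less familiar with the performance metrics named above, the following Python sketch computes them from true labels and predicted probabilities. It uses only the standard library; the function names, the toy data, and the default 0.5 decision threshold are illustrative assumptions and are not taken from the study.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity and F1 at a given decision threshold.

    Assumes both classes are present and at least one positive prediction is made.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = DKD, 0 = non-DKD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = classification_metrics(y_true, y_prob)
print(metrics, brier_score(y_true, y_prob), auc(y_true, y_prob))
```

A lower Brier score indicates better-calibrated probabilities, whereas AUC, accuracy, and F1 are better when higher.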

Results

Patient characteristics

In total, 10,057 T2DM patients were enrolled in the present study based on the inclusion and exclusion criteria (Figure 1). Table 1 shows the baseline characteristics of the patients in the training and validation sets; significant differences between the two sets were observed in age, hs-CRP, IBIL, and history of cerebral infarction (all P < 0.05).

Figure 1
Flowchart depicting the selection process of patients diagnosed with T2DM at the Second Hospital of Shijiazhuang City from 2017 to 2023. Initially, 13,918 patients were identified. After excluding 3,477 for incomplete demographic and laboratory information, 10,441 remained. Further exclusions of 384 patients due to age or disease criteria led to 10,057 included in the analysis. The cohort was divided into an external validation set of 1,508 and a training and validation set of 8,549, which was split into a training set of 5,985 and a validation set of 2,564 in a 7:3 ratio.

Figure 1. Flow chart of patient enrollment.

Table 1

Table 1. Baseline characteristics of the participants between training set and validation set.

Identification of feature variables

Using the variable assignments shown in Supplementary Table 1, we applied LASSO regression and retained the variables with non-zero coefficients to optimize the predictive model. With 10-fold cross-validation used to choose the optimal lambda value (lambda.1se = 0.01397873), we ultimately selected 15 features related to DKD, which included sex, age, SBP, BUN, Cr, BUN/Cr, UA, HbA1c, microalbuminuria, presence of DR, hypertension, CHD, history of cerebral infarction, family history of diabetes, and family history of CHD (Figures 2A, B).
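The lambda.1se rule used here picks the largest penalty whose cross-validated error is within one standard error of the minimum, which favors a sparser model. The sketch below illustrates that rule in Python; the function name and all numbers in the example grid are hypothetical and are not the study's cross-validation results.

```python
def select_lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error (e.g. binomial deviance) per lambda
    cv_se:   standard error of that mean per lambda
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[i_min] + cv_se[i_min]
    # Largest penalty whose error is still within one SE of the minimum.
    within_one_se = [lam for lam, err in zip(lambdas, cv_mean) if err <= limit]
    return lambdas[i_min], max(within_one_se)


# Hypothetical cross-validation curve.
lambdas = [0.100, 0.050, 0.020, 0.010, 0.005, 0.001]
cv_mean = [1.30, 1.12, 1.02, 1.00, 0.99, 1.01]
cv_se = [0.025] * 6

lambda_min, lambda_1se = select_lambda_1se(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

In this toy grid the minimum error occurs at 0.005, but 0.010 is selected because its error lies within one standard error of the minimum while penalizing coefficients more strongly.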

Figure 2
Panel A depicts a line graph showing coefficients against log lambda, with multiple colored lines decreasing towards the right. Panel B displays a plot of binomial deviance against log lambda, featuring a red dotted line with error bars, forming a curve with a minimum point.

Figure 2. Identification of variables by LASSO regression. (A) Coefficient curves for the 47 clinical features, (B) Selection of optimal variables through 10-fold cross-validation.

Comparison of predictive models

We integrated the above 15 key variables into each of the eight machine learning models to compare their ability to predict the risk of DKD in patients with T2DM. In the training set, using 10-fold cross-validation for discrimination, the XGBoost model had the highest mean AUC (0.932, 95%CI: 0.926-0.938), with an accuracy of 0.845, a sensitivity of 0.834, a specificity of 0.857, and an F1 score of 0.847 (Figure 3A and Table 2). Consistently, in the validation set the XGBoost model also showed the best performance (AUC = 0.930, 95%CI: 0.920-0.939), with an accuracy of 0.844, a sensitivity of 0.850, a specificity of 0.837, and an F1 score of 0.848 (Figure 3B and Table 3). The calibration plots of the eight models show that XGBoost achieved better Brier scores (0.167 in the training set and 0.166 in the validation set) than the other models (Figure 4). This suggests that the XGBoost model is optimal for predicting the DKD risk in T2DM patients.
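As an illustration of how 10-fold cross-validation partitions a training set, the Python sketch below generates the fold indices. It is a simplified, unstratified version written for this explanation; the function name and seed are assumptions, and the study's own fold assignment is not described in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (fit_indices, held_out_indices) for each of k folds.

    Each sample is held out exactly once. Folds are not stratified by outcome.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# Fold sizes for a training set of 5,985 patients.
splits = list(k_fold_indices(5985, k=10))
print([len(held_out) for _, held_out in splits])
```

A model would be fitted on each `fit` subset and scored on the matching `held_out` subset, and the ten scores averaged to give a mean AUC such as the one reported above.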

Figure 3
Two panels show Receiver Operating Characteristic (ROC) curves comparing different models. Panel A has models with varied Area Under Curve (AUC), including Logistic Regression (0.63) and XGBoost (0.931). Panel B displays similar models with AUCs, with Logistic Regression (0.681) and XGBoost (0.93). The y-axis represents True Positive Rate and the x-axis False Positive Rate.

Figure 3. Receiver-operating characteristic curves for eight machine learning models. (A) Comparison of AUCs among the eight machine learning models in the training set, (B) Comparison of AUCs among the eight machine learning models in the validation set.

Table 2

Table 2. Comparison of the performance metrics for eight models in the training set.

Table 3

Table 3. Comparison of the performance metrics for eight models in the validation set.

Figure 4
Two calibration curve charts labeled A and B compare the predicted probabilities versus actual outcomes for various models. Each chart shows lines for different classifiers, including Logistic Regression, Random Forest, SVM, GaussianNB, KNeighborsClassifier, GradientBoostingClassifier, AdaBoostClassifier, and XGBClassifier, with respective Brier scores. A dashed line represents perfect calibration. Chart A shows Logistic Regression with a Brier score of 0.227 and B shows 0.229, among other values. Each model's calibration line varies in proximity to the perfect calibration line across the charts.

Figure 4. Calibration plots of the eight models. (A) Comparison of calibration plots among eight machine learning models in the training set and (B) comparison of calibration plots among eight machine learning models in the validation set.

Furthermore, after selecting the XGBoost model, we used the SHAP package to interpret it; SHAP reflects the influence of each feature on each sample's prediction and shows whether that influence is positive or negative (Figure 5). For the external validation dataset, data from 1,508 patients were used to validate the performance of the established XGBoost model (AUC = 0.878, 95% CI (0.920-0.939), accuracy = 0.788, sensitivity = 0.783, specificity = 0.793, F1 score = 0.791) (Figure 6).
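SHAP values are based on Shapley values from cooperative game theory: each feature's contribution is its average marginal effect on the prediction over all orders in which features could be added. The Python sketch below computes exact Shapley values for a three-feature toy model by enumerating feature subsets. It is a simplification for illustration only: the toy risk function, its coefficients, and the single all-zero baseline are hypothetical, and the SHAP package uses faster tree-specific algorithms and a background dataset rather than this brute-force approach.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk function with one interaction term (not the study's model)."""
    microalbuminuria, dr, hypertension = z
    return 0.2 + 0.3 * microalbuminuria + 0.1 * dr + 0.1 * microalbuminuria * hypertension


patient = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the property that lets a SHAP plot decompose an individual risk estimate feature by feature.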

Figure 5
Violin plot showing SHAP values for various features impacting a model output on CHD. Features include Microalbuminuria, DR, Hypertension, Cr, BUN, and more. Colors indicate feature values from high (red) to low (blue). Horizontal axis represents SHAP impact ranging from negative (left) to positive (right).

Figure 5. SHAP analysis of the XGBoost model. A visual representation of the features in the XGBoost model, showing the relative importance of each feature. The color represents the value of the variable, with red representing a larger value and blue representing a smaller value.

Figure 6
Receiver Operating Characteristic (ROC) curve graph with true positive rate on the y-axis and false positive rate on the x-axis. The ROC curve has an area of 0.88, indicating good model performance. A dashed line represents random guessing.

Figure 6. External validation ROC curve.

Decision curve analysis

To further investigate the clinical application of the XGBoost model, a comparison of the DCA among the eight machine-learning models was conducted. The XGBoost model again showed a larger net benefit than the other models across a range of threshold probabilities (Figure 7). For application of the XGBoost model, the best cut-off for the predicted probability was 50.7%. If the model predicted a probability > 50.7%, the patient with T2DM was considered to be at higher risk of developing DKD (Table 2).
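Decision curve analysis compares models by their net benefit at each threshold probability p_t, commonly defined as TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the numbers of true and false positives among n patients. The Python sketch below evaluates this quantity for a model and for the "treat all" reference strategy. The function names and toy data are illustrative assumptions, not the study's data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = DKD, 0 = non-DKD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

for pt in (0.25, 0.5, 0.75):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

Plotting net benefit against the threshold for each model, together with the "treat all" and "treat none" (net benefit of zero) lines, gives a decision curve of the kind shown in Figure 7.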

Figure 7
Comparison of net benefit curves for various classifiers. Panel A shows curves from models like logistic regression, random forest, and SVM. Panel B displays slightly adjusted curves for the same classifiers, with both panels featuring threshold probability on the x-axis and net benefit on the y-axis.

Figure 7. Decision curve analysis of the eight models predicting the incidence of DKD. (A) Comparison of DCA among the eight machine learning models in the training set, (B) Comparison of DCA among the eight machine learning models in the validation set.

Application of the model

Last, based on a cut-off value of 50.7% in this model, we constructed an online prediction calculator for DKD risk (https://liting3659078.shinyapps.io/myrapp/, Figure 8). Applied to two representative patients, the calculator showed good predictive effectiveness (Supplementary Figure 1). The indicators related to these two patients are shown in Supplementary Table 2.
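The decision rule applied by the calculator can be sketched as a simple threshold on the predicted probability. In the Python example below, the 50.7% cut-off and the two predicted risks (43.08% and 51.90%, as reported for Case 1 and Case 2 in Supplementary Figure 1) come from the article; the function name and the output labels are assumptions made for illustration.

```python
CUTOFF = 0.507  # cut-off reported for the XGBoost model


def classify_risk(probability, cutoff=CUTOFF):
    """Label a predicted DKD probability relative to the cut-off."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "higher risk" if probability > cutoff else "lower risk"


# Predicted risks reported for the two representative patients.
cases = {"Case 1": 0.4308, "Case 2": 0.5190}
for name, probability in cases.items():
    print(f"{name}: {probability:.2%} -> {classify_risk(probability)}")
```

The two cases fall on opposite sides of the cut-off, so small changes in predicted probability near 50.7% change the label; the probability itself remains the more informative output.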

Figure 8
Website interface for DKD Risk Prediction Model showing input fields for variables like gender, age, and medical history. A SHAP plot on the right displays the impact of features like microalbuminuria, hypertension, and family history on model output, with red indicating high feature values and blue indicating low values. A horizontal axis shows SHAP values from negative to positive, representing the feature impact.

Figure 8. Web-based predictor for the risk of developing DKD based on the XGBoost model. The URL is: https://liting3659078.shinyapps.io/myrapp/.

Discussion

In China, the management of DKD in patients with T2DM faces challenges characterized by low screening rates, low awareness among patients, low treatment rates, unattainable therapeutic goals, and insufficient community-based preventive capacities. Chen et al. (15) conducted a 7-year follow-up study on 907 diabetic patients from the Taopu Community Health Service Center in Putuo district of Shanghai, revealing that by 2015, the screening rate of DKD was merely 55.1%, which is notably lower than that of diabetic neuropathy and retinopathy (77.6%). Hence, developing strategies to efficiently increase the screening rate among high-risk populations and implementing clinical prediction tools could be a solution.

The present study first identified 15 predictive variables affecting the occurrence of DKD in patients with T2DM: gender, age, SBP, BUN, Cr, BUN/Cr, UA, HbA1c, microalbuminuria, presence of DR, hypertension, CHD, history of cerebral infarction, family history of diabetes, and family history of CHD. These were selected by LASSO regression analysis, which balances fitting error against the number and magnitude of model parameters, thereby identifying the features with the greatest predictive power for the outcome variable. This process reduces model complexity, mitigates multicollinearity, limits overfitting, and ultimately enhances the generalizability of the model. We constructed and compared eight machine learning models for predicting DKD, and the XGBoost model exhibited superior predictive capabilities in both the training and validation sets, with AUC values of 0.932 and 0.930, and F1 scores of 0.847 and 0.848, respectively. Moreover, this model had a larger net benefit across a range of threshold probabilities, demonstrating its clinical value for DKD management.

The 15 predictive variables related to the occurrence of DKD in patients with T2DM were ranked as follows: microalbuminuria, presence of DR, hypertension, Cr, UA, BUN/Cr, age, BUN, family history of diabetes, HbA1c, SBP, family history of CHD, sex, history of cerebral infarction, and presence of CHD. Microalbuminuria was found to have the most significant effect on the occurrence of DKD. This is likely because microalbuminuria is a crucial biomarker in the early stages of DKD: when the kidneys of diabetic patients begin to sustain damage, microalbumin begins to appear in the urine, acting as an early indicator of renal impairment. A systematic review has indicated that DR is closely associated with nephropathy; the presence of DR increases the risk of nephropathy and serves as a predictive indicator of microalbuminuria progression (16). Hypertension is a major risk factor for the progression of DKD and for cardiovascular disease and death, and persistent hypertension exacerbates the burden on the kidneys (17–19). UA, Cr, BUN, and microalbumin are common indicators of renal function, with Cr, BUN, and UA playing essential roles in early DKD screening (20). Our findings on age are consistent with those of Li et al. (21), who found by multifactorial logistic regression analysis that the prevalence of DKD was significantly higher in patients with T2DM aged ≥50 years [OR = 4.011, 95%CI (3.152-5.104)]. HbA1c is a pivotal index for evaluating long-term glycemic control in diabetic patients, and Ali et al. (22) showed that HbA1c plays a significant role in the development of DKD, with elevated HbA1c levels often correlating with increased microalbuminuria. In our study, HbA1c emerged as an influential risk factor for DKD occurrence, likely because all participants were patients with type 2 diabetes and HbA1c was a key indicator selected by LASSO regression. In this study, sex influenced the occurrence of DKD, with males at a higher risk. Research shows that sex differences play a key role in the progression of DKD in T2DM patients, as the DKD incidence rate in males (23.2%) is higher than that in females (19.8%) (8). A previous logistic regression analysis revealed that a family history of diabetes was significantly associated with the development of DKD (P < 0.05) (23).

Using the XGBoost model established from the above variables, we conducted an external validation on a dataset that was not used for training or testing. The model retained good performance (AUC = 0.878, accuracy = 0.788, F1 score = 0.791). With the advent of the artificial intelligence era, a growing body of research has developed models to predict the occurrence and prognosis of diseases, including the early identification of populations at high risk of DKD. However, a comprehensive comparison of multiple predictive models in terms of performance, clinical value, and online application has been lacking. Additionally, previous studies required manual calculations with model inputs, which significantly limited their practicality. To enhance the usability of the constructed model, we designed and deployed an online prediction calculator to make it available to clinicians and patients, and examined example cases confirming its practical application.

This study has several limitations: 1) Information on patients’ medication use was not included in this study, preventing assessment of the impact of specific drugs and their combinations on the development of DKD. 2) The data came from a hospital setting and excluded community-dwelling T2DM populations, which account for a large number of high-risk DKD patients.

Overall, our study provides an optimal predictive model (XGBoost model) integrated with 15 featured indicators on a dedicated website for DKD occurrence in T2DM patients. This tool can effectively support clinical decision making and patient guidance.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by The Ethics Committee of the Second Hospital of Shijiazhuang City. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

TL: Data curation, Formal Analysis, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. JC: Resources, Software, Visualization, Writing – review & editing. XZ: Resources, Software, Writing – review & editing. KW: Data curation, Formal Analysis, Writing – review & editing. XSZ: Software, Visualization, Writing – review & editing. YC: Data curation, Software, Writing – review & editing. ZX: Software, Visualization, Writing – review & editing. SW: Data curation, Writing – review & editing. PS: Data curation, Formal Analysis, Writing – review & editing. XH: Resources, Visualization, Writing – review & editing. YY: Resources, Software, Writing – review & editing. XC: Resources, Writing – review & editing. DM: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – review & editing. XL: Conceptualization, Project administration, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. This work was funded by grants from the National Natural Science Foundation of China (82270508), Hebei Provincial Natural Science Foundation Joint Fund for Precision Medicine (H2025206777), Youth Fund for Director of Key Laboratory of Neuro and Vascular Biology, Ministry of Education (NV20210006), Scientific Research Program of the Department of Education of Hebei Province (QN2022164), and Shijiazhuang Science and Technology Research and Development Program (191460933).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1587932/full#supplementary-material

Supplementary Table 1 | Variable assignment.

Supplementary Table 2 | Indicators related to DKD and non-DKD patients.

Supplementary Figure 1 | Results of the online DKD predictor for two patients. (a) The predicted risk of developing DKD in Case 1 was 43.08% (< 50.7%), and (b) the predicted risk of developing DKD in Case 2 was 51.90% (> 50.7%).

References

1. International Diabetes Federation. Diabetes facts and figures (2024). Available online at: https://idf.org/about-diabetes/diabetes-facts-figures/. (Accessed October 30, 2025).


2. Aldemir O, Turgut F, and Gokce C. The association between methylation levels of targeted genes and albuminuria in patients with early diabetic kidney disease. Ren Fail. (2017) 39:597–601. doi: 10.1080/0886022X.2017.1358180

3. Chen TK, Knicely DH, and Grams ME. Chronic kidney disease diagnosis and management: a review. JAMA. (2019) 322:1294–304. doi: 10.1001/jama.2019.14745

4. Afkarian M, Sachs MC, Kestenbaum B, Hirsch IB, Tuttle KR, Himmelfarb J, et al. Kidney disease and increased mortality risk in type 2 diabetes. J Am Soc Nephrol. (2013) 24:302–8. doi: 10.1681/ASN.2012070718

5. Jiao F, Wong C, Tang S, Fung C, Tan K, McGhee S, et al. Annual direct medical costs associated with diabetes-related complications in the event year and in subsequent years in Hong Kong. Diabet Med. (2017) 34:1276–83. doi: 10.1111/dme.13416

6. Major RW, Cheng MRI, Grant RA, Shantikumar S, Xu G, Oozeerally I, et al. Cardiovascular disease risk factors in chronic kidney disease: a systematic review and meta-analysis. PLoS One. (2018) 13:e0192895. doi: 10.1371/journal.pone.0192895

7. Gheith O, Farouk N, Nampoory N, Halim MA, and Al-Otaibi T. Diabetic kidney disease: world wide difference of prevalence and risk factors. J Nephropharmacol. (2016) 5:49–56.

8. Pan W, Wang ML, Xu Y, Zhang JS, Zhao MM, Wan J, et al. Analysis of disease burden and risk factors of diabetic kidney disease in China from 1990 to 2019. Chin J Nephrol. (2023) 39:576–86. doi: 10.3760/cma.j.cn441217-20221115-01129

9. Coca SG, Nadkarni GN, Huang Y, Moledina DG, Rao V, Zhang J, et al. Plasma biomarkers and kidney function decline in early and established diabetic kidney disease. J Am Soc Nephrol. (2017) 28:2786–93. doi: 10.1681/ASN.2016101101

10. Niewczas MA, Gohda T, Skupien J, Smiles AM, Walker WH, Rosetti F, et al. Circulating TNF receptors 1 and 2 predict ESRD in type 2 diabetes. J Am Soc Nephrol. (2012) 23:507–15. doi: 10.1681/ASN.2011060627

11. Xi CF, Wang CM, Rong GH, and Deng JH. A nomogram model that predicts the risk of diabetic nephropathy in type 2 diabetes mellitus patients: a retrospective study. Int J Endocrinol. (2021) 8:6672444. doi: 10.1155/2021/6672444

12. Shi R, Niu ZY, Wu B, Zhang TT, Cai DJ, Sun H, et al. Nomogram for the risk of diabetic nephropathy or diabetic retinopathy among patients with type 2 diabetes mellitus based on questionnaire and biochemical indicators: a cross-sectional study. Diabetes Metab Syndr Obes. (2020) 13:1215–29. doi: 10.2147/DMSO.S244061

13. Chinese Diabetes Society. Guideline for the prevention and treatment of type 2 diabetes mellitus in China (2020 edition). Chin J Diabetes Mellitus. (2021) 13:315–409. doi: 10.3760/cma.j.cn115791-20210221-00095

14. The Microvascular Complications Study Group of the Chinese Diabetes Society (CDS). Clinical guideline for the prevention and treatment of diabetic kidney disease in China (2021 edition). Chin J Diabetes Mellitus. (2021) 13:762–84. doi: 10.3760/cma.j.cn121383-20210825-08064

15. Chen SY, Hou XH, Sun Y, Hu G, Zhou XY, Xue HJ, et al. A seven-year study on an integrated hospital-community diabetes management program in Chinese patients with diabetes. Prim Care Diabetes. (2018) 12:231–7. doi: 10.1016/j.pcd.2017.12.005

16. Pearce I, Simó R, Lövestam-Adrian M, Wong DT, and Evans M. Association between diabetic eye disease and other complications of diabetes: implications for care. A systematic review. Diabetes Obes Metab. (2019) 21:467–78. doi: 10.1111/dom.13550

17. Morton JI, Lazzarini PA, Polkinghorne KR, Carstensen B, Magliano DJ, Shaw JE, et al. The association of attained age, age at diagnosis, and duration of type 2 diabetes with the long-term risk for major diabetes-related complications. Diabetes Res Clin Pract. (2022) 190:110022. doi: 10.1016/j.diabres.2022.110022

18. Emdin CA, Rahimi K, Neal B, Callender T, Perkovic V, Patel A, et al. Blood pressure lowering in type 2 diabetes: a systematic review and meta-analysis. JAMA. (2015) 313:603–15. doi: 10.1001/jama.2014.18574

19. Bakris GL, Agarwal R, Chan JC, Cooper ME, Gansevoort RT, Haller H, et al. Effect of finerenone on albuminuria in patients with diabetic nephropathy: a randomized clinical trial. JAMA. (2015) 314:884–94. doi: 10.1001/jama.2015.10081

20. Wu L, Chang DY, and Chen H. Early screening and evaluation of diabetic kidney disease. Chin J Gen Pract. (2022) 21:814–6. doi: 10.3760/cma.j.cn114798-20220429-00356

21. Li YL, Liao YG, Li XW, Zheng HY, Huang MW, Chen SS, et al. Risk factors of diabetic nephropathy. Prev Med. (2017) 24:133–6. doi: 10.3969/j.issn.1006-3110.2017.02.002

22. Ali F, Alsayegh F, Sharma P, Waheedi M, Bayoud T, Alrefai F, et al. White blood cell subpopulation changes and prevalence of neutropenia among Arab diabetic patients attending Dasman Diabetes Institute in Kuwait. PLoS One. (2018) 13:e0193920. doi: 10.1371/journal.pone.0193920

23. Mazin MS. Wieam Risk factors associated with the development of diabetic kidney disease in Sudanese patients with type 2 diabetes mellitus: A case-control study. Diabetes Metab Syndr. (2021) 15:102320. doi: 10.1016/j.dsx.2021.102320

Keywords: type 2 diabetes mellitus, diabetic kidney disease, machine learning, prediction model, predictive value

Citation: Li T, Chen J, Zhang X, Wang K, Zhao X, Cao Y, Xu Z, Wang S, Su P, He X, Yang Y, Cao X, Liang X and Ma D (2025) A machine learning model for predicting the risk of diabetic nephropathy in individuals with type 2 diabetes mellitus. Front. Endocrinol. 16:1587932. doi: 10.3389/fendo.2025.1587932

Received: 05 March 2025; Accepted: 19 September 2025;
Published: 15 October 2025.

Edited by:

Åke Sjöholm, Gävle Hospital, Sweden

Reviewed by:

Chris Robert Neal, University of Bristol, United Kingdom
Roshan Kumar Mahat, Dharanidhar Medical College and Hospital, India

Copyright © 2025 Li, Chen, Zhang, Wang, Zhao, Cao, Xu, Wang, Su, He, Yang, Cao, Liang and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiaohua Liang, 13363867669@163.com; Dong Ma, madong119@hebmu.edu.cn