- 1 Department of Biochemistry and Molecular Biology, Key Laboratory of Neural and Vascular Biology, Ministry of Education, Shijiazhuang, Hebei, China
- 2 Hebei Key Laboratory of Cardiovascular Homeostasis and Aging, Hebei Medical University, Shijiazhuang, Hebei, China
- 3 School of Public Health, North China University of Science and Technology, Tangshan, China
- 4 Department of General Medicine, Shijiazhuang Second Hospital, Shijiazhuang, China
- 5 School of Medicine, Hebei University of Engineering, Handan, China
- 6 College of Public Health, Zhengzhou University, Zhengzhou, China
- 7 Diabetic Ophthalmology Department, Hebei Eye Hospital, Xingtai, China
Introduction: Diabetic kidney disease (DKD) represents the predominant form of chronic kidney disease (CKD) linked with diabetes mellitus. The application of artificial intelligence holds promise for delaying renal deterioration and decreasing treatment expenses by facilitating early detection and intervention. This is contingent upon the development of an efficient and user-friendly model for predicting DKD risk in diabetic individuals. In this study, leveraging extensive clinical datasets, we sought to develop and validate a predictive model employing machine learning techniques to assess the risk of DKD in patients with type 2 diabetes mellitus (T2DM).
Research design and methods: We conducted a retrospective collection of clinical data from 10,057 patients diagnosed with T2DM at Shijiazhuang Second Hospital. A random selection of 15% of these patients (n=1,508) was utilized for external validation. The remaining 8,549 patients were divided into a training set (n = 5,985) and a validation set (n = 2,564) using a simple random sampling method in a 7:3 ratio. Subsequently, we employed LASSO regression to identify variables significantly associated with DKD in T2DM patients. These variables were incorporated into eight distinct predictive models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models’ predictive performance was assessed using metrics such as the area under the curve (AUC), accuracy, F1 score, and Brier score. Finally, we developed an online calculator to estimate DKD risk in T2DM patients.
Results: Fifteen features associated with DKD were selected using LASSO regression: gender, age, systolic blood pressure (SBP), blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr ratio, uric acid (UA), hemoglobin A1c (HbA1c), microalbuminuria, presence of diabetic retinopathy (DR), hypertension, coronary heart disease (CHD), history of cerebral infarction, family history of diabetes, and family history of CHD. Among the eight evaluated models, the XGBoost algorithm demonstrated superior performance on both the training and validation datasets, with AUCs of 0.932 (95% CI: 0.926-0.938) and 0.930 (95% CI: 0.920-0.939), respectively. The model achieved an accuracy of 0.845 and 0.844, sensitivity of 0.834 and 0.850, specificity of 0.857 and 0.837, F1 score of 0.847 and 0.848, and Brier score of 0.167 and 0.166, respectively. Decision curve analysis (DCA) further supported the superiority of the XGBoost model, which yielded the highest net benefit across a range of clinically relevant risk thresholds. Finally, an online calculator for predicting the occurrence of DKD was developed based on the XGBoost model, using a cut-off value of 50.7%.
Conclusions: The developed XGBoost model demonstrated optimal predictive accuracy for the occurrence of DKD in patients with T2DM. This model facilitated the construction of an online prediction calculator, offering an accessible and practical tool for both patients and clinicians.
Introduction
Type 2 diabetes mellitus (T2DM) is the predominant form of diabetes, accounting for over 90% of diabetes cases. Diabetic kidney disease (DKD) is the most prevalent form of chronic kidney disease (CKD) associated with diabetes mellitus. In China, approximately 170 million individuals have diabetes mellitus (1), and 30% to 40% of these patients are expected to develop DKD (2). Globally, DKD affects 8% to 16% of the population (3) and is characterized by a prolonged disease course, poor prognosis, and high treatment costs, imposing a significant burden on patients, families, and society. DKD is also a leading cause of end-stage kidney disease (ESKD) (4, 5) and is associated with a higher prevalence of cardiovascular disease than other forms of CKD (59.26% vs. 29.60%) (6). An international systematic review examining the prevalence and risk factors of DKD worldwide reported that the prevalence of DKD among T2DM patients ranges from 30% to 50% (7). Pan et al. (8) analyzed the burden of DKD in China from 1990 to 2019 and found that the increase in CKD cases is primarily attributed to the rising incidence of both T1DM and T2DM, with T2DM accounting for notably more prevalent cases of concomitant CKD than T1DM [57.4 (95%CI: 49.5-66.5) vs. 3,107.6 (95%CI: 2,815.2-3,390.9) million cases for T1DM and T2DM, respectively]. Consequently, a significant public health challenge lies in the precise and convenient prediction of DKD risk in patients with diabetes. Early identification and intervention are anticipated to delay renal impairment and effectively reduce treatment costs.
There is a critical need for prognostic tools that are both easily interpretable and accurate, and that can be seamlessly integrated into clinical workflows. While certain blood-based biomarkers, such as plasma KIM-1 (9) and TNF-α receptors (10), have shown correlation with the progression of DKD, precise predictive models that incorporate patients’ electronic health records (EHR), including these blood biomarkers and other relevant factors, remain limited. Machine learning, a vital component of artificial intelligence, is characterized by its ability to handle nonlinearity, complex interactions, and a large number of variables influencing outcomes. This presents significant potential for enhancing the predictive capabilities of disease models in clinical applications. A growing body of literature indicates that several established predictive models, utilizing multifactor logistic regression, BP neural networks, and LASSO regression, have been applied to screen risk factors for DKD in patients with T2DM (11, 12). However, a comparative analysis of the performance of multiple machine learning-based predictive models remains unexplored. Consequently, this study aims to evaluate eight DKD prediction models and to identify the most effective one for predicting the risk of DKD development in T2DM patients. To enhance the accessibility and utility of this model, we developed an online calculator designed to assist clinicians in accurately stratifying risk and advising patients on the initial and progressive stages of DKD. Additionally, this tool aims to increase awareness of preventive measures in patients’ daily lives.
Research design and methods
Study participants
This retrospective study collected data from 10,057 patients diagnosed with T2DM at the Second Hospital of Shijiazhuang City between December 2017 and December 2023. T2DM was defined according to the Guidelines for the Prevention and Treatment of T2DM in China (13) as follows: 1) T2DM was recorded in the medical billing; 2) the HbA1c level was equal to or above 6.5% (NGSP); 3) the fasting plasma glucose level was equal to or above 126 mg/dL, except in an emergency room; 4) the postprandial plasma glucose level was equal to or above 200 mg/dL, except in an emergency room; 5) anti-diabetic medication was prescribed. In addition, all included patients were older than 18 years. The exclusion criteria were as follows: 1) presence of concurrent chronic kidney disease (CKD) unrelated to diabetes; 2) coexistence of severe systemic diseases; 3) acute metabolic disorders; 4) incomplete demographic information or relevant laboratory indicators. This research was approved by the Ethics Committee of the Second Hospital of Shijiazhuang City (ethical approval number: NO. 191128). All private personal information was protected and removed during the analysis and publication process. Due to the retrospective nature of this study, written informed consent was not required.
Definition of DKD
Patients with T2DM were categorized according to the presence of concurrent DKD into the DKD group (n = 5,162) and the non-DKD group (n = 4,895). The diagnostic criteria for DKD were as follows (14): 1) diabetes was confirmed as the cause of renal damage and other causes of chronic kidney disease (CKD) were excluded; 2) urinary albumin-to-creatinine ratio (UACR) ≥ 30 mg/g, urinary albumin excretion rate (UAER) ≥ 30 mg/24 h (or ≥ 20 μg/min), and estimated glomerular filtration rate (eGFR) persistently < 60 mL·min⁻¹·(1.73 m²)⁻¹, based on three tests conducted within a period of 3 to 6 months; 3) renal biopsy results consistent with pathological changes in DKD.
Clinical data
First, we randomly selected 15% of the patients for external validation (n = 1,508) and used a simple random sampling method to divide the remaining 8,549 patients into a training set (n = 5,985) and a validation set (n = 2,564) in a ratio of 7:3. Clinical data of patients with T2DM, collected through review of medical records, were grouped into the following parts: 1) general information: gender, age, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), smoking history, alcohol consumption history, history of coronary heart disease, history of cerebral infarction, family history of hypertension, family history of diabetes, family history of coronary heart disease (CHD); 2) laboratory examination indicators: triglycerides (TG), total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL), fasting blood glucose (FBG), glycated hemoglobin (HbA1c), high-sensitivity C-reactive protein (hs-CRP), albumin (Alb), white blood cell count (WBC), lymphocyte count (LYM), neutrophil count (NEUT), monocyte count (MONO), platelet count (PLT), platelet distribution width (PDW), large platelet ratio (P-LCR), D-dimer, blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr, glucose (GLU), apolipoprotein-A1/apolipoprotein-B (APOA1/APOB), direct bilirubin (DBIL), indirect bilirubin (IBIL), microalbuminuria, α1-microglobulin (α1-MG), β2-microglobulin (β2-MG), uric acid (UA), aspartate transaminase (AST), alanine transaminase (ALT); 3) comorbidity status: diabetic retinopathy (DR), presence of hypertension, CHD, cerebral infarction, hypokalemia, hyperlipidemia.
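The partitioning described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `split_cohort`, the random seed, and the rounding rules (rounding the 15% hold-out down and the 70% training share up) are assumptions, chosen so that the resulting counts match those reported in the text.

```python
import math
import random


def split_cohort(n_patients, holdout_frac=0.15, train_frac=0.7, seed=42):
    """Randomly split patient indices into training, validation and hold-out sets.

    The seed and the rounding rules are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(n_patients))
    rng.shuffle(indices)

    # Step 1: set aside 15% of all patients (rounded down).
    n_holdout = int(n_patients * holdout_frac)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]

    # Step 2: split the remainder 7:3 (training share rounded up).
    n_train = math.ceil(len(remaining) * train_frac)
    train = remaining[:n_train]
    validation = remaining[n_train:]
    return train, validation, holdout


train, validation, holdout = split_cohort(10057)
print(len(train), len(validation), len(holdout))
```

Because the hold-out set is drawn before the 7:3 split, no patient can appear in more than one of the three subsets.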
Statistical analysis
Continuous variables are presented as median (interquartile range), and categorical variables are expressed as the number of patients (%). The t-test or chi-square test was used to compare differences between the two groups. DKD occurrence in the training set was used as the dependent variable. Feature selection related to DKD was performed using least absolute shrinkage and selection operator (LASSO) regression. Based on the selected variables, eight distinct prediction models were developed: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). Their predictive performance was assessed by comparing the area under the receiver operating characteristic curve (AUC), accuracy, F1 score, and Brier score. Clinical utility was evaluated using decision curve analysis (DCA). After determining the best-performing model, the significant variables were visualized using xgb.plot, and the XGBoost model was further interpreted in RStudio. Using the established XGBoost model, we calculated the area under the curve, accuracy, sensitivity, and specificity for predicting the occurrence of DKD in the external validation set. Lastly, the XGBoost model was deployed online as a web-based predictor, built with the Shiny package and hosted on shinyapps.io, to conveniently estimate the risk of DKD in patients with T2DM. Statistical significance was set at p < 0.05. Analyses were performed using R version 4.4.2 and Python 3.13.2.
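For readers less familiar with the performance metrics named above, the following Python sketch computes them from their standard definitions. It is an illustration only: the toy labels and probabilities are invented, the function names are hypothetical, and the 0.5 threshold is an assumption rather than the cut-off used in the study. It also assumes that both outcome classes are present in the data.

```python
def auc_score(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_prob, threshold=0.5):
    """Return accuracy, sensitivity, specificity, F1, Brier score and AUC."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Brier score: mean squared difference between probability and outcome.
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n,
        "auc": auc_score(y_true, y_prob),
    }


# Invented toy data: 1 = DKD, 0 = non-DKD.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
metrics = evaluate(y_true, y_prob)
print(metrics)
```

AUC and the Brier score use the predicted probabilities directly, whereas accuracy, sensitivity, specificity, and the F1 score depend on the chosen classification threshold.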
Results
Patient characteristics
In total, 10,057 T2DM patients were enrolled in the present study based on the inclusion and exclusion criteria (Figure 1). Table 1 shows patient characteristics according to DKD status. Significant differences in age, hs-CRP, IBIL, and history of cerebral infarction (all P < 0.05) were observed between the training and validation sets.
Identification of feature variables
Using the variable assignments shown in Supplementary Table 1, we applied LASSO regression and retained the variables with non-zero coefficients to optimize the predictive model. With 10-fold cross-validation to determine the optimal lambda value (lambda.1se = 0.01397873), we ultimately selected 15 features related to DKD: sex, age, SBP, BUN, Cr, BUN/Cr, UA, HbA1c, microalbuminuria, presence of DR, hypertension, CHD, history of cerebral infarction, family history of diabetes, and family history of CHD (Figures 2A, B).

Figure 2. Identification of variables by LASSO regression. (A) Coefficient curves for the 47 clinical features, (B) Selection of optimal variables through 10-fold cross-validation.
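The name lambda.1se matches the convention of the R glmnet package, in which it denotes the largest penalty whose cross-validated error lies within one standard error of the minimum error. The text does not name the package used, so this reading is an assumption. Under that assumption, the selection rule can be sketched in Python as follows; the function name and the cross-validation numbers are invented for illustration.

```python
def select_lambda_1se(lambdas, cv_means, cv_ses):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas  -- candidate penalty values
    cv_means -- mean cross-validated error for each penalty
    cv_ses   -- standard error of that mean for each penalty
    """
    best = min(range(len(lambdas)), key=lambda i: cv_means[i])
    # Accept any penalty whose error is within one SE of the minimum error.
    cutoff = cv_means[best] + cv_ses[best]
    eligible = [lam for lam, err in zip(lambdas, cv_means) if err <= cutoff]
    # The largest eligible penalty gives the sparsest acceptable model.
    return lambdas[best], max(eligible)


# Invented cross-validation results for five candidate penalties.
lambdas = [0.1, 0.05, 0.02, 0.01, 0.005]
cv_means = [0.80, 0.62, 0.55, 0.53, 0.54]
cv_ses = [0.03, 0.03, 0.03, 0.03, 0.03]

lambda_min, lambda_1se = select_lambda_1se(lambdas, cv_means, cv_ses)
print(lambda_min, lambda_1se)
```

Choosing the one-standard-error penalty rather than the error-minimizing one trades a small amount of cross-validated accuracy for a model with fewer non-zero coefficients.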
Comparison of predictive models
We separately integrated the above 15 key variables into each of the eight machine learning models to compare their ability to predict DKD risk in patients with T2DM. In the training set, using 10-fold cross-validation for discrimination, the XGBoost model had the highest mean AUC [0.932, 95% CI (0.926-0.938)], with an accuracy of 0.845, a sensitivity of 0.834, a specificity of 0.857, and an F1 score of 0.847 (Figure 3A and Table 2). Consistently, comparison among these models in the validation set showed that the XGBoost model also presented the best performance [AUC = 0.930, 95% CI (0.920-0.939)], with an accuracy of 0.844, a sensitivity of 0.850, a specificity of 0.837, and an F1 score of 0.848 (Figure 3B and Table 3). The calibration plots of the eight models show that XGBoost achieved better Brier scores (0.167 in the training set and 0.166 in the validation set) than the other models (Figure 4). This suggests that the XGBoost model is optimal for predicting the DKD risk in T2DM patients.

Figure 3. Receiver-operating characteristic curves for eight machine learning models. (A) Comparison of AUCs among the eight machine learning models in the training set, (B) Comparison of AUCs among the eight machine learning models in the validation set.

Figure 4. Calibration plots of the eight models. (A) Comparison of calibration plots among eight machine learning models in the training set and (B) comparison of calibration plots among eight machine learning models in the validation set.
Furthermore, after selecting the XGBoost model, we used the SHAP package to interpret it. SHAP values reflect the influence of each feature on each sample and show whether that influence is positive or negative (Figure 5). For the external validation dataset, data of 1,508 patients were used to validate the performance of the established XGBoost model (AUC = 0.878, 95% CI (0.920-0.939), accuracy = 0.788, sensitivity = 0.783, specificity = 0.793, F1 score = 0.791) (Figure 6).

Figure 5. SHAP analysis of XGBoost model. A visual representation of each feature in the XGBoost model shows the relationship between the importance of each feature. The color represents the value of the variable, with red representing a larger value and blue representing a smaller value.
Decision curve analysis
To further investigate the clinical application of the XGBoost model, a comparison of the DCA among the eight machine-learning models was conducted. The XGBoost model again showed a larger net benefit than the other models across a range of threshold probabilities (Figure 7). For application of the XGBoost model, the best cut-off for the predicted probability was 50.7%. Patients with T2DM for whom the model predicted a probability > 50.7% were considered to be at higher risk of developing DKD (Table 2).

Figure 7. Decision curve analysis of the eight models predicting the incidence of DKD. (A) Comparison of DCA among the eight machine learning models in the training set, (B) Comparison of DCA among the eight machine learning models in the validation set.
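The net benefit plotted in a decision curve is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability, TP and FP are the counts of true and false positives at that threshold, and n is the sample size. The Python sketch below applies this standard definition to invented toy data; the function names are hypothetical and the study's own implementation is not described in the text.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented toy data: 1 = DKD, 0 = non-DKD.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.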
Application of the model
Lastly, based on a cut-off value of 50.7% in this model, we constructed an online prediction calculator for DKD risk (https://liting3659078.shinyapps.io/myrapp/, Figure 8). When applied to two representative patients, the calculator showed good predictive effectiveness (Supplementary Figure 1). The indicators related to these two patients are shown in Supplementary Table 2.

Figure 8. Web-based predictor for the risk of developing DKD based on the XGBoost model. The URL provided is: https://liting3659078.shinyapps.io/myrapp/.
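The final classification step of such a calculator reduces to comparing the predicted probability with the 50.7% cut-off. The Python sketch below illustrates this with the two predicted risks reported in Supplementary Figure 1 (43.08% and 51.90%). The function name and the risk labels are hypothetical, and the sketch does not reproduce the XGBoost model that generates the probabilities.

```python
CUTOFF = 0.507  # cut-off reported in the text (50.7%)


def classify_risk(probability, cutoff=CUTOFF):
    """Label a predicted DKD probability relative to the cut-off."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # A probability strictly above the cut-off is treated as higher risk.
    return "higher risk" if probability > cutoff else "lower risk"


# Predicted risks of the two example cases from Supplementary Figure 1.
cases = {"Case 1": 0.4308, "Case 2": 0.5190}
labels = {name: classify_risk(p) for name, p in cases.items()}
print(labels)
```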
Discussion
In China, the management of DKD in patients with T2DM faces challenges characterized by low screening rates, low awareness among patients, low treatment rates, unattainable therapeutic goals, and insufficient community-based preventive capacities. Chen et al. (15) conducted a 7-year follow-up study on 907 diabetic patients from the Taopu Community Health Service Center in Putuo district of Shanghai, revealing that by 2015, the screening rate of DKD was merely 55.1%, which is notably lower than that of diabetic neuropathy and retinopathy (77.6%). Hence, developing strategies to efficiently increase the screening rate among high-risk populations and implementing clinical prediction tools could be a solution.
The present study first identified 15 predictive variables affecting the occurrence of DKD in patients with T2DM using LASSO regression analysis: gender, age, SBP, BUN, Cr, BUN/Cr, UA, HbA1c, microalbuminuria, presence of DR, hypertension, CHD, history of cerebral infarction, family history of diabetes, and family history of CHD. LASSO regression balances fitting error against the number and magnitude of model parameters, thereby identifying the features with the greatest predictive power for the outcome variable. This process reduces the model complexity, mitigates multicollinearity, prevents overfitting, and ultimately enhances the generalizability of the model. We constructed eight machine learning models and compared their efficacy in predicting DKD. The XGBoost model exhibited superior predictive capabilities in both the training and validation sets, with AUC values of 0.932 and 0.930, and F1 scores of 0.847 and 0.848, respectively. Moreover, this optimal model had a larger net benefit across a range of threshold probabilities, supporting its clinical usefulness for DKD management.
The 15 predictive variables related to the occurrence of DKD in patients with T2DM were ranked as follows: microalbuminuria, presence of DR, hypertension, Cr, UA, BUN/Cr, age, BUN, family history of diabetes, HbA1c, SBP, family history of CHD, sex, history of cerebral infarction, and presence of CHD. Microalbuminuria was found to have the most significant effect on the occurrence of DKD. This is likely because microalbuminuria is a crucial biomarker in the early stages of DKD: when the kidneys of diabetic patients begin to sustain damage, microalbumin appears in the urine, acting as an early indicator of renal impairment. A systematic review has indicated that DR is closely associated with nephropathy; the presence of DR increases the risk of nephropathy and serves as a predictive indicator of microalbuminuria progression (16). Hypertension is a major risk factor for the progression of DKD and the occurrence of cardiovascular diseases and death, and persistent hypertension exacerbates the burden on the kidneys (17–19). UA, Cr, BUN, and microalbumin are common indicators of renal function, with Cr, BUN, and UA playing essential roles in early DKD screening (20). Using multifactorial logistic regression analysis, Li et al. (21) found that the prevalence of DKD was significantly higher in patients with T2DM aged ≥50 years [OR = 4.011, 95%CI (3.152-5.104)], which is consistent with the results of our study. HbA1c serves as a pivotal index for evaluating long-term glycemic control in diabetic patients, and Ali et al. (22) showed that HbA1c plays a significant role in the development of DKD, with an association between HbA1c and microalbuminuria: elevated HbA1c levels often correlate with increased microalbuminuria.
In our study, HbA1c also emerged as an influential risk factor for DKD occurrence, likely because all participants were patients with type 2 diabetes and HbA1c was a key indicator selected by LASSO regression. Sex also influenced the occurrence of DKD, with males at a higher risk. Research shows that sex differences play a key role in the progression of DKD in T2DM patients, as the DKD incidence rate in males (23.2%) is higher than that in females (19.8%) (8). Logistic regression analysis revealed that a family history of diabetes was significantly associated with the development of DKD (P < 0.05) (23).
Using the XGBoost model established with the above characteristic variables, we conducted an external validation on a dataset that was not used for training or testing. The model retained good performance on this dataset (AUC = 0.878, F1 score = 0.791). With the advent of the artificial intelligence era, a growing body of research has produced models to predict the occurrence and prognosis of diseases, including the early identification of populations at high risk of DKD. However, a comprehensive comparison of multiple predictive models in terms of performance, clinical value, and online application has been lacking. Additionally, previous studies required manual calculations with model inputs, which significantly limited their practicality. To enhance the usability of the constructed model, we designed and deployed an online prediction calculator to make it available to clinicians and patients, and used example patients to confirm its practical applicability.
This study has several limitations: 1) Information on patients’ medication use was not included, which prevented us from assessing the impact of specific drugs and their combinations on the development of DKD. 2) The data were obtained from hospital settings and excluded community-dwelling T2DM populations, which account for a large number of high-risk DKD patients.
Overall, our study provides an optimal predictive model (XGBoost model) integrated with 15 featured indicators on a dedicated website for DKD occurrence in T2DM patients. This tool can effectively support clinical decision making and patient guidance.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by The Ethics Committee of the Second Hospital of Shijiazhuang City. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
TL: Data curation, Formal Analysis, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. JC: Resources, Software, Visualization, Writing – review & editing. XZ: Resources, Software, Writing – review & editing. KW: Data curation, Formal Analysis, Writing – review & editing. XSZ: Software, Visualization, Writing – review & editing. YC: Data curation, Software, Writing – review & editing. ZX: Software, Visualization, Writing – review & editing. SW: Data curation, Writing – review & editing. PS: Data curation, Formal Analysis, Writing – review & editing. XH: Resources, Visualization, Writing – review & editing. YY: Resources, Software, Writing – review & editing. XC: Resources, Writing – review & editing. DM: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Writing – review & editing. XL: Conceptualization, Project administration, Resources, Supervision, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research and/or publication of this article. This work was funded by grants from the National Natural Science Foundation of China (82270508), Hebei Provincial Natural Science Foundation Joint Fund for Precision Medicine (H2025206777), Youth Fund for Director of Key Laboratory of Neuro and Vascular Biology, Ministry of Education (NV20210006), Scientific Research Program of the Department of Education of Hebei Province (QN2022164), and Shijiazhuang Science and Technology Research and Development Program (191460933).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1587932/full#supplementary-material
Supplementary Table 1 | Variable assignment.
Supplementary Table 2 | Indicators related to DKD and non-DKD patients.
Supplementary Figure 1 | DKD online predictor results for two patients. (a) The predicted risk of developing DKD in Case 1 was 43.08% (< 50.7%), and (b) the predicted risk of developing DKD in Case 2 was 51.90% (> 50.7%).
References
1. International Diabetes Federation. Diabetes facts and figures (2024). Available online at: https://idf.org/about-diabetes/diabetes-facts-figures/. (Accessed October 30, 2025).
2. Aldemir O, Turgut F, and Gokce C. The association between methylation levels of targeted genes and albuminuria in patients with early diabetic kidney disease. Ren Fail. (2017) 39:597–601. doi: 10.1080/0886022X.2017.1358180
3. Chen TK, Knicely DH, and Grams ME. Chronic kidney disease diagnosis and management: a review. JAMA. (2019) 322:1294–304. doi: 10.1001/jama.2019.14745
4. Afkarian M, Sachs MC, Kestenbaum B, Hirsch IB, Tuttle KR, Himmelfarb J, et al. Kidney disease and increased mortality risk in type 2 diabetes. J Am Soc Nephrol. (2013) 24:302–8. doi: 10.1681/ASN.2012070718
5. Jiao F, Wong C, Tang S, Fung C, Tan K, McGhee S, et al. Annual direct medical costs associated with diabetes-related complications in the event year and in subsequent years in Hong Kong. Diabetes Med. (2017) 34:1276–83. doi: 10.1111/dme.13416
6. Major RW, Cheng MRI, Grant RA, Shantikumar S, Xu G, Oozeerally I, et al. Cardiovascular disease risk factors in chronic kidney disease: a systematic review and meta-analysis. PloS One. (2018) 13:e0192895. doi: 10.1371/journal.pone.0192895
7. Gheith O, Farouk N, Nampoory N, Halim MA, and Al-Otaibi T. Diabetic kidney disease: world wide difference of prevalence and risk factors. J Nephropharmacol. (2016) 5:49–56.
8. Pan W, Wang ML, Xu Y, Zhang JS, Zhao MM, Wan J, et al. Analysis of disease burden and risk factors of diabetic kidney disease in China from 1990 to 2019. Chin J Nephrol. (2023) 39:576–86. doi: 10.3760/cma.j.cn441217-20221115-01129
9. Coca SG, Nadkarni GN, Huang Y, Moledina DG, Rao V, Zhang J, et al. Plasma biomarkers and kidney function decline in early and established diabetic kidney disease. J Am Soc Nephrol. (2017) 28:2786–93. doi: 10.1681/ASN.2016101101
10. Niewczas MA, Gohda T, Skupien J, Smiles AM, Walker WH, Rosetti F, et al. Circulating TNF receptors 1 and 2 predict ESRD in type 2 diabetes. J Am Soc Nephrol. (2012) 23:507–15. doi: 10.1681/ASN.2011060627
11. Xi CF, Wang CM, Rong GH, and Deng JH. A nomogram model that predicts the risk of diabetic nephropathy in type 2 diabetes mellitus patients: a retrospective study. Int J Endocrinol. (2021) 8:6672444. doi: 10.1155/2021/6672444
12. Shi R, Niu ZY, Wu B, Zhang TT, Cai DJ, Sun H, et al. Nomogram for the risk of diabetic nephropathy or diabetic retinopathy among patients with type 2 diabetes mellitus based on questionnaire and biochemical indicators: a cross-sectional study. Diabetes Metab Syndr Obes. (2020) 13:1215–29. doi: 10.2147/DMSO.S244061
13. Chinese Diabetes Society. Guideline for the prevention and treatment of type 2 diabetes mellitus in China (2020 edition). Chin J Diabetes Mellitus. (2021) 13:315–409. doi: 10.2147/DMSO.S244061
14. The Microvascular Complications Study Group of the Chinese Diabetes Society (CDS). Clinical guideline for the prevention and treatment of diabetic kidney disease in China (2021 edition). Chin J Diabetes Mellitus. (2021) 13:762–84. doi: 10.3760/cma.j.cn121383-20210825-08064
15. Chen SY, Hou XH, Sun Y, Hu G, Zhou XY, Xue HJ, et al. A seven-year study on an integrated hospital-community diabetes management program in Chinese patients with diabetes. Prim Care Diabetes. (2018) 12:231–7. doi: 10.1016/j.pcd.2017.12.005
16. Pearce I, Simó R, Lövestam-Adrian M, Wong DT, and Evans M. Association between diabetic eye disease and other complications of diabetes: implications for care. A systematic review. Nutrients. (2019) 11:467–78. doi: 10.1111/dom.13550
17. Morton JI, Lazzarini PA, Polkinghorne KR, Carstensen B, Magliano DJ, Shaw JE, et al. The association of attained age, age at diagnosis, and duration of type 2 diabetes with the long-term risk for major diabetes-related complications. Diabetes Res Clin Pract. (2022) 190:110022. doi: 10.1016/j.diabres.2022.110022
18. Emdin CA, Rahimi K, Neal B, Callender T, Perkovic V, Patel A, et al. Blood pressure lowering in type 2 diabetes: a systematic review and meta-analysis. JAMA. (2015) 313:603–15. doi: 10.1001/jama.2014.18574
19. Bakris GL, Agarwal R, Chan JC, Cooper ME, Gansevoort RT, Haller H, et al. Effect of finerenone on albuminuria in patients with diabetic nephropathy: a randomized clinical trial. Am J Kidney Dis. (2015) 2015:31484–94. doi: 10.1001/jama.2015.10081
20. Wu L, Chang DY, and Chen H. Early screening and evaluation of diabetic kidney disease. Chin J Gen Pract. (2022) 21:814–6. doi: 10.3760/cma.j.cn114798-20220429-00356
21. Li YL, Liao YG, Li XW, Zheng HY, Huang MW, Chen SS, et al. Risk factors of diabetic nephropathy. Prev Med. (2017) 24:133–6. doi: 10.3969/j.issn.1006-3110.2017.02.002
22. Ali F, Alsayegh F, Sharma P, Waheedi M, Bayoud T, Alrefai F, et al. White blood cell subpopulation changes and prevalence of neutropenia among Arab diabetic patients attending Dasman Diabetes Institute in Kuwait. PloS One. (2018) 13:e0193920. doi: 10.1371/journal.pone.0193920
Keywords: type 2 diabetes mellitus, diabetic kidney disease, machine learning, prediction model, predictive value
Citation: Li T, Chen J, Zhang X, Wang K, Zhao X, Cao Y, Xu Z, Wang S, Su P, He X, Yang Y, Cao X, Liang X and Ma D (2025) A machine learning model for predicting the risk of diabetic nephropathy in individuals with type 2 diabetes mellitus. Front. Endocrinol. 16:1587932. doi: 10.3389/fendo.2025.1587932
Received: 05 March 2025; Accepted: 19 September 2025;
Published: 15 October 2025.
Edited by:
Åke Sjöholm, Gävle Hospital, Sweden
Reviewed by:
Chris Robert Neal, University of Bristol, United Kingdom
Roshan Kumar Mahat, Dharanidhar Medical College and Hospital, India
Copyright © 2025 Li, Chen, Zhang, Wang, Zhao, Cao, Xu, Wang, Su, He, Yang, Cao, Liang and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaohua Liang; Dong Ma