AUTHOR=Li Tingting , Chen Jinbo , Zhang Xin , Wang Kaiwen , Zhao Xuesen , Cao Yi , Xu Zhen , Wang Shiyue , Su Peng , He Xiaoyan , Yang Yang , Cao Xiaolu , Liang Xiaohua , Ma Dong TITLE=A machine learning model for predicting the risk of diabetic nephropathy in individuals with type 2 diabetes mellitus JOURNAL=Frontiers in Endocrinology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1587932 DOI=10.3389/fendo.2025.1587932 ISSN=1664-2392 ABSTRACT=Introduction: Diabetic kidney disease (DKD) represents the predominant form of chronic kidney disease (CKD) linked with diabetes mellitus. The application of artificial intelligence holds promise for delaying renal deterioration and decreasing treatment expenses by facilitating early detection and intervention. This is contingent upon the development of an efficient and user-friendly model for predicting DKD risk in diabetic individuals. In this study, leveraging extensive clinical datasets, we sought to develop and validate a predictive model employing machine learning techniques to assess the risk of DKD in patients with type 2 diabetes mellitus (T2DM). Research design and methods: We conducted a retrospective collection of clinical data from 10,057 patients diagnosed with T2DM at Shijiazhuang Second Hospital. A random selection of 15% of these patients (n = 1,508) was utilized for external validation. The remaining 8,549 patients were divided into a training set (n = 5,985) and a validation set (n = 2,564) using a simple random sampling method in a 7:3 ratio. Subsequently, we employed LASSO regression to identify variables significantly associated with DKD in T2DM patients.
These variables were incorporated into eight distinct predictive models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), KNeighbors Classifier (KNN), Gradient Boosting Classifier (GBM), AdaBoost Classifier (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models’ predictive performance was assessed using metrics such as the area under the curve (AUC), accuracy, F1 score, and Brier score. Finally, we developed an online calculator to estimate DKD risk in T2DM patients. Results: Fifteen features associated with DKD were selected using LASSO regression: gender, age, systolic blood pressure (SBP), blood urea nitrogen (BUN), creatinine (Cr), BUN/Cr ratio, uric acid (UA), hemoglobin A1c (HbA1c), microalbuminuria, presence of diabetic retinopathy (DR), hypertension, coronary heart disease (CHD), history of cerebral infarction, family history of diabetes, and family history of CHD. Among the eight evaluated models, the XGBoost algorithm demonstrated superior performance on both the training and validation datasets, with an AUC of 0.932 (95% CI: 0.926-0.938) and 0.930 (95% CI: 0.920-0.939), respectively. The model achieved an accuracy of 0.845 and 0.844, sensitivity of 0.834 and 0.850, specificity of 0.857 and 0.837, F1 score of 0.847 and 0.848, and a Brier score of 0.167 and 0.166, respectively. Decision curve analysis (DCA) further supported the superiority of the XGBoost model over the other models, which yielded the highest net benefit across a range of clinically relevant risk thresholds. Finally, an online predictive calculator for the occurrence of DKD was developed based on the XGBoost model, utilizing a cut-off value of 50.7%. Conclusions: The developed XGBoost model demonstrated the highest predictive accuracy among the evaluated models for the occurrence of DKD in patients with T2DM. This model facilitated the construction of an online prediction calculator, offering an accessible and practical tool for both patients and clinicians.
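
The metrics named in the abstract (accuracy, sensitivity, specificity, F1 score, Brier score, and the net benefit used in decision curve analysis) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `evaluate` and `net_benefit` are hypothetical, the toy labels and risks are invented for illustration, and the only value taken from the abstract is the 50.7% cut-off. The sketch assumes both classes are present in the labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float
    brier: float


def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN after classifying risk >= cutoff as positive."""
    tp = fp = tn = fn = 0
    for label, risk in zip(y_true, y_prob):
        predicted = 1 if risk >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_prob, cutoff=0.507):
    """Summarise thresholded performance; 0.507 mirrors the reported cut-off."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)
    n = tp + fp + tn + fn
    # Brier score: mean squared difference between predicted risk and outcome.
    brier = sum((risk - label) ** 2 for label, risk in zip(y_true, y_prob)) / n
    return BinaryMetrics(
        accuracy=(tp + tn) / n,
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        f1=2 * tp / (2 * tp + fp + fn),
        brier=brier,
    )


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one risk threshold, the quantity plotted in a decision curve."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + tn + fn
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Invented toy data (NOT from the study): 1 = DKD, 0 = no DKD.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_risks = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

toy_metrics = evaluate(toy_labels, toy_risks)
toy_net_benefit = net_benefit(toy_labels, toy_risks, 0.507)
print(toy_metrics)
print(round(toy_net_benefit, 4))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies; a lower Brier score indicates better agreement between predicted risks and observed outcomes.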