Machine learning-based prediction of diabetic peripheral neuropathy: model development and clinical validation

Sun, Meng; Sun, Xingling; Wang, Fei; Liu, Li

doi:10.3389/fendo.2025.1614657

ORIGINAL RESEARCH article

Front. Endocrinol., 05 June 2025

Sec. Clinical Diabetes

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1614657

This article is part of the Research Topic: Diabetes Complications: Navigating Challenges and Unveiling New Solutions

Meng Sun¹†, Xingling Sun²†, Fei Wang²*, Li Liu¹*

¹Department of Neurosurgery, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, Shandong, China
²Department of Nursing, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, Shandong, China

Background: Diabetic peripheral neuropathy (DPN) is a common and debilitating complication of type 2 diabetes mellitus (T2DM), significantly impacting patients’ quality of life and increasing healthcare burdens. Early prediction and intervention are critical to mitigating its impact.

Methods: This study analyzed 1,544 diabetic patients from the First Affiliated Hospital of Shandong First Medical University, who were randomly divided into a training cohort (n = 1,082) and a testing cohort (n = 462) using a 7:3 split ratio. Feature selection was performed using both Boruta and LASSO algorithms, and the intersection of the selected variables was used as the final predictor set. Eight key predictors were identified from 23 variables, including diabetes duration, uric acid, HbA1c, NLR, smoking status, SCR, LDH, and hypertension. Nine machine learning models were developed and compared for DPN risk prediction.

Results: Stochastic Gradient Boosting (SGBT) demonstrated the best performance (training AUC: 0.933, 95% CI: 0.921–0.946; testing AUC: 0.811, 95% CI: 0.776–0.843). Shapley Additive Explanations (SHAP) analysis provided interpretability, highlighting the clinical importance of diabetes duration and HbA1c among other predictors.

Conclusion: This study establishes a robust predictive tool for early DPN detection, laying the foundation for improved prevention and management strategies.

1 Introduction

The global prevalence of diabetes is rising at an unprecedented rate. In 2021, approximately 537 million adults (20–79 years) worldwide were living with diabetes, and this number is expected to increase to 783 million by 2045 (1). Diabetic peripheral neuropathy (DPN) is a common microvascular complication of type 2 diabetes (2–4), which significantly increases the risk of diabetic foot ulcers, non-traumatic lower limb amputations, and other related complications such as falls (5, 6). Furthermore, diabetic patients with peripheral neuropathy (PN) and foot ulcers have a more than two-fold increased relative risk of mortality compared to those without these two conditions (7). PN has been recognized as an independent risk factor for mortality in adults with diabetes (8). Therefore, there is an urgent need to develop novel strategies for the prevention and early diagnosis of DPN, which could help reduce associated disability and mortality rates, ultimately improving patient quality of life.

Machine learning (ML) technology has gained significant traction in the healthcare sector, demonstrating its potential in addressing complex medical challenges, including disease prediction and management (9–11). By leveraging large datasets, ML algorithms can uncover subtle patterns and relationships that are difficult to discern through traditional statistical methods.

In recent years, several studies have developed ML-based models to predict DPN, providing valuable insights into the feasibility of applying ML to this condition (12, 13). However, these studies often have limited clinical utility: they typically rely on a single ML algorithm, lack interpretability tools such as SHAP (Shapley Additive Explanations) to explain feature contributions, and are rarely deployed in user-friendly formats such as Shiny applications for real-world use. To overcome these limitations, we propose a comprehensive modeling strategy that incorporates robust feature selection (using Boruta and LASSO), systematic comparison of multiple machine learning algorithms to identify the best performer, and model interpretation through SHAP to enhance transparency and clinical interpretability. We further propose a web-based risk calculator to support clinical implementation and decision-making at the point of care. Given the substantial burden of DPN on affected individuals, interpretable and accessible prediction tools have the potential to enable earlier diagnosis, guide preventive strategies, and improve patient outcomes.

This study aims to develop and validate an ML-based risk prediction model for DPN, using a combination of demographic, clinical, and biochemical parameters. By integrating robust ML algorithms and real-world clinical data, this research seeks to provide a valuable tool for clinicians to identify high-risk patients, enabling timely interventions and personalized treatment strategies to mitigate the impact of DPN.

2 Materials and methods

2.1 Study population

We collected data from 1,544 patients with type 2 diabetes mellitus (T2DM) who received treatment at the First Affiliated Hospital of Shandong First Medical University between January 2023 and December 2024. The inclusion criteria were: (1) age ≥18 years and (2) a diagnosis of T2DM. Exclusion criteria were other causes of peripheral neuropathy, malignant tumors, acute infectious diseases, severe hepatic or renal dysfunction, cardiac failure, metabolic disorders (such as thyroid disorders or vitamin B12 deficiency), and other severe life-threatening conditions. All patients underwent neurological assessments and nerve conduction studies (NCSs). DPN was diagnosed based on the presence of neuropathy-related clinical signs or symptoms together with abnormal electromyography results, following the Toronto Expert Consensus (14). Participants were then randomly divided into training and testing groups in a 7:3 ratio. Missing data were handled by multiple imputation with the “mice” package (n = 5); all variables had less than 30% missingness. Predictive mean matching was used for continuous variables, and logistic regression for binary variables. Further details are provided in Figure 1. The study followed the principles of the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Shandong First Medical University (Hospital Ethics Review No. S654).
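The 7:3 random split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: stratifying by outcome, the seed, the function name `stratified_split`, and the toy labels are all assumptions, since the text states only that the split was random.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), keeping the outcome ratio similar in both parts.

    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut them at the train fraction
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical, not study data): 1 = DPN, 0 = no DPN
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
```

Splitting within each outcome class keeps the proportion of DPN cases close to equal in the two cohorts, which is the property reported for the training and testing groups in the Results.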

Figure 1. Overview of data processing and machine learning workflow. (A) Database: Participants were selected from The First Affiliated Hospital of Shandong First Medical University (2023–2024). Inclusion criteria were age ≥18 years and T2DM diagnosis. Exclusion criteria included other neuropathies, malignancies, severe infections, organ dysfunction, and metabolic disorders. The final study cohort comprised 1,544 individuals. (B) Feature selection: From an initial set of 23 variables, key variables were identified using the Boruta and LASSO methods. Selected features included diabetes duration, serum creatinine (SCR), hypertension, neutrophil-to-lymphocyte ratio (NLR), smoking status, uric acid, lactate dehydrogenase (LDH), and HbA1c. (C) Model training and testing: Nine machine learning algorithms, including Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes (NB), XGBoost (XGB), Stochastic Gradient Boosting Trees (SGBT), and Neural Network (NNET), were applied, with hyperparameter optimization performed using 10×10-fold cross-validation. Model evaluation metrics included Receiver Operating Characteristic (ROC), Area Under the Curve (AUC), F1-score, calibration curves, and decision curves. Model interpretation was conducted using SHAP analysis for feature importance.

2.2 Research variables

Candidate variables were selected based on clinical expertise and previous research evidence. They included demographic characteristics (age, sex, smoking, and drinking status), physical measurements (BMI), medical history (hypertension and diabetes mellitus), and laboratory test results (white blood cell count, neutrophils, lymphocytes, and other relevant biomarkers). The neutrophil-to-lymphocyte ratio (NLR) was calculated by dividing the neutrophil count by the lymphocyte count (15).
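As a small worked illustration of the NLR definition, the ratio can be computed as below. The function name and the example counts are hypothetical and are not taken from the study data.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR = neutrophil count / lymphocyte count (both in the same units, e.g. 10^9 cells/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes

# Hypothetical counts for illustration only
nlr = neutrophil_lymphocyte_ratio(4.2, 1.5)
```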

2.3 Feature screening

Boruta is a random forest-based feature selection algorithm used to evaluate the importance of variables and identify features significantly associated with the target variable (16). We implemented this algorithm in R using the “Boruta” package, with parameters set to “pValue = 0.01” and “maxRuns = 500”. The algorithm generates random shadow variables and compares their importance distribution with that of the actual variables, iteratively assessing the significance of each variable. After a maximum of 500 iterations or when the variable importance stabilizes, the algorithm finalizes the selection results and identifies significant features.

LASSO regression applies a penalty function to shrink certain regression coefficients, imposing a constraint on the sum of their absolute values to keep it below a predetermined threshold, thereby enhancing the model’s robustness (17). We performed LASSO regression using the glmnet package in R, setting the family parameter to “binomial” to suit our binary outcome data. The key parameter, alpha, was set to 1, fully utilizing the LASSO method. Through cross-validation using the cv.glmnet function, we selected two lambda values: lambda.min, which minimizes the cross-validation error, and lambda.1se, which offers a more parsimonious model. These two values help strike a balance between model complexity and predictive accuracy. Finally, we filtered out the variables that contributed meaningfully to prediction based on non-zero coefficients, thereby simplifying the model and enhancing its interpretability.
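For reference, the penalized objective that glmnet minimizes for a binomial outcome can be written as below, following the formulation in the package documentation. With alpha = 1 the ridge term vanishes and only the L1 (LASSO) penalty remains. The symbols N, x_i, y_i, β_0, and β are notation introduced here for illustration and do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
 - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```

Larger values of λ shrink more coefficients exactly to zero, which is why lambda.1se typically yields a sparser model than lambda.min.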

As previously reported in the literature (16), we adopted the intersection of features identified by both Boruta and LASSO as the final set of predictors. This approach balances model complexity and interpretability while ensuring robust feature selection.
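The intersection step can be illustrated with plain set operations. The feature names below are those reported for each method in the Results section of this article; the variable names are illustrative.

```python
# Features reported as selected by each method (see Section 3.2)
boruta_selected = {
    "diabetes duration", "uric acid", "HbA1c", "NLR", "smoking status",
    "SCR", "LDH", "Albumin", "hypertension",
}
lasso_selected = {
    "diabetes duration", "uric acid", "HbA1c", "NLR", "smoking status",
    "SCR", "LDH", "hypertension",
}

# Final predictor set: features chosen by both methods
final_predictors = sorted(boruta_selected & lasso_selected)

# Features chosen by Boruta only, and therefore dropped
boruta_only = boruta_selected - lasso_selected
```

Here the intersection contains eight predictors, and Albumin is the only feature selected by Boruta but not by LASSO.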

2.4 Algorithm development and validation

The complete dataset was partitioned into two mutually exclusive subsets: 70% of the data formed the training set and the remaining 30% the test set. The test set was reserved exclusively for final model evaluation and was not involved in any model training, feature selection, hyperparameter tuning, or validation procedures. All model development steps, including cross-validation and hyperparameter optimization, were conducted solely within the training set to prevent data leakage. Nine machine learning algorithms were used to develop the models: logistic regression (LR), random forest (RF), support vector machine (SVM), decision tree (DT), k-nearest neighbors (KNN), Naive Bayes (NB), stochastic gradient boosting (SGBT), neural network (NNET), and extreme gradient boosting (XGB). The algorithms were implemented using the Python package “Scikit-learn (version 0.24.1)”. They were selected for their ability to model complex relationships between variables and their robustness in handling both linear and nonlinear data structures. For each model, hyperparameters were tuned by grid search over a predefined range of candidate values, and each candidate was evaluated with 10 rounds of 10-fold cross-validation on the training set using the selected feature subset (Supplementary Table S1). Training on different subsets of the data in this way is intended to improve generalization and limit overfitting.
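The resampling scheme (10 rounds of 10-fold cross-validation within the training set) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the function name `repeated_kfold`, the seed, and the use of plain (unstratified) folds are all hypothetical choices.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Reshuffle the sample order once per repeat
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_splits folds of near-equal size
        folds = [order[k::n_splits] for k in range(n_splits)]
        for fold, val_idx in enumerate(folds):
            val_set = set(val_idx)
            train_idx = [i for i in order if i not in val_set]
            yield repeat, fold, train_idx, val_idx

# 1,082 is the training cohort size reported in the article
splits = list(repeated_kfold(n_samples=1082))
```

In a grid search, each candidate hyperparameter setting would be fitted and scored on all 100 splits, and the mean validation score would be compared across candidates. The held-out test set never enters this loop.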

The performance of the machine learning models was evaluated using the testing set, which had not been involved in the training process. Key evaluation metrics, including accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, kappa score, and area under the receiver operating characteristic curve (AUC), were calculated to assess the model’s ability to accurately classify individuals at risk for DPN. Additionally, decision curve analysis (DCA) was performed to evaluate the clinical utility of the models.
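The metrics listed above can all be derived from a 2×2 confusion matrix, and the AUC from pairwise ranking of cases against non-cases. The sketch below uses standard textbook definitions; the function names, the 0.5 threshold, and the toy data are hypothetical and are not study results.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = DPN, 0 = no DPN).

    Assumes both classes occur in y_true and y_pred (no zero denominators).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv, "npv": npv,
            "f1": f1, "kappa": kappa}

def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random non-case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

Unlike the threshold-dependent metrics, the AUC uses the predicted scores directly, so it does not change with the choice of classification cut-off.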

2.5 Model explanation

Explaining ML models can be challenging, especially with complex models often referred to as “black-box” models. The Shapley Additive Explanations (SHAP) method, based on game theory, provides a solution to this issue by ranking the importance of input features and explaining the results of predictive models. SHAP calculates the contribution of each feature to the prediction, offering both local and global explanations, thereby enhancing the transparency and interpretability of the model (9, 18). Interpretability analysis was performed using the SHAP Python library (version 0.43.0).
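To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is a brute-force illustration with cost exponential in the number of features, not the algorithm used by the SHAP library, which relies on faster model-specific or approximate methods. The toy model, its coefficients, and the background rows are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact Shapley values for one instance x.

    v(S) is the mean model output when the features in S take their values
    from x and all other features take their values from a background row.
    Returns (phi, base_value), where base_value = v(empty set).
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())

# Hypothetical toy model: duration (years), HbA1c (%), hypertension (0/1)
def toy_model(features):
    duration, hba1c, hypertension = features
    return 0.02 * duration + 0.05 * hba1c + 0.1 * hypertension * (hba1c > 7.0)

background = [[5, 6.5, 0], [12, 8.0, 1], [20, 9.1, 1], [2, 6.0, 0]]
x = [15, 8.5, 1]
phi, base_value = shapley_values(toy_model, x, background)
```

The per-feature values in `phi` give a local explanation: together with `base_value` they add up to the model output for `x`. Averaging absolute values over many instances gives a global importance ranking.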

2.6 Web calculator

To support clinical implementation, the final prediction model was deployed via a Shiny-based web platform. By inputting the relevant clinical variables, the application generates an individualized probability of DPN in diabetic patients.

2.7 Statistical analysis

The preliminary analysis of the dataset involved the application of descriptive statistics. In clinical data, continuous variables were expressed as mean ± standard deviation (SD), while categorical variables were described using frequencies and percentages. Statistical tests, such as the chi-square test and unpaired t-test, were used to compare variables between groups. All statistical analyses were performed using the R version 4.4.2 software package. A two-sided P value of < 0.05 was considered statistically significant.

3 Results

3.1 Patient characteristics

Table 1 presents the characteristics of the cohort. This study included 1,544 diabetic patients (mean age: 64.39 years; 53% male). Baseline characteristics were compared between patients with and without DPN, and statistically significant differences were found between the groups in age, gender, smoking status, hypertension, NE, LYM, NLR, Hb, uric acid, HbA1c, TC, SCR, Albumin, LDH, and diabetes duration (P < 0.05). The patients were randomly divided into a training group (n = 1,082) and a testing group (n = 462), in which approximately 34.3% and 34.2% of patients, respectively, were diagnosed with DPN. The baseline characteristics of the two groups were similar (Table 1).

Table 1. Comparison of demographic characteristics and clinical characteristics between diabetic peripheral neuropathy (DPN) and non-DPN patients, and between training and test sets.

3.2 Predictor screening

The Boruta algorithm is an extension of the random forest method that accurately estimates the importance of each feature to identify the actual feature set. The Boruta algorithm identified 9 key factors, including diabetes duration, uric acid, HbA1c, NLR, smoking status, SCR, LDH, Albumin, and hypertension (Figure 2A). In contrast, LASSO regression is a shrinkage estimation method that performs variable selection and complexity adjustment by formulating an optimization objective function that includes a penalty term. In this study, LASSO regression was used to identify features such as diabetes duration, uric acid, HbA1c, NLR, smoking status, SCR, LDH, and hypertension (Figures 2B, C). By comparing the results obtained from the Boruta algorithm and LASSO regression, we identified the common subset of features selected by both methods. These selected features were ultimately used to construct the model, including diabetes duration, uric acid, HbA1c, NLR, smoking status, SCR, LDH, and hypertension (Figure 2D).

Figure 2. Predictor screening results. (A) Boruta feature selection. (B) LASSO screening with lambda values indicated by dashed lines. (C) Variable trajectories in the LASSO model. (D) Common predictors identified by Boruta and LASSO, including diabetes duration, HbA1c, NLR, SCR, hypertension, smoking status, LDH, and uric acid.

3.3 Model performance

We performed ten rounds of 10-fold internal cross-validation and developed nine machine learning models. In the training dataset, the AUC values for the models were as follows: LR 0.722 (95% CI: 0.694-0.749), DT 0.704 (95% CI: 0.677-0.730), RF 0.796 (95% CI: 0.774-0.818), KNN 0.883 (95% CI: 0.866-0.899), SVM 0.678 (95% CI: 0.649-0.705), NB 0.705 (95% CI: 0.677-0.732), XGB 0.939 (95% CI: 0.928-0.950), SGBT 0.933 (95% CI: 0.921-0.946), and NNET 0.784 (95% CI: 0.760-0.808) (Figure 3A). In the test set, the SGBT model demonstrated superior predictive performance with an AUC of 0.811 (95% CI: 0.776-0.843). In comparison, the AUC values for the remaining models in the test set were as follows: LR 0.757 (95% CI: 0.719-0.749), DT 0.657 (95% CI: 0.613-0.699), RF 0.757 (95% CI: 0.718-0.794), KNN 0.749 (95% CI: 0.712-0.787), SVM 0.692 (95% CI: 0.649-0.734), NB 0.704 (95% CI: 0.668-0.747), XGB 0.810 (95% CI: 0.777-0.843), and NNET 0.740 (95% CI: 0.701-0.779) (Figure 3B). The accuracy, sensitivity, specificity, PPV, NPV, F1 score, and kappa values were calculated and compared for models within the training set (Figure 3C) and the test set (Figure 3D). The DCA demonstrated that, in the training set, the SGBT model outperformed all other models across the entire threshold range (0–0.8), followed by the XGB model (Figure 3E). In the test set, the RF model provided the best performance across the full threshold range (0–1.0) (Figure 3F).

Figure 3. Performance and comparison of nine predictive models. ROC curves for the training set (A) and the test set (B). Evaluation metrics for the training set (C) and the test set (D), including accuracy, sensitivity, specificity, PPV, NPV, F1 score, and kappa value. Decision Curve Analysis (DCA) for the training set (E) and the test set (F).
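For readers unfamiliar with decision curve analysis, the quantity plotted in panels (E) and (F) is the net benefit at each threshold probability. A minimal sketch using the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), is given below. The function names and toy data are hypothetical, and the article does not state which software produced the curves.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (hypothetical)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# One (threshold, net benefit) point per threshold from 0.1 to 0.8
curve = [(pt / 10, net_benefit(y_true, probs, pt / 10)) for pt in range(1, 9)]
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.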

3.4 Model explanation

We employed the SHAP method to interpret the final model’s output by calculating the contribution of each variable to the prediction. In Figure 4A, feature importance is visualized, where each point corresponds to a sample, and a color gradient from blue (low values) to red (high values) reflects the magnitude of the feature value. The vertical axis displays the ranked features, showing the correlation and distribution of feature values with their corresponding SHAP values. Figure 4B presents the average SHAP values for each feature, ordered by importance in descending order on the vertical axis. The analysis reveals that diabetes duration, HbA1c, SCR, hypertension, and uric acid are the top five most influential features, indicating their critical role in predicting DPN. Figure 4C provides a decision plot, which illustrates the contribution of each feature to the final prediction. This plot also tracks the changes in SHAP values for individual samples, offering insights into feature interactions and the model’s decision-making process.

Figure 4. SHAP analysis for feature interpretability. (A) SHAP dendrogram of features for the SGBT model. (B) Feature importance ranking for the logistic regression model. (C) Decision plot of feature contributions to the model outputs.

3.5 Implementation of the web calculator

As illustrated in Figure 5, the final SGBT model was deployed as an interactive web application to facilitate clinical use. By entering the values of the eight selected features, clinicians can obtain an individualized risk estimate for DPN. The tool is available online at: https://dpn-prediction.shinyapps.io/shiny-sgbt/.

Figure 5. Web-based calculator for predicting the risk of diabetic peripheral neuropathy (DPN) in patients with diabetes using the developed model. By entering values for diabetes duration, uric acid, HbA1c, NLR, smoking status, serum creatinine (SCR), lactate dehydrogenase (LDH), and hypertension, an individualized DPN risk prediction can be obtained.

4 Discussion

DPN is one of the most common complications in patients with T2DM, characterized by its progressive, irreversible, and debilitating nature (19, 20). With the increasing prevalence of diabetes, DPN has become a significant global public health issue. DPN not only severely impacts the quality of life of patients but also leads to long-term medical costs and substantial economic burden. Without timely diagnosis and intervention, DPN can result in severe complications such as lower limb ulcers, infections, and even amputations, all of which significantly increase mortality and disability rates. Therefore, developing effective early screening and risk assessment tools is of crucial importance for the early diagnosis, intervention, and prevention of DPN. This study aims to assess and predict the risk of DPN using ML techniques. We compared nine different machine learning models for analyzing and predicting the risk of DPN, with the goal of developing a predictive tool that can effectively identify the risk of DPN.

This study employed a dual approach combining the Boruta algorithm and LASSO regression to ensure accurate feature selection and model stability. The final predictive factors identified were diabetes duration, uric acid, HbA1c, NLR, smoking status, SCR, LDH, and hypertension. Most of these factors have previously been shown to be closely associated with the development of DPN. Numerous studies indicate that patients with a longer duration of T2DM are more likely to develop DPN (21, 22). Persistent hyperglycemia has been found to induce the formation of advanced glycation end products (AGEs), which bind to proteins in nerve cells and vascular endothelial cells, causing cellular damage, vascular injury, and neural dysfunction (23, 24). Over time, the accumulation of AGEs may lead to chronic inflammation and oxidative stress in neural tissues, further damaging peripheral nerves. HbA1c is widely recognized as the optimal biochemical marker for assessing long-term metabolic control in patients with diabetes. Numerous prospective studies have confirmed a strong association between HbA1c and diabetic complications, a conclusion consistent with our findings (25–27). Our study also identified uric acid and SCR as significant predictors of DPN. Since peripheral nerves and renal vasculature are both exposed to the diabetic environment, it is generally believed that the development and progression of diabetic nephropathy (DN) and DPN occur concurrently (28, 29). A previous study that developed a model to predict DN and analyzed risk factors for microvascular complications in T2DM patients found that DPN is closely related to DN (30). Further studies observed a significant correlation between different stages of DN and neuropathy in patients with type 2 diabetes (31).

Additionally, our findings demonstrate that NLR is an important risk factor for DPN, consistent with previous studies (32). High NLR values are often strongly associated with endothelial dysfunction (33), which can lead to insufficient blood supply to peripheral nerves. Enzymes and reactive oxygen species released by neutrophils can damage vascular endothelial cells, exacerbating microcirculatory disorders (34). Moreover, elevated NLR levels reflect increased levels of pro-inflammatory cytokines, such as TNF-α and IL-1β. These cytokines exhibit direct neurotoxicity, accelerating neuronal apoptosis and functional loss (35).

In this study, we developed and compared the performance of nine ML models, including LR, RF, SVM, DT, KNN, NB, SGBT, NNET, and XGB. Among these, XGB and SGBT demonstrated the highest discriminative ability on the training set, with AUCs of 0.939 (95% CI: 0.928–0.950) and 0.933 (95% CI: 0.921–0.946), respectively. On the test set, the AUC of SGBT was 0.811 (95% CI: 0.776–0.843), slightly outperforming XGB at 0.810 (95% CI: 0.777–0.843). Although the performance difference between these models on the test set was marginal, SGBT exhibited greater stability and generalization capability, making it the optimal choice for this study. The superior generalization performance of SGBT is particularly valuable in ensuring robustness and reliability when applied to real-world scenarios, where data distributions can be complex and variable.

Although the SGBT model demonstrated the best performance among the nine ML models, the performance gap between the training set (AUC: 0.933) and the testing set (AUC: 0.811) suggests a potential risk of overfitting. This discrepancy may be attributed to model complexity, limited sample size, or inherent data heterogeneity. Given the class imbalance in this study, one potential mitigation is a resampling technique such as SMOTE, which synthesizes new samples from the minority class and has been shown to improve model generalization; this could lead to a more balanced training process and potentially reduce overfitting. We did not apply SMOTE in this study, in order to avoid altering the real-world distribution of DPN cases, but it remains a strategy for future exploration. Additionally, expanding the sample size and incorporating external validation cohorts would help to further assess the model’s robustness.

To further enhance the interpretability of the selected model, we utilized SHAP, a method designed to address the “black-box” nature of ML algorithms (36, 37). SHAP effectively elucidated the contributions of individual features, highlighting key clinical indicators such as diabetes duration, HbA1c, SCR, hypertension, and uric acid. These features, widely recognized as critical factors in the development of DPN, were shown to play a significant role in the ML framework used for risk prediction. Among them, longer diabetes duration and elevated HbA1c levels emerged as the most influential predictors. These findings are consistent with previous studies (38–40), which have demonstrated that chronic hyperglycemia and prolonged disease duration contribute to the development of diabetic neuropathy by promoting oxidative stress, inflammation, and microvascular damage. This alignment with clinical evidence enhances the model’s interpretability and supports its clinical applicability.

Compared with existing DPN prediction models, our SGBT model demonstrated competitive performance. For example, one prior study utilizing logistic regression reported an AUC of 0.759 (41), whereas our model achieved a higher AUC of 0.811 on the independent test set, indicating better predictive capability and generalizability. On the other hand, a recent study by Jiang et al. (42) reported a higher AUC of 0.900. Their model integrated both conventional clinical variables and unique indicators derived from traditional Chinese medicine, potentially capturing more comprehensive biological signals and improving overall predictive accuracy. In contrast, our model was developed solely using routine clinical and biochemical variables, which enhances its practicality, reproducibility, and ease of implementation in standard healthcare settings. Furthermore, our model features SHAP-based interpretability and a web-based risk calculator to facilitate clinical translation.

Despite the achievements of this study, there are some limitations to acknowledge. First, as a retrospective study, issues such as missing data and selection bias may affect the reliability of the results. Second, the cohort primarily consisted of urban patients receiving standardized treatment protocols, which may not fully represent rural populations or institutions with varying healthcare delivery practices. Third, the relatively small sample size and lack of adjustment for key covariates, such as lifestyle factors and medication regimen, may limit the generalizability of the findings to the broader population. To address these limitations, future research should adopt a prospective, multicenter design with larger and more diverse cohorts, ensuring enhanced representativeness and external validity. Additionally, exploring novel biomarkers and dynamic risk factors, along with incorporating real-time monitoring data, may enrich the model’s predictive capabilities. Such advancements would facilitate its clinical application, ultimately enabling more precise diagnosis and personalized management strategies for DPN patients.

5 Conclusion

This study developed an effective predictive tool for DPN using the SGBT model. By identifying and analyzing key predictive factors, it establishes a solid scientific foundation for the early detection and prevention of DPN, aiming to reduce complications and improve patients’ quality of life. Future efforts should focus on further validating the model’s performance, enhancing its accuracy and practical applicability, and integrating it into clinical workflows to advance personalized management and improve outcomes for DPN patients.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the First Affiliated Hospital of Shandong First Medical University (Hospital Ethics Review No. S654). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

MS: Formal analysis, Writing – original draft, Software, Project administration, Methodology, Investigation. XS: Writing – original draft, Data curation, Conceptualization. FW: Writing – review & editing, Methodology, Conceptualization, Investigation. LL: Validation, Formal analysis, Software, Writing – review & editing, Investigation.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1614657/full#supplementary-material

References

1. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. (2022) 183:109119. doi: 10.1016/j.diabres.2021.109119

2. Lin YK, Gao B, Liu L, Ang L, Mizokami-Stout K, Pop-Busui R, et al. The prevalence of diabetic microvascular complications in China and the USA. Curr Diabetes Rep. (2021) 21:16. doi: 10.1007/s11892-021-01387-3

3. Dal Canto E, Ceriello A, Ryden L, Ferrini M, Hansen TB, Schnell O, et al. Diabetes as a cardiovascular risk factor: An overview of global trends of macro and micro vascular complications. Eur J Prev Cardiol. (2019) 26:25–32. doi: 10.1177/2047487319878371

4. Liu L, Bi B, Gui M, Zhang L, Ju F, Wang X, et al. Development and internal validation of an interpretable risk prediction model for diabetic peripheral neuropathy in type 2 diabetes: a single-centre retrospective cohort study in China. BMJ Open. (2025) 15:e092463. doi: 10.1136/bmjopen-2024-092463

5. Ziegler D, Papanas N, Schnell O, Nguyen BDT, Nguyen KT, Kulkantrakorn K, et al. Current concepts in the management of diabetic polyneuropathy. J Diabetes Investig. (2021) 12:464–75. doi: 10.1111/jdi.13401

6. Hicks CW and Selvin E. Epidemiology of peripheral neuropathy and lower extremity disease in diabetes. Curr Diabetes Rep. (2019) 19:86. doi: 10.1007/s11892-019-1212-8

7. Coppini DV, Bowtell PA, Weng C, Young PJ, and Sonksen PH. Showing neuropathy is related to increased mortality in diabetic patients - a survival analysis using an accelerated failure time model. J Clin Epidemiol. (2000) 53:519–23. doi: 10.1016/S0895-4356(99)00170-5

8. Hicks CW, Wang D, Matsushita K, Windham BG, and Selvin E. Peripheral neuropathy and all-cause and cardiovascular mortality in U.S. Adults: A prospective cohort study. Ann Intern Med. (2021) 174:167–74. doi: 10.7326/M20-1340

9. Hou F, Zhu Y, Zhao H, Cai H, Wang Y, Peng X, et al. Development and validation of an interpretable machine learning model for predicting the risk of distant metastasis in papillary thyroid cancer: a multicenter study. EClinicalMedicine. (2024) 77:102913. doi: 10.1016/j.eclinm.2024.102913

10. Hu J, Xu J, Li M, Jiang Z, Mao J, Feng L, et al. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: a prospective multicenter cohort study. EClinicalMedicine. (2024) 68:102409. doi: 10.1016/j.eclinm.2023.102409

11. Huang H, Lin Q, Dai X, Chen J, Bai Z, Li X, et al. Derivation and validation of urinary TIMP-1 for the prediction of acute kidney injury and mortality in critically ill children. J Transl Med. (2022) 20:102. doi: 10.1186/s12967-022-03302-0

12. Yang J and Jiang S. Development and validation of a model that predicts the risk of diabetic nephropathy in type 2 diabetes mellitus patients: A cross-sectional study. Int J Gen Med. (2022) 15:5089–101. doi: 10.2147/IJGM.S363474

13. Shi R, Niu Z, Wu B, Zhang T, Cai D, Sun H, et al. Nomogram for the risk of diabetic nephropathy or diabetic retinopathy among patients with type 2 diabetes mellitus based on questionnaire and biochemical indicators: A cross-sectional study. Diabetes Metab Syndr Obes. (2020) 13:1215–29. doi: 10.2147/DMSO.S244061

14. Tesfaye S, Boulton AJ, Dyck PJ, Freeman R, Horowitz M, Kempler P, et al. Diabetic neuropathies: update on definitions, diagnostic criteria, estimation of severity, and treatments. Diabetes Care. (2010) 33:2285–93. doi: 10.2337/dc10-1303

15. Lian X, Qi J, Yuan M, Li X, Wang M, Li G, et al. Study on risk factors of diabetic peripheral neuropathy and establishment of a prediction model by machine learning. BMC Med Inform Decis Mak. (2023) 23:146. doi: 10.1186/s12911-023-02232-1

16. Sun T, Liu J, Yuan H, Li X, and Yan H. Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm. Front Oncol. (2024) 14:1403392. doi: 10.3389/fonc.2024.1403392

17. Frost HR and Amos CI. Gene set selection via LASSO penalized regression (SLPR). Nucleic Acids Res. (2017) 45:e114. doi: 10.1093/nar/gkx291

18. Sun T, Yue X, Zhang G, Lin Q, Chen X, Huang T, et al. AKIML(pred): An interpretable machine learning model for predicting acute kidney injury within seven days in critically ill patients based on a prospective cohort study. Clin Chim Acta. (2024) 559:119705. doi: 10.1016/j.cca.2024.119705

19. Sloan G, Selvarajah D, and Tesfaye S. Pathogenesis, diagnosis and clinical management of diabetic sensorimotor peripheral neuropathy. Nat Rev Endocrinol. (2021) 17:400–20. doi: 10.1038/s41574-021-00496-z

20. Elafros MA, Andersen H, Bennett DL, Savelieff MG, Viswanathan V, Callaghan BC, et al. Towards prevention of diabetic peripheral neuropathy: clinical presentation, pathogenesis, and new treatments. Lancet Neurol. (2022) 21:922–36. doi: 10.1016/S1474-4422(22)00188-0

21. Wu B, Niu Z, and Hu F. Study on risk factors of peripheral neuropathy in type 2 diabetes mellitus and establishment of prediction model. Diabetes Metab J. (2021) 45:526–38. doi: 10.4093/dmj.2020.0100

22. Khawaja N, Abu-Shennar J, Saleh M, Dahbour SS, Khader YS, and Ajlouni KM. The prevalence and risk factors of peripheral neuropathy among patients with type 2 diabetes mellitus; the case of Jordan. Diabetol Metab Syndr. (2018) 10:8. doi: 10.1186/s13098-018-0309-6

23. Jagdale AD, Angal MM, Patil RS, and Tupe RS. Exploring the glycation association with dyslipidaemia: Novel approach for diabetic nephropathy. Biochem Pharmacol. (2024) 229:116513. doi: 10.1016/j.bcp.2024.116513

24. Bangar NS, Dixit A, Apte MM, and Tupe RS. Syzygium cumini (L.) skeels mitigate diabetic nephropathy by regulating Nrf2 pathway and mitochondrial dysfunction: In vitro and in vivo studies. J Ethnopharmacol. (2025) 336:118684. doi: 10.1016/j.jep.2024.118684

25. Zhang X, Gregg EW, Williamson DF, Barker LE, Thomas W, Bullard KM, et al. A1C level and future risk of diabetes: a systematic review. Diabetes Care. (2010) 33:1665–73. doi: 10.2337/dc09-1939

26. Sabanayagam C, Liew G, Tai ES, Shankar A, Lim SC, Subramaniam T, et al. Relationship between glycated haemoglobin and microvascular complications: is there a natural cut-off point for the diagnosis of diabetes? Diabetologia. (2009) 52:1279–89. doi: 10.1007/s00125-009-1360-5

27. Tabak AG, Brunner EJ, Lindbohm JV, Singh-Manoux A, Shipley MJ, Sattar N, et al. Risk of macrovascular and microvascular disease in diabetes diagnosed using oral glucose tolerance test with and without confirmation by hemoglobin A1c: the Whitehall II cohort study. Circulation. (2022) 146:995–1005. doi: 10.1161/CIRCULATIONAHA.122.059430

28. Liu X, Chen D, Fu H, Liu X, Zhang Q, Zhang J, et al. Development and validation of a risk prediction model for early diabetic peripheral neuropathy based on a systematic review and meta-analysis. Front Public Health. (2023) 11:1128069. doi: 10.3389/fpubh.2023.1128069

29. Wang W, Ji Q, Ran X, Li C, Kuang H, Yu X, et al. Prevalence and risk factors of diabetic peripheral neuropathy: A population-based cross-sectional study in China. Diabetes Metab Res Rev. (2023) 39:e3702. doi: 10.1002/dmrr.3702

30. Teliti M, Cogni G, Sacchi L, Dagliati A, Marini S, Tibollo V, et al. Risk factors for the development of micro-vascular complications of type 2 diabetes in a single-centre cohort of patients. Diabetes Vasc Dis Res. (2018) 15:424–32. doi: 10.1177/1479164118780808

31. Xu L, Lin X, Guan M, and Liu Y. Correlation between different stages of diabetic nephropathy and neuropathy in patients with T2DM: A cross-sectional controlled study. Diabetes Ther. (2018) 9:2335–46. doi: 10.1007/s13300-018-0519-9

32. Allwright M, Karrasch JF, O’Brien JA, Guennewig B, and Austin PJ. Machine learning analysis of the UK Biobank reveals prognostic and diagnostic immune biomarkers for polyneuropathy and neuropathic pain in diabetes. Diabetes Res Clin Pract. (2023) 201:110725. doi: 10.1016/j.diabres.2023.110725

33. Popescu AI, Rata AL, Barac S, Popescu R, Onofrei RR, Vlad C, et al. Narrative review of biological markers in chronic limb-threatening ischemia. Biomedicines. (2024) 12. doi: 10.3390/biomedicines12040798

34. Yang T, Xiang CG, Wang XH, Li QQ, Lei SY, Zhang KR, et al. RIPK1 inhibitor ameliorates pulmonary injury by modulating the function of neutrophils and vascular endothelial cells. Cell Death Discov. (2024) 10:152. doi: 10.1038/s41420-024-01921-8

35. Xu JW, Xu X, Ling Y, Wang YC, Huang YJ, Yang JZ, et al. Vincamine as an agonist of G-protein-coupled receptor 40 effectively ameliorates diabetic peripheral neuropathy in mice. Acta Pharmacol Sin. (2023) 44:2388–403. doi: 10.1038/s41401-023-01135-1

36. The Lancet Respiratory Medicine. Opening the black box of machine learning. Lancet Respir Med. (2018) 6:801. doi: 10.1016/S2213-2600(18)30425-9

37. Bernard D, Doumard E, Ader I, Kemoun P, Pages JC, Galinier A, et al. Explainable machine learning framework to predict personalized physiological aging. Aging Cell. (2023) 22:e13872. doi: 10.1111/acel.13872

38. Bjerg L, Hulman A, Carstensen B, Charles M, Witte DR, and Jorgensen ME. Effect of duration and burden of microvascular complications on mortality rate in type 1 diabetes: an observational clinical cohort study. Diabetologia. (2019) 62:633–43. doi: 10.1007/s00125-019-4812-6

39. Park HS, Cho Y, Seo DH, Ahn SH, Hong S, Suh YJ, et al. Impact of diabetes distress on glycemic control and diabetic complications in type 2 diabetes mellitus. Sci Rep. (2024) 14:5568. doi: 10.1038/s41598-024-55901-0

40. Schorr SG, Hammes HP, Muller UA, Abholz HH, Landgraf R, and Bertram B. The prevention and treatment of retinal complications in diabetes. Dtsch Arztebl Int. (2016) 113:816–23. doi: 10.3238/arztebl.2016.0816

41. Zhang W and Chen L. A nomogram for predicting the possibility of peripheral neuropathy in patients with type 2 diabetes mellitus. Brain Sci. (2022) 12. doi: 10.3390/brainsci12101328

42. Jiang A, Li J, Wang L, Zha W, Lin Y, Zhao J, et al. Multi-feature, Chinese-Western medicine-integrated prediction model for diabetic peripheral neuropathy based on machine learning and SHAP. Diabetes Metab Res Rev. (2024) 40:e3801. doi: 10.1002/dmrr.v40.4

Keywords: diabetic peripheral neuropathy, machine learning, interpretable, clinical data, risk prediction model

Citation: Sun M, Sun X, Wang F and Liu L (2025) Machine learning-based prediction of diabetic peripheral neuropathy: model development and clinical validation. Front. Endocrinol. 16:1614657. doi: 10.3389/fendo.2025.1614657

Received: 19 April 2025; Accepted: 20 May 2025;
Published: 05 June 2025.

Edited by:

Thorsten Siegmund, Isar Clinic, Germany

Reviewed by:

Lianhua Liu, Second Affiliated Hospital of Hainan Medical University, China
Xiaoyang Lian, Jiangsu Provincial Hospital of Traditional Chinese Medicine, China

Copyright © 2025 Sun, Sun, Wang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Fei Wang, 278093921@qq.com; Li Liu, neuronpu@163.com

^†These authors have contributed equally to this work and share first authorship
