Development and validation of an explainable machine learning model for predicting osteoporosis in patients with type 2 diabetes mellitus

Wei, Qipeng; Liu, Zihao; Chen, Xiaofeng; Li, Hao; Guo, Weijun; Huang, Qingyan; Zhan, Jinxiang; Chen, Shiji; Cai, Dongling

doi:10.3389/fendo.2025.1611499

ORIGINAL RESEARCH article

Front. Endocrinol., 07 August 2025

Sec. Bone Research

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1611499

Qipeng Wei¹

Zihao Liu²

Xiaofeng Chen¹

Hao Li¹

Weijun Guo¹

Qingyan Huang¹

Jinxiang Zhan¹

Shiji Chen²

Dongling Cai¹,²*

¹Department of Orthopedics, Panyu Hospital of Chinese Medicine, Guangzhou, China
²Panyu Hospital of Chinese Medicine, Guangzhou University of Chinese Medicine, Guangzhou, China

Objective: Osteoporosis is a common complication in patients with type 2 diabetes mellitus (T2DM), yet its screening rate remains low. This study aimed to develop and validate a cost-effective and interpretable machine learning (ML) model to predict the risk of osteoporosis in patients with T2DM.

Methods: This retrospective study included 1560 inpatients who underwent dual-energy X-ray absorptiometry (DXA) between January 2022 and December 2023 at Panyu Hospital of Chinese Medicine. Demographic information and laboratory test results obtained within 24 hours of hospital admission were collected. Potential predictive features were identified using univariate analysis, least absolute shrinkage and selection operator (LASSO) regression, and the Boruta algorithm. Eight supervised ML algorithms were applied to construct predictive models. Model performance was evaluated based on the area under the receiver operating characteristic curve (AUC), calibration plots, decision curve analysis (DCA), accuracy, sensitivity, specificity, and F1 score. The SHapley Additive exPlanations (SHAP) method was used to interpret the model and visualize feature importance.

Results: Ten predictive features were selected based on the intersection of the three feature selection methods. Among the tested models, logistic regression achieved the best overall performance, with an AUC of 0.812, an accuracy of 0.762, a sensitivity of 0.809, a specificity of 0.761, and an F1 score of 0.771 in the validation set. Calibration plots and DCA curves demonstrated good agreement and the highest net clinical benefit. SHAP analysis identified age, sex, alkaline phosphatase, uric acid, hemoglobin, and neutrophil count as the six most influential features. An easy-to-use, web-based risk calculator was developed based on the logistic model and is available at: https://t2dm.shinyapps.io/t2dm-osteoporosis/.

Conclusion: We developed an interpretable and accessible ML-based online tool that enables preliminary screening of osteoporosis risk in patients with T2DM using routine blood indicators. This tool may assist clinicians in early risk identification and reduce the underdiagnosis of osteoporosis.

Introduction

Diabetes mellitus has emerged as a major global public health concern with significant impacts on morbidity and mortality. According to recent estimates, approximately 537 million people are currently living with diabetes worldwide. In China, the prevalence has reached 12.8%, affecting around 140 million individuals, with type 2 diabetes mellitus (T2DM) accounting for 90–95% of all cases (1).

Osteoporosis (OP), a chronic skeletal disorder characterized by trabecular deterioration, disrupted bone microarchitecture, decreased bone mass per unit volume, and increased bone fragility and fracture risk, represents one of the most common complications of diabetes (2). In patients with diabetes, hormonal imbalances and metabolic disturbances contribute to a range of complications. Persistent hyperglycemia accelerates calcium loss, disrupts bone metabolism, and leads to diabetic osteoporosis. Diabetic osteoporosis substantially increases the risk of falls and fractures, which in turn results in reduced quality of life and heightened mortality. In China, the prevalence of osteoporosis among individuals with T2DM is estimated at 37.8%, highlighting a critical yet frequently underrecognized public health issue (3). Accurate assessment of osteoporosis risk in patients with T2DM is therefore essential. However, a standardized and widely accepted risk assessment tool is currently lacking.

In recent years, machine learning (ML) has emerged as a powerful tool in medical diagnostics due to its ability to handle complex, high-dimensional data and uncover non-linear relationships between predictors and outcomes (4). In the field of metabolic bone disease, ML has shown promise in enhancing osteoporosis risk stratification by integrating diverse clinical and biochemical variables (5). A systematic review by Sadat-Ali evaluated the performance of AI models in predicting osteoporotic fractures and concluded that AI is a promising tool that may outperform conventional detection methods (6). However, many existing models require imaging inputs or offer limited interpretability. Therefore, applying interpretable ML approaches to routine clinical data offers a cost-effective and scalable solution to identify high-risk individuals, particularly in populations such as patients with T2DM, who are often under-screened.

In this study, we developed and validated an ML-based model to predict the risk of osteoporosis in patients with T2DM. The most effective predictive algorithm was identified through model comparison, and a user-friendly web-based tool was constructed to facilitate clinical application and personalized risk assessment.

Methods

Study design

This single-center, retrospective study consecutively enrolled hospitalized patients with type 2 diabetes mellitus (T2DM) who underwent dual-energy X-ray absorptiometry (DXA) at Panyu Hospital of Chinese Medicine between January 2022 and December 2023. The inclusion criteria were as follows: (1) age ≥ 45 years; (2) completed standardized DXA assessment; and (3) availability of complete electronic medical records, including demographic information and routine laboratory test results. Exclusion criteria included: (1) secondary osteoporosis; (2) hematological disorders; (3) history of malignancy; (4) severe hepatic or renal insufficiency; (5) acute infectious disease; and (6) incomplete clinical data.

Bone mineral density (BMD) was measured using a standardized DXA protocol, assessing the lumbar spine (L1–L4), left femoral neck, and total hip. According to the diagnostic criteria defined by the World Health Organization, participants were classified into the osteoporosis group (T-score ≤ –2.5 SD) and the non-osteoporosis group (T-score > –2.5 SD) (Figure 1).

Figure 1

Flowchart depicting a study on osteoporosis risk factors in patients with type 2 diabetes. It shows variables like demographics, blood tests, and a composite index. A final cohort of 1,560 is split into training (1,093) and test sets (467), with feature selection via LASSO and Boruta. Models tested include LR, SVM, NN, GBM, KNN, CatBoost, XGBoost, and LightGBM. Model performance metrics include ROC curve, accuracy, and more. SHAP method is used for model interpretability, generating summary, significance, dependency, and scatter plots.

Figure 1. Study flow chart. LR, logistic regression; SVM, support vector machine; GBM, gradient boosting machine; NN, neural network; XGBoost, extreme gradient boosting; KNN, k-nearest neighbors; LightGBM, light gradient boosting machine; AdaBoost, adaptive boosting; ROC, receiver operating characteristic curves; DCA, decision curve analysis; SHAP, Shapley additive explanations.

The study protocol adhered to the Declaration of Helsinki and was approved by the Institutional Review Board of Panyu Hospital of Guangzhou University of Chinese Medicine. Given the retrospective nature of this study, informed consent was waived.

Data collection and preprocessing

Baseline demographic characteristics (age and sex) and standardized laboratory parameters were collected. Venous blood samples were collected from all fasting participants within 24 hours of hospital admission. Complete blood counts, including hemoglobin (HGB), neutrophil count (NEUT), red blood cell count (RBC), platelet count (PLT), lymphocyte count (LYMPH), and monocyte count (MONO), were analyzed using the Mindray BC-6800Plus hematology analyzer. Biochemical analyses included total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), alkaline phosphatase (ALP), alanine aminotransferase (ALT), albumin (ALB), uric acid (UA), fasting blood glucose (FBG), serum calcium (Ca), serum phosphate (Pi), and creatinine (Cr). All laboratory evaluations and DXA measurements were performed during the same hospitalization period.

Before model development, several preprocessing steps were performed to ensure data quality and consistency. Duplicate records and entries with apparent input errors were excluded. For variables with missing values less than 20%, imputation was performed using the mean for continuous variables and the mode for categorical variables. Variables with more than 20% missing data were excluded from model construction.
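The imputation rule described above can be sketched in a few lines. This is an illustrative Python sketch using only the standard library, not the authors' R code; the function name `impute_columns`, the dictionary layout, and the use of `None` for missing entries are assumptions made here. The text does not say how a variable with exactly 20% missing values was handled, so this sketch imputes it.

```python
from statistics import mean, mode

MISSING_THRESHOLD = 0.20  # cutoff stated in the text


def impute_columns(table, categorical):
    """Drop columns above the missingness cutoff; fill the rest with mean or mode.

    `table` maps a column name to a list of values, with None marking a missing entry.
    `categorical` is the set of column names that should be filled with the mode.
    """
    cleaned = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(observed)) / len(values)
        if missing_fraction > MISSING_THRESHOLD or not observed:
            continue  # excluded from model construction
        # Mode for categorical variables, mean for continuous variables.
        fill = mode(observed) if name in categorical else mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned
```

Mean and mode imputation is simple but shrinks variance; the sketch only mirrors the rule as stated and does not address that trade-off.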

Calculation of derived biomarkers

To capture complex interactions related to metabolic and inflammatory status, we derived seven composite indices from the original laboratory values: monocyte-to-HDL ratio (MHR), neutrophil-to-HDL ratio (NHR), platelet-to-HDL ratio (PHR), lymphocyte-to-HDL ratio (LHR), triglyceride-glucose index (TyG), cholesterol-glucose index (CHG), and non-HDL-to-HDL cholesterol ratio (NHHR). These indices have been previously associated with chronic inflammation, insulin resistance, and cardiovascular risk, all of which are also implicated in osteoporosis, particularly among patients with type 2 diabetes mellitus. The indices were calculated using the following formulas, as previously described in the literature (7–13):

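Monocyte-to-HDL ratio: $\mathrm{MHR} = \dfrac{\mathrm{MONO}\ (\times 10^{9}/\mathrm{L})}{\mathrm{HDL\text{-}C}\ (\mathrm{mmol/L})}$

Neutrophil-to-HDL ratio: $\mathrm{NHR} = \dfrac{\mathrm{NEUT}\ (\times 10^{9}/\mathrm{L})}{\mathrm{HDL\text{-}C}\ (\mathrm{mmol/L})}$

Platelet-to-HDL ratio: $\mathrm{PHR} = \dfrac{\mathrm{PLT}\ (\times 10^{9}/\mathrm{L})}{\mathrm{HDL\text{-}C}\ (\mathrm{mmol/L})}$

Lymphocyte-to-HDL ratio: $\mathrm{LHR} = \dfrac{\mathrm{LYMPH}\ (\times 10^{9}/\mathrm{L})}{\mathrm{HDL\text{-}C}\ (\mathrm{mmol/L})}$

Triglyceride-glucose index: $\mathrm{TyG} = \ln\left[\dfrac{\mathrm{TG}\ (\mathrm{mg/dL}) \times \mathrm{FBG}\ (\mathrm{mg/dL})}{2}\right]$

Non-HDL-to-HDL cholesterol ratio: $\mathrm{NHHR} = \dfrac{\mathrm{TC}\ (\mathrm{mg/dL}) - \mathrm{HDL\text{-}C}\ (\mathrm{mg/dL})}{\mathrm{HDL\text{-}C}\ (\mathrm{mg/dL})}$

Cholesterol-glucose index: $\mathrm{CHG} = \ln\left[\dfrac{\mathrm{TC}\ (\mathrm{mg/dL}) \times \mathrm{FBG}\ (\mathrm{mg/dL})}{2 \times \mathrm{HDL\text{-}C}\ (\mathrm{mg/dL})}\right]$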
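As a worked illustration of the formulas above, the following Python sketch computes all seven indices for one patient. It is not the authors' code: the function and parameter names are hypothetical, the caller is assumed to supply each value in the unit the corresponding formula requires (HDL-C is therefore passed twice, once in mmol/L and once in mg/dL), and NHHR is computed with HDL-C in the denominator, following the symbols in the formula.

```python
import math


def derived_indices(mono, neut, plt, lymph, hdl_mmol,
                    tg_mgdl, glucose_mgdl, tc_mgdl, hdl_mgdl):
    """Return the seven composite indices for one patient.

    Cell counts are in 10^9/L, hdl_mmol in mmol/L, and the *_mgdl values in mg/dL.
    glucose_mgdl is the fasting glucose value.
    """
    return {
        # Cell-count-to-HDL ratios
        "MHR": mono / hdl_mmol,
        "NHR": neut / hdl_mmol,
        "PHR": plt / hdl_mmol,
        "LHR": lymph / hdl_mmol,
        # Log-transformed lipid-glucose products
        "TyG": math.log(tg_mgdl * glucose_mgdl / 2),
        "CHG": math.log(tc_mgdl * glucose_mgdl / (2 * hdl_mgdl)),
        # Non-HDL cholesterol relative to HDL cholesterol
        "NHHR": (tc_mgdl - hdl_mgdl) / hdl_mgdl,
    }
```

For example, TG = 150 mg/dL and fasting glucose = 100 mg/dL give TyG = ln(7500), and TC = 200 mg/dL with HDL-C = 50 mg/dL gives NHHR = 3.0.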

Feature selection and model construction

In this study, the caret package in R was used to randomly partition the dataset into a training set (70%) and a testing set (30%). The training set was utilized for model development, while the testing set was reserved for performance evaluation.
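A 70/30 random partition of this kind can be sketched as follows. The study used the caret package in R; this standard-library Python version is only an illustration, and two details are assumptions made here rather than facts from the text: that the split is stratified by outcome class, and the seed value.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_indices, test_indices), splitting each outcome class separately.

    Stratification and the seed are illustrative assumptions, not reported settings.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting within each class keeps the proportion of osteoporosis cases similar in both sets, which matters when the outcome is not evenly balanced.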

To identify potential predictors from baseline variables in the training set, three independent feature selection methods were employed: univariate analysis, least absolute shrinkage and selection operator (LASSO) regression, and the Boruta algorithm (14). Univariate analysis, a conventional statistical method, selected variables with a P-value < 0.05. LASSO regression identified predictors with non-zero coefficients, effectively addressing multicollinearity and reducing the risk of model overfitting due to excessive inter-variable correlation (15). In this study, LASSO regression with 10-fold cross-validation was performed to screen variables from high-dimensional data. The Boruta algorithm, a wrapper method based on feature importance, identifies relevant variables by comparing the Z-scores of actual features to those of permuted “shadow” features. In each iteration, Z-scores for the real features were computed using a random forest (RF) classifier, while those for the shadow features were obtained by randomly shuffling the original variables (16). Features consistently performing worse than the shadow features were iteratively removed. To ensure that there was no multicollinearity among the candidate variables, we calculated the variance inflation factor (VIF) scores. Variables with high collinearity (VIF > 10) were excluded to maintain model stability.
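For reference, the two quantities this paragraph relies on can be written out explicitly. These are the standard textbook forms, not notation taken from the paper; the symbols $n$ (number of patients), $p$ (number of candidate predictors), $y_i \in \{0, 1\}$ (osteoporosis status), $x_i$ (predictor vector), and $R_j^2$ (the coefficient of determination from regressing predictor $j$ on the remaining predictors) are introduced here for illustration.

```latex
% LASSO-penalized logistic regression: the L1 penalty shrinks some coefficients exactly to zero
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
- \ln\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
+ \lambda \sum_{j=1}^{p} \left| \beta_j \right| \right\}

% Variance inflation factor for predictor j
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

Under the second formula, the exclusion rule VIF > 10 corresponds to $R_j^2 > 0.9$, that is, a predictor for which more than 90% of the variance is explained by the other predictors.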

The intersection of variables selected by all three methods was used to develop prediction models. Eight supervised machine learning algorithms were implemented for model construction: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), neural network (NN), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), AdaBoost, and LightGBM.
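Taking the intersection is a plain set operation. The short Python sketch below applies it to the variable lists reported in the Results section of this article (18 variables from univariate analysis, 10 from LASSO, and the same 10 from Boruta); the variable names are the article's abbreviations, and the code itself is illustrative rather than the authors' implementation.

```python
# Variables with P < 0.05 in the univariate analysis (as reported in the Results)
univariate = {
    "age", "sex", "HGB", "NEUT", "LYMPH", "RBC", "TG", "HDL-C", "ALP",
    "ALT", "ALB", "UA", "Ca", "NHR", "PHR", "LHR", "TyG", "NHHR",
}

# Variables with non-zero LASSO coefficients (as reported in the Results)
lasso = {"age", "sex", "ALP", "UA", "HGB", "NEUT", "ALT", "LHR", "Ca", "TG"}

# Boruta is reported to confirm the same ten features
boruta = set(lasso)

# Keep only variables selected by all three methods
selected = univariate & lasso & boruta
print(sorted(selected))
```

Requiring agreement across all three methods is conservative: a variable is kept only if a significance test, a penalized regression, and a random-forest wrapper each retain it.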

Model evaluation

The optimal model was determined through a comprehensive evaluation of the discriminative ability, calibration performance, and clinical applicability of all eight candidate models. The receiver operating characteristic (ROC) curve was constructed to visually represent the model’s discriminative power, with the area under the curve (AUC) serving as the primary quantitative metric. Additional performance indicators, including accuracy, sensitivity, specificity, precision, and F1 score, were also calculated to supplement the evaluation. Calibration and clinical utility were further assessed using calibration plots and decision curve analysis (DCA), respectively.
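The metrics named above follow directly from the confusion matrix and the predicted risks. The Python sketch below shows one way to compute them with the standard library only; it is not the authors' R code. The 0.5 classification threshold is an assumption (the text does not report the cutoff used), the function names are hypothetical, and the code assumes both outcome classes are present and at least one patient is predicted positive.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN when a predicted risk >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        positive = prob >= threshold
        if positive and truth == 1:
            tp += 1
        elif positive:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, precision, and F1 at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, y_prob):
    """AUC as the chance a random case outranks a random non-case (ties count half)."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at one risk threshold."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the treat-all and treat-none strategies.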

Model interpretation

To enhance model interpretability, Shapley Additive Explanations (SHAP) values were analyzed to quantify the contribution and importance of each feature in determining the final classification outcome. A larger absolute SHAP value indicates greater influence on the model’s output prediction, and the sign of the value indicates the direction of that influence. We present a feature importance analysis based on SHAP values to interpret the results of the optimal model (17).
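For a logistic regression model, SHAP values have a simple closed form on the log-odds scale if the features are treated as independent: each feature's contribution is its coefficient times the distance of the patient's value from the cohort mean. The Python sketch below illustrates that identity. It is not necessarily how the shapviz-based analysis in this study was computed, and the function names and any coefficient values passed in are hypothetical.

```python
def linear_shap(coefficients, background, patient):
    """Per-feature SHAP values (log-odds scale) for a linear or logistic model.

    Assumes feature independence: phi_j = beta_j * (x_j - mean of x_j in background).
    `background` is a list of dicts (one per patient); `patient` is one dict.
    """
    means = {
        name: sum(row[name] for row in background) / len(background)
        for name in coefficients
    }
    return {
        name: coefficients[name] * (patient[name] - means[name])
        for name in coefficients
    }


def mean_abs_shap(coefficients, background):
    """Global importance: mean absolute SHAP value of each feature over the cohort."""
    totals = {name: 0.0 for name in coefficients}
    for row in background:
        for name, value in linear_shap(coefficients, background, row).items():
            totals[name] += abs(value)
    return {name: total / len(background) for name, total in totals.items()}
```

Because the contributions sum to the difference between a patient's log-odds and the average log-odds, a positive value pushes the prediction toward osteoporosis and a negative value pushes it away.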

Statistical analysis

All statistical analyses were conducted using R software (version 4.4.3). Continuous variables with a normal distribution were presented as mean ± standard deviation (SD), while non-normally distributed continuous variables were expressed as median with interquartile range (IQR). Categorical variables were summarized as frequencies and percentages. Group differences in continuous variables were assessed using either the independent samples t-test or the Kruskal–Wallis test, as appropriate. Categorical variables were compared using the chi-square test or Fisher’s exact test. A two-sided p-value < 0.05 was considered statistically significant.

LASSO regression and Boruta feature selection were performed using the “glmnet” and “Boruta” R packages, respectively. Predictive model development and training were conducted with the “caret” package, with default hyperparameter tuning implemented via grid search. SHAP value analysis for model interpretation was carried out using the “shapviz” package.

Results

Comparison of clinical characteristics

A total of 1,560 patients were included in this study and randomly assigned to the training set (n = 1,093) and the testing set (n = 467) in a 7:3 ratio. No statistically significant differences were observed between the two sets across all clinical variables (P > 0.05), as detailed in Table 1.

Table 1

Table 1. Baseline characteristics of patients.

In the training cohort, patients with and without osteoporosis showed statistically significant differences in several clinical and biochemical variables (Table 2). These included age, sex, hemoglobin (HGB), neutrophil count (NEUT), lymphocyte count (LYMPH), red blood cell count (RBC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), alkaline phosphatase (ALP), alanine aminotransferase (ALT), albumin (ALB), uric acid (UA), serum calcium (Ca), as well as derived indices such as neutrophil-to-HDL ratio (NHR), platelet-to-HDL ratio (PHR), lymphocyte-to-HDL ratio (LHR), triglyceride-glucose index (TyG), and non-HDL-to-HDL cholesterol ratio (NHHR) (all P < 0.05). Conversely, no statistically significant differences were found in platelet count (PLT), monocyte count (MONO), total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), aspartate aminotransferase (AST), creatinine (Cr), serum phosphate (Pi), fasting blood glucose (FBG), monocyte-to-HDL ratio (MHR), and cholesterol-glucose index (CHG) between the two groups (all P > 0.05).

Table 2

Table 2. Results of univariate analysis.

Model development

Eighteen variables identified through univariate analysis were subjected to further feature selection using LASSO regression and the Boruta algorithm. In the LASSO regression, the optimal lambda value was determined to be 0.024, which yielded 10 key predictive features. These included two demographic variables (age and sex) and eight laboratory indicators (ALP, UA, HGB, NEUT, ALT, LHR, Ca, and TG). Notably, the results of the Boruta algorithm fully corroborated those of LASSO regression, confirming the same 10 features (Figure 2). The candidate variables also showed VIF values below the accepted threshold of 10, indicating no significant multicollinearity. These variables were ultimately selected as the core predictors for model construction.

Figure 2

Panel A shows a line plot of coefficients against log Lambda with multiple colored lines indicating different coefficients. Panel B is a plot of binomial deviance versus log Lambda, with red dots and error bars, showing increasing deviance. Panel C presents a box plot of feature importance for various attributes, with colored boxes indicating different features, including age and gender.

Figure 2. Features selected by univariate analysis, LASSO, and Boruta. (A) The LASSO regression coefficient profiles of characteristics. (B) The optimal lambda selection in the LASSO regression with 10-fold cross-validation. (C) Variables selected by the Boruta algorithm.

Model evaluation

Eight machine learning algorithms were employed to develop models for predicting osteoporosis in patients with T2DM: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), neural network (NN), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), AdaBoost, and LightGBM. Among these, the LR model demonstrated the most robust and consistent performance in the testing set (Table 3).

Table 3

Table 3. The prediction performance of each model.

As illustrated in Figure 3, the LR model achieved the highest discriminative performance, with an AUC of 0.812 (95% CI: 0.750–0.832). It also outperformed other models in terms of accuracy (0.762), sensitivity (0.809), and F1 score (0.771), while maintaining favorable specificity (0.761) and precision (0.778). In comparison, KNN (AUC = 0.734, 95% CI: 0.689–0.779), AdaBoost (AUC = 0.717, 95% CI: 0.674–0.761), and LightGBM (AUC = 0.723, 95% CI: 0.676–0.770) demonstrated inferior classification performance. Although the SVM model achieved slightly higher specificity (0.808) and precision (0.801), its lower sensitivity (0.650), F1 score (0.717), and AUC (0.773, 95% CI: 0.731–0.815) indicated limited generalizability. Moreover, the GBM, NN, and XGBoost models exhibited noticeably lower AUCs, accuracy, sensitivity, and F1 scores compared to the LR model, further supporting the superior generalization ability of the latter.

Figure 3

Six-panel image showing A and D: ROC curves for various models like Logistic, SVM, and GBM, indicating sensitivity versus 1-specificity with AUC values. B and E: Decision curves displaying standardized net benefit against high-risk thresholds. C and F: Calibration plots illustrating observed event percentages against bin midpoints for different models. Each panel has distinct model performance evaluations with relevant metrics and legends.

Figure 3. Machine learning based prediction model for osteoporosis. (A) ROC curve of the training set of the machine learning-based model. (B) Training set DCA of the machine learning-based model. (C) Training set calibration curve of the machine learning based model. (D) Testing set ROC curve of machine learning based model. (E) Testing set DCA based on machine learning model. (F) Testing set calibration curve based on machine learning model. ROC, receiver operating characteristic; DCA, decision curve analysis.

Model interpretability and web application

To elucidate the decision-making process of the LR model, we applied the SHapley Additive exPlanations (SHAP) framework for both global and local interpretation. The bar chart in Figure 4A displays the mean absolute SHAP values for each predictor, reflecting their overall contribution to the model. The SHAP summary plot (Figure 4B) ranked the importance of predictive features, with age, sex, ALP, UA, HGB, NEUT, ALT, LHR, Ca, and TG identified as the most influential variables in descending order. Figures 4C, D provide further insights into how individual SHAP values influence predictions. Age emerged as the most dominant predictor, and interaction plots (Figures 5A–F) illustrated the complex interrelationships between age and other variables.

Figure 4

Four-panel image displaying SHAP values for a logistic model. Panel A shows a horizontal bar chart of mean SHAP values, indicating the importance of features like age and gender. Panel B presents a beeswarm plot illustrating SHAP values for each feature with a color gradient for feature values, highlighting age and gender. Panel C features a waterfall chart displaying contributions of features such as ALT and HGB to the prediction. Panel D shows a bar chart detailing individual contributions with gender and ALP having notable impacts.

Figure 4. SHAP plots. (A) Bar chart of the mean absolute SHAP value for each predictor of the Logistic model in descending order. (B) SHAP summary plot showing feature importance for each predictor of the Logistic model in descending order. The upper predictors are more important to the model’s predictive outcome. A dot is created for each feature attribution value for the Logistic model of each patient. The further away a dot is from the baseline SHAP value of zero, the more strongly it affects the model output. Dots are colored according to the values of features. Yellow represents higher feature values and red represents lower feature values. (C, D) The force plots provide personalized feature attributions using representative examples. SHAP, Shapley additive explanations.

Figure 5

Six scatter plots labeled A to F, each titled “Logistic,” show SHAP value on the y-axis and Age on the x-axis. Each plot uses a color gradient scale to represent different variables: A uses “Gender,” B “ALP,” C “UA,” D “HGB,” E “NEUT,” and F “ALT.” Data points generally show an upward trend, with variations based on the colored variable in each plot.

Figure 5. Panels (A–F) display the SHAP dependency plots for features in the Logistic model, illustrating their relationships with Age. The Y-axis represents SHAP values, while the X-axis represents actual clinical parameters. Notably, a SHAP value greater than 0 suggests an increased risk of osteoporosis, whereas a negative SHAP value suggests a reduced risk.

To enhance the model’s clinical utility and streamline its application in practice, we optimized the input requirements by limiting them to the 10 key features, many of which are routinely obtained in standard blood tests. Furthermore, we developed a user-friendly, web-based calculator (available at: https://t2dm.shinyapps.io/t2dm-osteoporosis/) to facilitate real-time risk assessment of osteoporosis in patients with T2DM. This platform enables clinicians to enter patient data and receive immediate, individualized risk estimations, thereby supporting early intervention and personalized care strategies.
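Conceptually, a calculator built on a logistic regression model turns the entered values into a probability through the logistic function. The Python sketch below shows that generic calculation only. It does not reproduce the published calculator: the article does not report the fitted intercept or coefficients, so any values passed to `predicted_risk` are hypothetical placeholders supplied by the caller, and the function name is introduced here for illustration.

```python
import math


def predicted_risk(intercept, coefficients, patient):
    """Convert a linear predictor into a probability with the logistic function.

    `coefficients` and `patient` are dicts keyed by feature name. The fitted
    values of the published model are not given in the article, so the caller
    must supply them; nothing here is the authors' actual parameter set.
    """
    log_odds = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a positive coefficient for age, for example, the returned risk rises monotonically as age increases while the other inputs are held fixed.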

Discussion

In this study, we developed and validated a machine learning model for predicting osteoporosis in patients with type 2 diabetes mellitus (T2DM), using demographic characteristics and routine blood test indicators. According to the latest IDF Diabetes Atlas, the global prevalence of diabetes has reached approximately 643 million people, meaning that one in nine adults is affected (18). In China alone, between 50% and 66% of adults with T2DM exhibit decreased bone mineral density (BMD), and nearly one-third can be diagnosed with osteoporosis (19, 20). Therefore, there is an urgent need to establish a reliable screening tool specifically targeting osteoporosis in T2DM patients.

The logistic regression model constructed in this study demonstrated strong and consistent predictive performance in both the training and testing cohorts, with area under the curve (AUC) values ranging from 0.812 to 0.835. The high AUC values highlight the model’s accuracy and robustness in predicting osteoporosis among T2DM patients. Moreover, calibration curves and decision curve analysis (DCA) further confirmed the model’s good calibration and substantial clinical net benefit.

To enhance the model’s interpretability and clinical applicability, we employed a multistep feature selection process. Initially, univariate analysis was used to identify potential predictors. LASSO regression was then applied to address multicollinearity and eliminate irrelevant variables. Finally, the Boruta algorithm, based on random forest importance scores, was used to confirm the stability of selected features (15, 21, 22). This process identified 10 core predictors: age, sex, ALP, UA, HGB, NEUT, ALT, LHR, Ca, and TG. By reducing the number of variables, the logistic regression model became more interpretable and user-friendly, facilitating easier data collection in clinical settings.

Although more complex algorithms such as KNN and LightGBM demonstrated high discriminative ability in the training set (AUCs of 0.932 and 0.912, respectively), their substantial performance decline in the test set (AUCs of 0.734 and 0.723) indicated overfitting and limited generalizability. Overfitting, a common concern in machine learning models with small or moderate-sized datasets, compromises the reliability of predictions when applied to new data (23). In contrast, logistic regression maintained more stable performance across both sets (AUC from 0.835 to 0.812) and achieved a strong F1 score, indicating a balanced trade-off between precision and recall. Considering the goal of developing an interpretable and clinically applicable tool, we prioritized robustness and interpretability over algorithmic complexity. Logistic regression provides transparency in feature contributions and aligns better with clinical decision-making workflows, supporting its adoption as the final model.
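The size of the training-to-testing drop can be made explicit with the AUC values quoted in this paragraph. The short Python sketch below only performs that subtraction; the dictionary layout and function name are illustrative.

```python
# (training AUC, testing AUC) as quoted in the paragraph above
reported_auc = {
    "KNN": (0.932, 0.734),
    "LightGBM": (0.912, 0.723),
    "LR": (0.835, 0.812),
}


def auc_gap(train_auc, test_auc):
    """Drop in AUC from the training set to the testing set, rounded to 3 decimals."""
    return round(train_auc - test_auc, 3)


gaps = {model: auc_gap(*pair) for model, pair in reported_auc.items()}
for model, gap in gaps.items():
    print(f"{model}: {gap:.3f}")
```

The drop is roughly 0.2 for KNN and LightGBM but about 0.02 for logistic regression, which is the pattern the paragraph interprets as overfitting in the more flexible models.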

The pathogenesis of osteoporosis in T2DM is multifactorial, involving aging, sex, metabolic dysregulation, chronic inflammation, and impaired bone remodeling (2, 24–27). Aging disrupts bone homeostasis by impairing osteoblast function, and the abrupt decline in estrogen levels after menopause accelerates osteoclast activation (28, 29). Hyperglycemia-induced oxidative stress increases skeletal fragility (30). Elevated ALP levels reflect increased bone turnover, potentially related to compensatory bone formation following bone microarchitecture disruption by advanced glycation end products (AGEs) (31). Reduced serum calcium may be attributed to insulin resistance and vitamin D deficiency, both of which disturb calcium-phosphorus metabolism and promote bone resorption (32, 33).

Inflammation also plays a critical role: elevated NEUT indicates systemic inflammation, which promotes osteoclast differentiation via pro-inflammatory cytokines such as IL-6 and TNF-α (34). Reduced HGB levels, indicative of anemia, may contribute to hypoxia and nutritional deficiencies, impairing bone repair capacity (35). UA plays a complex role in bone metabolism, particularly in patients with T2DM. While uric acid may exert antioxidant effects at physiological levels, elevated levels in patients with T2DM often indicate an adverse metabolic state that contributes to bone loss and increased fracture risk (36). From a metabolic perspective, the pro-inflammatory effects of UA may counteract its antioxidant, protective role (37). Elevated ALT levels may impair vitamin D activation, thereby affecting calcium absorption (38). Both TG and LHR contribute to lipotoxicity, inhibiting osteoblast differentiation and promoting adipogenesis, ultimately accelerating bone loss (39). These mechanisms, driven by insulin resistance, oxidative stress, and AGE accumulation, interact to form a complex “metabolism-inflammation-bone loss” network, leading to reduced bone density and increased fracture risk. SHAP-based interpretability analysis further supported the biological plausibility of our model’s predictions.

By integrating demographic and routine biochemical data, our logistic regression model achieved an AUC of 0.812 in the testing set, superior to previous models, and elucidated the intricate interplay among metabolic disturbances, chronic inflammation, and bone loss (40). Compared to the Osteoporosis Self-assessment Tool for Asians (OSTA), a widely used screening tool in Asian populations, our model demonstrated superior predictive performance. OSTA relies solely on age and body weight, and previous studies have reported an AUC of 0.736, with a sensitivity of 73.1% and specificity of 69.8% for identifying osteoporosis (T-score ≤ –2.5). However, OSTA does not incorporate metabolic or biochemical indicators, which are particularly relevant in patients with T2DM who often present with complex metabolic profiles. In contrast, our logistic regression model achieved an AUC of 0.812 and a sensitivity of 80.9%, outperforming OSTA by leveraging routinely available laboratory parameters (41). This model offers a practical approach for the early identification of osteoporosis in T2DM patients, enabling clinicians to initiate personalized management strategies. Although dual-energy X-ray absorptiometry (DXA) remains the gold standard for diagnosing osteoporosis, its availability is limited, especially in primary care settings (42). Our model can serve as an intelligent prescreening tool before DXA testing, facilitating early diagnosis and intervention.

To enhance clinical usability, we developed an online calculator based on the final logistic regression model. The tool requires input of only ten routine clinical variables, which are commonly available in patients with T2DM. This design allows for rapid, cost-effective screening without the need for imaging or additional lab tests. The model has been applied to the screening of osteoporosis in orthopedic inpatients in our hospital who may benefit from further DXA evaluation.

Limitations

This study has several limitations. First, it was a single-center retrospective study, which may introduce selection bias and limit the generalizability of the findings. Differences in clinical practices, population characteristics, and laboratory standards across institutions may affect model performance. Therefore, external validation using multi-center and prospective cohorts is essential to assess the robustness and applicability of the model in broader clinical settings. Additionally, due to the retrospective design, some potentially relevant variables, such as height and weight, were not available and could not be included in the analysis, which may have influenced model accuracy. In the future, we intend to conduct a multicenter prospective study and seamlessly integrate the model into the hospital information system to achieve real-time risk stratification of osteoporosis in patients with type 2 diabetes. This will support early identification and timely referral for DXA assessment.

Conclusion

In summary, we developed and validated an interpretable machine learning model based on routinely collected clinical and laboratory data to predict osteoporosis risk in patients with type 2 diabetes mellitus. The logistic regression model demonstrated favorable predictive performance, outperforming traditional screening tools, and provides a practical approach for early identification of patients at high risk for osteoporosis. By integrating commonly available biomarkers, the model facilitates cost-effective and accessible screening, especially in settings where DXA is unavailable or limited. Furthermore, the development of an online calculator enhances the model’s clinical utility by enabling easy risk assessment in real time.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Panyu Hospital of Chinese Medicine. The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because of the retrospective nature of the study.

Author contributions

QW: Funding acquisition, Resources, Writing – original draft, Writing – review & editing. JZ: Software, Writing – original draft. XC: Writing – review & editing. HL: Data curation, Writing – review & editing. WG: Writing – review & editing. QH: Validation, Writing – review & editing. ZL: Writing – review & editing. SC: Writing – review & editing. DC: Data curation, Funding acquisition, Resources, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by Key medical disciplines in Panyu District (2022-2024), Panyu District Science and Technology Programme Major Healthcare Projects (2022-Z04-112), Panyu District Science and Technology Program (2024-Z04-048), and Panyu District Science and Technology Program (2024-Z04-006).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Glossary

ML: Machine Learning

T2DM: Type 2 Diabetes Mellitus

OP: Osteoporosis

DXA: Dual-energy X-ray Absorptiometry

BMD: Bone mineral density

DCA: Decision Curve Analysis

AUC: Area under the receiver operating characteristic curve

VIF: Variance inflation factor

SHAP: SHapley Additive exPlanations

HGB: Hemoglobin

NEUT: Neutrophil count

RBC: Red blood cell count

PLT: Platelet count

LYMPH: Lymphocyte count

MONO: Monocyte count

TC: Total cholesterol

TG: Triglycerides

HDL-C: High-density lipoprotein cholesterol

LDL-C: Low-density lipoprotein cholesterol

ALP: Alkaline phosphatase

ALT: Alanine aminotransferase

ALB: Albumin

UA: Uric acid

FBG: Fasting blood glucose

Ca: Serum calcium

Cr: Creatinine

Pi: Serum phosphate

MHR: Monocyte-to-HDL ratio

NHR: Neutrophil-to-HDL ratio

PHR: Platelet-to-HDL ratio

LHR: Lymphocyte-to-HDL ratio

TyG: Triglyceride-Glucose index

CHG: Cholesterol, high-density lipoprotein, and glucose index

NHHR: Non-HDL cholesterol to HDL cholesterol ratio

RF: Random Forest

LR: Logistic regression

SVM: Support vector machine

GBM: Gradient boosting machine

NN: Neural network

XGBoost: Extreme gradient boosting

KNN: K-nearest neighbors
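Several of the composite indices listed above are simple functions of routine laboratory values. The sketch below uses their commonly cited definitions: TyG as ln(TG × FBG / 2) with both values in mg/dL, the cell-count-to-HDL ratios (MHR, NHR, PHR, LHR) as a blood cell count divided by HDL-C, and NHHR as (TC − HDL-C) / HDL-C. The function names and example units are assumptions for illustration; the exact units and definitions used in this study should be taken from the Methods. CHG is omitted here.

```python
import math


def tyg_index(tg_mg_dl, fbg_mg_dl):
    """Triglyceride-glucose index: ln(TG [mg/dL] * FBG [mg/dL] / 2)."""
    if tg_mg_dl <= 0 or fbg_mg_dl <= 0:
        raise ValueError("TG and FBG must be positive")
    return math.log(tg_mg_dl * fbg_mg_dl / 2)


def ratio_to_hdl(cell_count, hdl_c):
    """Cell-count-to-HDL ratio (MHR, NHR, PHR, or LHR).

    Assumed units: cell_count in 10^9/L and hdl_c in mmol/L;
    unit conventions vary between studies.
    """
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return cell_count / hdl_c


def nhhr(tc, hdl_c):
    """Non-HDL cholesterol to HDL cholesterol ratio: (TC - HDL-C) / HDL-C.

    TC and HDL-C must share the same unit.
    """
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return (tc - hdl_c) / hdl_c


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(f"TyG:  {tyg_index(150, 126):.2f}")
    print(f"MHR:  {ratio_to_hdl(0.5, 1.25):.2f}")
    print(f"NHHR: {nhhr(5.0, 1.25):.2f}")
```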

References

1. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes-2019. Diabetes Care. (2019) 42:S13–28. doi: 10.2337/dc19-S002

2. Chen Y, Zhao W, Hu A, Lin S, Chen P, Yang B, et al. Type 2 diabetic mellitus related osteoporosis: focusing on ferroptosis. J Transl Med. (2024) 22:409. doi: 10.1186/s12967-024-05191-x

3. Si Y, Wang C, Guo Y, Yin H, and Ma Y. Prevalence of osteoporosis in patients with type 2 diabetes mellitus in the Chinese mainland: a protocol of systematic review and meta-analysis. Medicine (Baltimore). (2020) 99:e19762. doi: 10.1097/MD.0000000000019762

4. Hu X, Varkanitsa M, Kropp E, Betke M, Ishwar P, and Kiran S. Aphasia severity prediction using a multi-modal machine learning approach. Neuroimage. (2025), 121300. doi: 10.1016/j.neuroimage.2025.121300

5. Zhang Y, Ma M, Huang X, Liu J, Tian C, Duan Z, et al. Machine learning is changing osteoporosis detection: an integrative review. Osteoporos Int. (2025). doi: 10.1007/s00198-025-07541-x

6. Sadat-Ali M, Alzahrani BA, Alqahtani TS, Alotaibi MA, Alhalafi AM, Alsousi AA, et al. Accuracy of artificial intelligence in prediction of osteoporotic fractures in comparison with dual-energy x-ray absorptiometry and the fracture risk assessment tool: a systematic review. World J Orthop. (2025) 16:103572. doi: 10.5312/wjo.v16.i4.103572

7. Lin N, Li D, Lin S, Ke Y, Lu J, Wu Y, et al. The monocyte-to-high-density lipoprotein cholesterol ratio as a novel predictor of the prevalence of senile osteoporosis. Clin Interv Aging. (2024) 19:1773–88. doi: 10.2147/CIA.S478461

8. Yousefiasl M, Soltanattar A, Ezzatollahi TA, Azami P, Alaei M, Alamdari AA, et al. Association of triglyceride-glucose index with bone mineral density and fracture: a systematic review. Diabetol Metab Syndr. (2025) 17:77. doi: 10.1186/s13098-025-01642-7

9. Wang L, Xu D, Chen M, and Huang X. Association of neutrophil to high-density lipoprotein cholesterol ratio with fragility fracture in osteoporotic patients: a case-control study. BMC Musculoskelet Disord. (2025) 26:193. doi: 10.1186/s12891-025-08440-2

10. Ye H, Chen Z, Li K, Zhang Y, Li H, and Tian N. Non-linear association of the platelet/high-density lipoprotein cholesterol ratio with bone mineral density: a cross-sectional study. Lipids Health Dis. (2024) 23:300. doi: 10.1186/s12944-024-02291-x

11. Fu Q, Zhang C, Yang Y, Teng R, Liu F, Liu P, et al. Correlation study of multiple inflammatory indices and vertebral compression fracture: a cross-sectional study. J Clin Transl Endocrinol. (2024) 37:100369. doi: 10.1016/j.jcte.2024.100369

12. Wang J, Li S, Pu H, and He J. The association between the non-high-density lipoprotein cholesterol to high-density lipoprotein cholesterol ratio and the risk of osteoporosis among U.S. adults: analysis of NHANES data. Lipids Health Dis. (2024) 23:161. doi: 10.1186/s12944-024-02152-7

13. Mo D, Zhang P, Zhang M, Dai H, and Guan J. Cholesterol, high-density lipoprotein, and glucose index versus triglyceride-glucose index in predicting cardiovascular disease risk: a cohort study. Cardiovasc Diabetol. (2025) 24:116. doi: 10.1186/s12933-025-02675-y

14. Sanchez-Pinto LN, Venable LR, Fahrenbach J, and Churpek MM. Comparison of variable selection methods for clinical predictive modeling. Int J Med Inform. (2018) 116:10–7. doi: 10.1016/j.ijmedinf.2018.05.006

15. Guan C, Gong A, Zhao Y, Yin C, Geng L, Liu L, et al. Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit Care. (2024) 28:349. doi: 10.1186/s13054-024-05138-0

16. Huang D, Gong L, Wei C, Wang X, and Liang Z. An explainable machine learning-based model to predict intensive care unit admission among patients with community-acquired pneumonia and connective tissue disease. Respir Res. (2024) 25:246. doi: 10.1186/s12931-024-02874-3

17. Bhandari M, Shahi TB, Siku B, and Neupane A. Explanatory classification of CXR images into COVID-19, pneumonia and tuberculosis using deep learning and XAI. Comput Biol Med. (2022) 150:106156. doi: 10.1016/j.compbiomed.2022.106156

18. Schwarz P. IDF global clinical practice recommendations for managing type 2 diabetes - 2025. Diabetes Res Clin Pract. (2025), 112158. doi: 10.1016/j.diabres.2025.112158

19. Wang L, Gao P, Zhang M, Huang Z, Zhang D, Deng Q, et al. Prevalence and ethnic pattern of diabetes and prediabetes in China in 2013. JAMA. (2017) 317:2515–23. doi: 10.1001/jama.2017.7596

20. Ogurtsova K, Da RFJ, Huang Y, Linnenkamp U, Guariguata L, Cho NH, et al. IDF diabetes atlas: global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract. (2017) 128:40–50. doi: 10.1016/j.diabres.2017.03.024

21. Yan F, Chen X, Quan X, Wang L, Wei X, and Zhu J. Association between the stress hyperglycemia ratio and 28-day all-cause mortality in critically ill patients with sepsis: a retrospective cohort study and predictive model establishment based on machine learning. Cardiovasc Diabetol. (2024) 23:163. doi: 10.1186/s12933-024-02265-4

22. Gerdisch MW. Alas, the return of the lasso. Ann Thorac Surg. (2022) 114:e383. doi: 10.1016/j.athoracsur.2022.02.009

23. Xu K and Kang H. Statistical consideration in nephrology research. Kidney Res Clin Pract. (2025). doi: 10.23876/j.krcp.25.046

24. Xu N, Wang Y, Chen Y, Guo YN, Li RX, Zhou YM, et al. The associations between uric acid with BMDs and risk of the 10-year probability of fractures in Chinese patients with T2DM stratified by age and gender. Endocrine. (2023) 80:520–28. doi: 10.1007/s12020-023-03353-3

25. Jiang L, Song X, Yan L, Liu Y, Qiao X, and Zhang W. Molecular insights into the interplay between type 2 diabetes mellitus and osteoporosis: implications for endocrine health. Front Endocrinol (Lausanne). (2024) 15:1483512. doi: 10.3389/fendo.2024.1483512

26. Huang R, Chen Y, Tu M, and Wang W. Monocyte to high-density lipoprotein and apolipoprotein a1 ratios are associated with bone homeostasis imbalance caused by chronic inflammation in postmenopausal women with type 2 diabetes mellitus. Front Pharmacol. (2022) 13:1062999. doi: 10.3389/fphar.2022.1062999

27. Zhang W, Li Y, Li S, Zhou J, Wang K, Li Z, et al. Associations of metabolic dysfunction-associated fatty liver disease and hepatic fibrosis with bone mineral density and risk of osteopenia/osteoporosis in T2DM patients. Front Endocrinol (Lausanne). (2023) 14:1278505. doi: 10.3389/fendo.2023.1278505

28. Bouillon R, Bex M, Van Herck E, Laureys J, Dooms L, Lesaffre E, et al. Influence of age, sex, and insulin on osteoblast function: osteoblast dysfunction in diabetes mellitus. J Clin Endocrinol Metab. (1995) 80:1194–202. doi: 10.1210/jcem.80.4.7714089

29. Fischer V and Haffner-Luntzer M. Interaction between bone and immune cells: implications for postmenopausal osteoporosis. Semin Cell Dev Biol. (2022) 123:14–21. doi: 10.1016/j.semcdb.2021.05.014

30. Marcucci G, Domazetovic V, Nediani C, Ruzzolini J, Favre C, and Brandi ML. Oxidative stress and natural antioxidants in osteoporosis: novel preventive and therapeutic approaches. Antioxidants (Basel). (2023) 12:373. doi: 10.3390/antiox12020373

31. Park SY, Choi KH, Jun JE, and Chung HY. Effects of advanced glycation end products on differentiation and function of osteoblasts and osteoclasts. J Korean Med Sci. (2021) 36:e239. doi: 10.3346/jkms.2021.36.e239

32. Rodriguez-Martinez MA and Garcia-Cohen EC. Role of Ca(2+) and vitamin D in the prevention and treatment of osteoporosis. Pharmacol Ther. (2002) 93:37–49. doi: 10.1016/s0163-7258(02)00164-x

33. Zhan J, Wei Q, Guo W, Liu Z, Chen S, Huang Q, et al. Evaluating the triglyceride glucose index as a predictive biomarker for osteoporosis in patients with type 2 diabetes. Front Endocrinol (Lausanne). (2025) 16:1534232. doi: 10.3389/fendo.2025.1534232

34. Wang T and He C. TNF-alpha and IL-6: the link between immune and bone system. Curr Drug Targets. (2020) 21:213–27. doi: 10.2174/1389450120666190821161259

35. Xiu S, Mu Z, Sun L, Zhao L, and Fu J. Hemoglobin level and osteoporosis in Chinese elders with type 2 diabetes mellitus. Nutr Diabetes. (2022) 12:19. doi: 10.1038/s41387-022-00198-z

36. Li X, Peng Y, Chen K, Zhou Y, and Luo W. Association between serum uric acid levels and bone mineral density in Chinese and American: a cross-sectional study. Sci Rep. (2025) 15:8304. doi: 10.1038/s41598-025-92348-3

37. Xu R, Lian D, Xie Y, Mu L, Wu Y, Chen Z, et al. Relationship between serum uric acid levels and osteoporosis. Endocr Connect. (2023) 12:e230040. doi: 10.1530/EC-23-0040

38. Liangpunsakul S and Chalasani N. Serum vitamin D concentrations and unexplained elevation in ALT among US adults. Dig Dis Sci. (2011) 56:2124–29. doi: 10.1007/s10620-011-1707-x

39. Tagliaferri C, Salles J, Landrier JF, Giraudet C, Patrac V, Lebecque P, et al. Increased body fat mass and tissue lipotoxicity associated with ovariectomy or high-fat diet differentially affects bone and skeletal muscle metabolism in rats. Eur J Nutr. (2015) 54:1139–49. doi: 10.1007/s00394-014-0790-0

40. Tan J, Zhang Z, He Y, Xu X, Yang Y, Xu Q, et al. Development and validation of a risk prediction model for osteoporosis in elderly patients with type 2 diabetes mellitus: a retrospective and multicenter study. BMC Geriatr. (2023) 23:698. doi: 10.1186/s12877-023-04306-1

41. Sun Y, Du X, Huang K, and Yu S. Evaluation of osteoporosis self-assessment tool for Asians in screening of osteoporosis for postmenopausal women in community. Chin J Gen Practitioners. (2014). doi: 10.3760/cma.j.issn.1671-7368.2014.08.013

42. Kanis JA. Diagnosis of osteoporosis and assessment of fracture risk. Lancet. (2002) 359:1929–36. doi: 10.1016/S0140-6736(02)08761-5

Keywords: osteoporosis, type 2 diabetes mellitus, explainable machine learning, predictive model, risk assessment

Citation: Wei Q, Liu Z, Chen X, Li H, Guo W, Huang Q, Zhan J, Chen S and Cai D (2025) Development and validation of an explainable machine learning model for predicting osteoporosis in patients with type 2 diabetes mellitus. Front. Endocrinol. 16:1611499. doi: 10.3389/fendo.2025.1611499

Received: 14 April 2025; Accepted: 15 July 2025;
Published: 07 August 2025.

Edited by:

Qiong Lu, Central South University, China

Reviewed by:

Jiaxue Cui, Dalian Medical University, China
Ashwini Amin, Manipal Institute of Technology, India

Copyright © 2025 Wei, Liu, Chen, Li, Guo, Huang, Zhan, Chen and Cai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dongling Cai
