Construction of a 3-year risk prediction model for developing diabetes in patients with pre-diabetes

Introduction To analyze the factors influencing progression from newly diagnosed prediabetes (PreDM) to diabetes within 3 years and to establish a prediction model for the 3-year risk of diabetes in patients with PreDM. Methods Subjects diagnosed with new-onset PreDM at the Physical Examination Center of the First Affiliated Hospital of Soochow University between October 1, 2015 and May 31, 2023 who completed 3 years of follow-up were selected as the study population. Data on gender, age, body mass index (BMI), waist circumference and other variables were collected. After 3 years of follow-up, subjects were divided into a diabetes group and a non-diabetes group, and baseline data were compared between the two groups. A logistic regression prediction model was established, a nomogram was drawn, and a calibration curve was plotted. Results Twenty-four indicators, including gender, age, history of hypertension, fatty liver, BMI, waist circumference, systolic blood pressure, diastolic blood pressure, fasting blood glucose and HbA1c, differed significantly between the diabetes and non-diabetes groups (P<0.05), whereas smoking, creatinine and platelet count did not (P>0.05). Logistic regression analysis showed that older age, elevated BMI, male gender, high fasting blood glucose, increased LDL-C, fatty liver and liver dysfunction were risk factors for progression from PreDM to diabetes within 3 years (P<0.05), while HDL-C was a protective factor (P<0.05). The derived formula, in which each categorical term is an indicator equal to 1 when the condition holds and 0 otherwise, was:
$$\ln\frac{p}{1-p} = 0.181\,\text{Age}_{40\text{-}54} + 0.973\,\text{Age}_{55\text{-}74} + 1.868\,\text{Age}_{\ge 75} - 0.192\,\text{Male} + 0.151\,\text{FBG} - 0.538\,\text{BMI}_{24\text{-}28} - 0.538\,\text{BMI}_{\ge 28} - 0.109\,\text{HDL-C} + 0.021\,\text{LDL-C} + 0.365\,\text{FattyLiver} + 0.444\,\text{LiverDysfunction} - 10.038$$
The AUC of the model for predicting progression from PreDM to diabetes within 3 years was 0.787, indicating good predictive ability. Conclusions The model for predicting the 3-year risk of diabetes in patients with PreDM, constructed from 8 influencing factors (age, BMI, gender, fasting blood glucose, LDL-C, HDL-C, fatty liver and liver dysfunction), showed good discrimination and calibration.
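The reported equation can be turned into a simple risk calculator. The sketch below is a minimal Python illustration, not the authors' software. It assumes 0/1 indicator coding for the categorical terms, reference groups of age <40 years and BMI <24 kg/m² (the text does not state them), and continuous inputs in whatever units the authors used when fitting (also not stated). The function names are hypothetical.

```python
import math

# Coefficients exactly as printed in the abstract's formula.
INTERCEPT = -10.038
COEF = {
    "age_40_54": 0.181,
    "age_55_74": 0.973,
    "age_75_plus": 1.868,
    "male": -0.192,
    "fbg": 0.151,
    "bmi_24_28": -0.538,
    "bmi_28_plus": -0.538,
    "hdl_c": -0.109,
    "ldl_c": 0.021,
    "fatty_liver": 0.365,
    "liver_dysfunction": 0.444,
}


def linear_predictor(age, male, fbg, bmi, hdl_c, ldl_c, fatty_liver, liver_dysfunction):
    """Return ln(p / (1 - p)) for one patient.

    Assumptions (not stated in the text): reference groups are age < 40 years
    and BMI < 24; fbg, hdl_c and ldl_c use the units the authors fitted with.
    """
    lp = INTERCEPT
    # Age enters through mutually exclusive 0/1 indicators.
    if 40 <= age < 55:
        lp += COEF["age_40_54"]
    elif 55 <= age < 75:
        lp += COEF["age_55_74"]
    elif age >= 75:
        lp += COEF["age_75_plus"]
    if male:
        lp += COEF["male"]
    # BMI categories follow the 24 and 28 kg/m2 cut-offs named in the formula.
    if 24 <= bmi < 28:
        lp += COEF["bmi_24_28"]
    elif bmi >= 28:
        lp += COEF["bmi_28_plus"]
    # Continuous terms.
    lp += COEF["fbg"] * fbg + COEF["hdl_c"] * hdl_c + COEF["ldl_c"] * ldl_c
    if fatty_liver:
        lp += COEF["fatty_liver"]
    if liver_dysfunction:
        lp += COEF["liver_dysfunction"]
    return lp


def predicted_risk(**predictors):
    """Invert the logit: p = 1 / (1 + exp(-lp))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(**predictors)))
```

For example, `predicted_risk(age=60, male=True, fbg=6.5, bmi=26.0, hdl_c=1.2, ldl_c=3.0, fatty_liver=True, liver_dysfunction=False)` returns the model's 3-year probability for that profile. Absolute values depend on the units of the continuous inputs, so they should not be read clinically without the original variable definitions.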


Introduction
Diabetes is one of the most common chronic diseases, and about 90% of cases are type 2 diabetes (T2DM) (1). It is accompanied by a variety of metabolic disorders, such as hyperglycemia, hyperlipidemia and insulin resistance (IR), and can lead to serious complications, such as diabetic nephropathy (DN), diabetic retinopathy, diabetic peripheral neuropathy and cardiovascular disease (CVD) (2). Data released by the International Diabetes Federation in 2021 show that about 537 million people aged 20-79 years worldwide have diabetes (10.5% of this age group), and the number is expected to rise to 783 million by 2045, placing a heavy burden on global social and economic development (3). The prevalence of diabetes in China is about 11.2%, and China has the largest number of patients in the world. Therefore, the prevention and treatment of T2DM is extremely urgent.
Prediabetes (PreDM) is an intermediate stage of dysglycemia preceding diabetes mellitus and is associated with increased risks of cardiovascular disease, microvascular complications, cancer, dementia, depression and other conditions (4). About 5-10% of people with PreDM progress to diabetes each year, and without intervention more than 70% will eventually develop diabetes (5,6). In the Da Qing study in China, the cumulative incidence of diabetes reached 95.9% in PreDM patients after 30 years of follow-up, whereas for PreDM patients with impaired glucose tolerance who received a 6-year lifestyle intervention, the 30-year risk of developing diabetes was reduced by 39% (6). Therefore, early screening for PreDM and intervention in high-risk populations can delay or prevent the onset of diabetes.
With economic development, routine health examinations have received increasing attention (7,8). Extracting diabetes-related information from health examination data and characterizing how glycemic status changes across the stages of diabetes are of great importance for its prevention and treatment (9). During a health examination, data on gender, age, body mass index (BMI), waist circumference, past medical history, smoking history, systolic blood pressure (SBP), diastolic blood pressure (DBP), pulse rate, white blood cells (WBC), red blood cells (RBC), hemoglobin (Hb), platelet count (PLT), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), blood glucose, glycated hemoglobin (HbA1c), creatinine (Cr), estimated glomerular filtration rate (eGFR), uric acid, alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyltransferase (γ-GT), urine protein and the presence of fatty liver are routinely collected, and many of these are relevant to the diagnosis of diabetes. Risk factors for diabetes have also been discussed in many previous studies. For example, the onset of diabetes, once seen mainly in middle-aged and elderly people over 50 years old, is now concentrated among young adults aged 20 to 40 years, which is closely related to the unhealthy lifestyles of young people; type 2 diabetes is strongly correlated with obesity, and in obese individuals (BMI ≥30 kg/m²) the risk of diabetes rises rapidly from the age of 35; and genetic susceptibility is a characteristic of diabetes that is modified by other factors, such as living habits and family environment (10). The incidence of diabetes is lower in people without a family history of diabetes than in those with one, so people with a family history deserve particular attention at an early stage (11). Abnormally elevated triglycerides are another important factor in diabetes. Triglycerides and free fatty acids tend to rise together; free fatty acids are a major regulator of islet β-cell function, and their elevation disturbs insulin secretion and leads to abnormally elevated blood glucose. Elevated triglycerides also aggravate insulin resistance and dyslipidemia, creating a vicious cycle. Diabetes is a metabolic disease with a very high incidence and a very low rate of treatment adherence, and poor control can lead to various acute and chronic complications. Early detection of diabetes is therefore important for improving patients' quality of life.
In recent years, researchers have developed models for predicting diabetes. However, further work on such models is still required to support the early diagnosis and management of diabetes. In the current work, we aimed to construct a prediction model for progression from PreDM to diabetes mellitus within 3 years. By screening PreDM patients through health examinations and following them up for 3 years, we analyzed high-risk factors for diabetes in order to identify high-risk patients and formulate targeted strategies to reduce their risk of diabetes.

Study subjects
Data from the Physical Examination Center of the First Affiliated Hospital of Soochow University from October 1, 2015 to May 31, 2023 were used. Newly diagnosed PreDM cases were identified and followed up for 3 years, and a total of 4,602 PreDM patients were finally enrolled. Inclusion criteria: 1) age ≥20 years; 2) fasting blood glucose and HbA1c available from the physical examination; 3) 6.1 ≤ fasting blood glucose < 7.0 mmol/L and/or 5.7% ≤ HbA1c < 6.5%; 4) no previous history of diabetes or hyperglycemia; 5) not lost to follow-up, i.e., physical examination data available in the 4th year, or diabetes diagnosed within 3 years. Exclusion criteria: 1) use of hypoglycemic agents or steroid hormones within the previous year; 2) severe liver or kidney dysfunction; 3) severe anemia, hemoglobinopathy, pregnancy, AIDS or malignant tumors; 4) recurrent acute pancreatitis, acute pancreatitis within the last 3 months, or surgery within the last 3 months. Using simple random sampling, 70% of the included population was assigned to the training set and 30% to the test set (Figure 1).
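The glycemic inclusion criterion and the 70/30 random split can be sketched as follows. This is a minimal illustration, not the authors' code. Two assumptions are labeled here and in the comments: a subject whose other measurement is already in the diabetic range is not treated as PreDM, and the random seed is arbitrary. Function names are hypothetical.

```python
import random


def meets_predm_glycemic_criteria(fbg_mmol_l: float, hba1c_pct: float) -> bool:
    """Inclusion criterion 3: 6.1 <= FBG < 7.0 mmol/L and/or 5.7% <= HbA1c < 6.5%."""
    # Assumption: neither value may already be in the diabetic range.
    if fbg_mmol_l >= 7.0 or hba1c_pct >= 6.5:
        return False
    return fbg_mmol_l >= 6.1 or hba1c_pct >= 5.7


def train_test_split(ids, train_frac=0.7, seed=2023):
    """Simple random split of subject IDs into training and test lists.

    The seed is an arbitrary illustrative choice, fixed for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

With the 4,602 enrolled subjects, this split yields 3,221 training and 1,381 test subjects.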

Data collection
The clinical data included gender, age, body mass index (BMI), waist circumference, past history (history of hypertension, diabetes or high blood glucose), smoking history, systolic blood pressure (SBP), diastolic blood pressure (DBP), pulse rate, white blood cells (WBC), red blood cells (RBC), hemoglobin (Hb), platelet count (PLT), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), blood glucose, glycated hemoglobin (HbA1c), creatinine (Cr), estimated glomerular filtration rate (eGFR), uric acid, alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyltransferase (γ-GT), urine protein (0 for negative, 1 for positive) and fatty liver (0 for no, 1 for yes). All data were taken from the first-year physical examination of the new-onset PreDM population. Missing values were imputed using a regression method: each missing value was estimated 100 times, and the mean of the estimates was used in the analysis.
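The imputation step can be sketched as a simplified form of regression-based multiple imputation. This is an assumption about the procedure, not the SAS or R implementation the authors used. Each of the 100 draws refits an ordinary least-squares model on a bootstrap sample of the observed rows, adds a normal residual, and the draws are then averaged. The predictor columns are assumed to be complete, and the function name is hypothetical.

```python
import numpy as np


def regression_impute_mean(X, target_col, n_imputations=100, seed=0):
    """Impute NaNs in X[:, target_col] from the other columns and average the draws.

    Assumes every column other than target_col is fully observed.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    y = X[:, target_col]
    others = np.delete(X, target_col, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # design matrix with intercept
    missing = np.isnan(y)
    obs_idx = np.flatnonzero(~missing)
    draws = np.empty((n_imputations, missing.sum()))
    for m in range(n_imputations):
        # Refit on a bootstrap sample of observed rows to reflect parameter uncertainty.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        beta, *_ = np.linalg.lstsq(A[boot], y[boot], rcond=None)
        resid = y[boot] - A[boot] @ beta
        sigma = resid.std(ddof=A.shape[1])
        # Prediction plus random residual gives one stochastic imputation.
        draws[m] = A[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
    X[missing, target_col] = draws.mean(axis=0)  # mean of the 100 estimates
    return X
```

Averaging the draws removes most of the added noise, so the result is close to a deterministic regression prediction; the averaging mirrors the "mean value" described in the text.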

Outcomes
The endpoint of this study was the onset of diabetes. Diabetes was defined as fasting plasma glucose (FPG) ≥7.0 mmol/L or 2-hour plasma glucose (2hPG) ≥11.1 mmol/L; following the 2011 WHO recommendation, HbA1c ≥6.5% was added as a diagnostic criterion.
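The outcome definition reduces to a simple rule. In the sketch below, the function name is hypothetical, and treating an unmeasured value as not meeting its threshold is an assumption, since the text does not say how missing follow-up measurements were handled.

```python
from typing import Optional


def has_incident_diabetes(fpg: Optional[float], two_h_pg: Optional[float],
                          hba1c: Optional[float]) -> bool:
    """FPG >= 7.0 mmol/L, or 2hPG >= 11.1 mmol/L, or HbA1c >= 6.5% (WHO 2011).

    Assumption: a missing (None) measurement does not meet its threshold.
    """
    return (
        (fpg is not None and fpg >= 7.0)
        or (two_h_pg is not None and two_h_pg >= 11.1)
        or (hba1c is not None and hba1c >= 6.5)
    )
```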

Statistical methods
Statistical analyses were performed using SAS 9.4 and R version 4.2.3. Normally distributed continuous data are expressed as mean ± standard deviation, non-normally distributed data as median (lower quartile, upper quartile), and categorical data as frequency (percentage). Continuous variables were compared between groups using the t test (normally distributed data) or the Kruskal-Wallis test (non-normally distributed data), and categorical variables using the chi-square test. A two-sided P<0.05 was considered statistically significant.
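The descriptive statistics and group comparisons can be reproduced with SciPy. This is a minimal sketch; the helper names are hypothetical, and the choice of test is left to a caller-supplied `normal` flag because the text does not say how normality was assessed. Note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default.

```python
import numpy as np
from scipy import stats


def describe(x, normal):
    """Mean ± SD for normal data, otherwise median (Q1, Q3)."""
    x = np.asarray(x, dtype=float)
    if normal:
        return f"{x.mean():.2f} ± {x.std(ddof=1):.2f}"
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.2f} ({q1:.2f}, {q3:.2f})"


def compare_continuous(a, b, normal):
    """Two-sample t test for normal data, otherwise Kruskal-Wallis; returns P."""
    if normal:
        return stats.ttest_ind(a, b).pvalue
    return stats.kruskal(a, b).pvalue


def compare_categorical(table):
    """Chi-square test on a contingency table (rows: groups); returns P."""
    chi2, p, dof, expected = stats.chi2_contingency(np.asarray(table))
    return p
```

For example, `compare_categorical([[n_male_dm, n_female_dm], [n_male_non_dm, n_female_non_dm]])` would test the gender difference between the diabetes and non-diabetes groups, with the counts as hypothetical inputs.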

FIGURE 1
Flowchart of study participant selection.

Model development and validation
Logistic regression with stepwise selection was used on the training dataset to identify candidate predictors (P value threshold of 0.2 for adding variables and 0.1 for removing variables). Considering the practical value of the model, we combined the selection results with clinical significance to determine the final predictors. A multivariable logistic regression model was then fitted with these predictors and the model parameters are reported; the fitted regression equation constitutes the prediction model. Discrimination was evaluated with the area under the ROC curve (AUC) and its 95% confidence interval. Calibration was reported as the slope and intercept of the decile calibration curve, calculated by regressing the observed outcomes on the predicted probabilities; a slope closer to 1 and an intercept closer to 0 indicate better calibration.
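The discrimination and calibration measures can be sketched as follows. The AUC is computed as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one half. Calibration follows the decile description above: subjects are sorted into ten groups by predicted risk, and the observed event rate in each group is regressed on the mean predicted risk. This is an illustrative sketch with hypothetical names; the 95% CI for the AUC is omitted.

```python
import numpy as np


def auc(y, p):
    """AUC = P(score of a case > score of a non-case), ties counted as 1/2.

    Pairwise comparison is O(n_cases * n_noncases), fine for a few thousand subjects.
    """
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def decile_calibration(y, p, n_groups=10):
    """Slope and intercept of observed event rate vs mean predicted risk by decile."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    groups = np.array_split(order, n_groups)  # roughly equal-sized risk groups
    mean_pred = np.array([p[g].mean() for g in groups])
    obs_rate = np.array([y[g].mean() for g in groups])
    slope, intercept = np.polyfit(mean_pred, obs_rate, 1)  # least-squares line
    return slope, intercept
```

A perfectly calibrated model puts the ten points on the identity line, giving a slope of 1 and an intercept of 0.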
The parameters of the prediction model were then applied to the validation dataset, and the discrimination (AUC and 95% CI) and calibration (slope and intercept) of the model in the validation set are reported in the same way. In addition, internal validation was performed on the training data with 1,000 resamples, and the adjusted AUC is reported. To assess heterogeneity across subpopulations, we performed sensitivity analyses in subgroups defined by age (<50 and ≥50 years), sex, BMI (<28 and ≥28 kg/m²), history of hypertension, smoking and fatty liver.
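One common way to obtain a resampling-adjusted AUC is the bootstrap optimism correction. The text does not name the method, so treating the "adjusted AUC" this way is an assumption. The sketch refits a fixed set of predictors in each bootstrap sample and does not repeat the stepwise selection, which is a simplification. All function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (intercept added)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        grad = A.T @ (y - p)
        hess = (A * (p * (1.0 - p))[:, None]).T @ A
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(beta, X):
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(A @ beta)))


def auc(y, p):
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def optimism_corrected_auc(X, y, n_boot=1000, seed=0):
    """Apparent AUC minus the mean bootstrap optimism."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    apparent = auc(y, predict(fit_logistic(X, y), X))
    n = len(y)
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # resample subjects with replacement
        beta_b = fit_logistic(X[idx], y[idx])            # refit on the bootstrap sample
        boot_auc = auc(y[idx], predict(beta_b, X[idx]))  # performance where it was fitted
        orig_auc = auc(y, predict(beta_b, X))            # performance on the original data
        optimism[b] = boot_auc - orig_auc
    return apparent, apparent - optimism.mean()
```

The gap between the apparent and corrected AUC estimates how much the model's training performance is inflated by overfitting.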

Model application
A nomogram was drawn to present the model visually, and its calibration was assessed with a calibration curve. Decision curves and clinical impact curves were also plotted to evaluate the clinical usefulness of the model.
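Decision curve analysis rests on the net benefit at a threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t), where patients with predicted risk ≥ p_t are classified as high risk. A clinical impact curve plots, per 1,000 people, how many are classified as high risk and how many of those actually develop the outcome. The sketch below is a minimal illustration with hypothetical names, not the R functions the authors used.

```python
import numpy as np


def net_benefit(y, p, thresholds):
    """Net benefit of acting on predicted risk >= pt for each threshold pt."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    n = y.size
    out = []
    for pt in thresholds:
        treat = p >= pt
        tp = np.sum(treat & (y == 1))
        fp = np.sum(treat & (y == 0))
        # False positives are weighted by the odds of the threshold.
        out.append(tp / n - fp / n * pt / (1.0 - pt))
    return np.array(out)


def treat_all_net_benefit(y, thresholds):
    """Reference curve: everyone is classified as high risk."""
    prev = np.mean(y)
    return np.array([prev - (1 - prev) * pt / (1 - pt) for pt in thresholds])


def clinical_impact(y, p, thresholds, population=1000):
    """(threshold, number high risk, number high risk with event) per `population` people."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    rows = []
    for pt in thresholds:
        treat = p >= pt
        rows.append((pt, population * treat.mean(), population * np.mean(treat & (y == 1))))
    return rows
```

A model is clinically useful over the range of thresholds where its net benefit exceeds both the treat-all curve and zero (treat none).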

Comparison between diabetic and nondiabetic PreDM groups
From 2017 to 2023, a total of 4,602 PreDM participants were enrolled in the study, of whom 760 (16.51%) developed diabetes within 3 years and 3,842 (83.49%) did not. Twenty-four indicators, including gender, age, history of hypertension, fatty liver and BMI, differed significantly between the diabetic and non-diabetic PreDM groups (all P<0.05), whereas smoking, creatinine and platelet count did not (Table 1).
Based on the logistic regression results, a nomogram predicting the 3-year risk of progression from prediabetes to diabetes was constructed in R for visualization (Figure 2A). In the nomogram, each level of each risk factor corresponds to a score, and the total score is obtained by adding up the scores of all predictors. A vertical line is then drawn downward from the total score, and the value at which it intersects the "probability of diabetes occurrence" axis is the 3-year risk of progression from PreDM to diabetes. The calibration curve of the nomogram was also plotted (Figure 2B). The AUC of the ROC curve was 0.787 (95% CI: 0.765-0.808) in the training set and 0.800 (95% CI: 0.770-0.829) in the test set (Figure 3A), indicating good predictive ability. The calibration slope of 1.008 and intercept of -0.001 suggested good calibration (Figure 3B). The clinical usefulness of the model was further evaluated with a decision curve (Figure 3C) and a clinical impact curve (Figure 3D). Sensitivity analyses showed stable AUC and calibration slope across subgroups (Table 3).
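The scoring described above follows the usual nomogram construction. Each predictor's contribution β·x to the linear predictor is shifted so that its lowest-risk end scores 0 points. All axes share one scale, on which the predictor with the widest effect range |β|·(high − low) spans 100 points. Summing points and converting back gives the linear predictor and hence the probability. The sketch below assumes this convention; the coefficients and axis ranges in the test are hypothetical, since the axis ranges are not reported here.

```python
import math


def nomogram_points(coefs, ranges, intercept, values, max_points=100.0):
    """Score one patient on a nomogram built from a logistic model.

    coefs:  {name: beta};  ranges: {name: (low, high)} shown on each axis;
    values: {name: x}.  Binary predictors use the range (0, 1).
    Returns (total points, predicted probability).
    """
    # The predictor with the widest effect range spans max_points.
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    lp_per_point = max(spans.values()) / max_points
    total_points = 0.0
    lp_min = intercept
    for k, b in coefs.items():
        lo, hi = ranges[k]
        floor = min(b * lo, b * hi)  # smallest contribution on this axis -> 0 points
        lp_min += floor
        total_points += (b * values[k] - floor) / lp_per_point
    # Convert total points back to the linear predictor, then to a probability.
    lp = lp_min + total_points * lp_per_point
    return total_points, 1.0 / (1.0 + math.exp(-lp))
```

Reading the probability axis of a printed nomogram performs this last conversion graphically.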

Discussion
Diabetes has become one of the leading causes of death in recent decades (12). Its incidence has increased every year owing to eating habits, sedentary lifestyles and the prevalence of unhealthy foods (13,14). Diabetes prediction models can support decision-making in clinical management (15,16), and knowing the potential risk factors and identifying high-risk individuals at an early stage may facilitate the prevention of diabetes. Many prediction models for diabetes have been developed, of which logistic regression (17) and machine learning-based classification trees (18) are among the most popular methods. For example, Habibi et al. suggested that a simple machine learning algorithm, a classification tree, could be used to screen for diabetes without laboratory tests (19). However, the validity of these models in different locations and in populations with different diets, lifestyles, races and genetic backgrounds is still unknown, and model performance varies across settings.
Moreover, the International Diabetes Federation estimates that the number of adults with impaired glucose tolerance will reach 730 million by 2045, accounting for 11.2% of the world's adult population (20) ... and cardiometabolic outcomes, while routine intervention for low-risk populations can avoid unnecessary overtreatment (22). Therefore, individualized, risk-based management of PreDM patients with high coverage is beneficial for slowing progression to diabetes and reducing diabetes morbidity. In this study, new-onset PreDM cases were identified in a physical examination population, and we established a prediction model and a nomogram for the 3-year risk of progression from prediabetes to diabetes. The results suggest that intervention and management in high-risk populations are of positive clinical significance for delaying diabetes progression and reducing diabetes incidence. Models for predicting diabetes have been developed in previous studies. For example, Zou et al. used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to screen risk factors, and used decision tree (DT), random forest (RF) and neural network (NN) models to predict diabetes (23). Yang et al. used mutual information (MI) and Gini impurity (GI) to screen diabetes-related risk factors in physical examination data and established a cascade diabetes risk prediction system (24). The invasive risk assessment model HCL predicted diabetes using invasive characteristics, with reference to the Harvard Cancer Risk Index (25). Li et al. established a prediction model for type 2 diabetes in the Han Chinese population, and their results suggested that a genetic risk score is a crucial element in predicting the risk of type 2 diabetes. In summary, various prediction models for diabetes have been established and can facilitate its early diagnosis. Hu et al. established a nomogram for predicting the 5-year risk of prediabetes in Chinese adults and suggested that it could be applied to assess the risk of prediabetes (26); Cai et al. established a model for the 5-year incidence of type 2 diabetes in non-obese adults and claimed that it could help reduce the risk of T2D in this group (27); and Cai et al. also developed a model for predicting the 5-year risk of T2D in patients with hypertension and reported that it could help reduce the incidence of T2D in these patients (28). However, few studies have focused on short-term progression to type 2 diabetes within three years; by completing a 3-year follow-up, this study supplements the existing T2D literature.
Our results showed that older age, male gender, increased BMI, high blood glucose, elevated LDL-C, fatty liver and abnormal liver function were risk factors for progression from PreDM to diabetes within three years. Many previous studies have shown that the incidence of T2DM increases with age. It has been reported that the prevalence of diabetes was about 4% among people aged less than 44 years, 17.0% among those aged 45-64 years and 25.2% among those aged ≥65 years (29). Peng et al. reported that 16.9% of participants had T2DM, and a follow-up survey of the same group showed that this proportion increased to 23.7% (30). This may be due to the decline in insulin sensitivity with age. The number of people with T2DM has been estimated at 221.0 million males and 203.9 million females, suggesting that T2DM is related to gender (31). Furthermore, about 50% of T2DM patients are obese (BMI >30 kg/m²) and 90% are overweight (BMI >25 kg/m²), suggesting that BMI is a risk factor for progression to diabetes. High blood glucose is an early indicator of progression from prediabetes to diabetes in patients diagnosed with impaired glucose tolerance (IGT) or impaired fasting glucose (IFG) (32). Dyslipidemia in PreDM patients is characterized by increased LDL-C and decreased HDL-C levels (33,34), which may lead to islet β-cell dysfunction and accelerate progression from PreDM to type 2 diabetes (35); previous studies have shown that both LDL-C and HDL-C are correlated with the risk of abnormal glucose metabolism. Finally, epidemiological studies have shown a clear relationship between fatty liver and the incidence of type 2 diabetes, and an increased fatty liver index may increase the odds of incident prediabetes. Abnormal liver function is strongly associated with obesity and insulin resistance, and as a result, abnormal liver function has become an independent risk factor for incident T2DM (36-40). In summary, the current study confirmed risk factors for progression from PreDM to diabetes, but these results still need to be confirmed in further in-depth studies.
This study has some limitations: (1) fewer females with PreDM were enrolled, resulting in an imbalanced male-to-female ratio; (2) although HbA1c better reflects long-term glycemic status, it was not used in the prediction model because HbA1c testing is not adequately standardized across hospitals; (3) no external validation was performed, and multi-center studies will be carried out to further improve the model. In addition, only the incidence of diabetes over three years has been observed so far; in future studies we will expand the sample size, observe more indicators such as diabetic complications, and continue to follow up the patients.
In summary, in PreDM patients, good control of body weight, blood lipids, reversal of fatty liver, and maintenance of liver function in the normal range, especially for males over 55 years old, are effective management measures to delay progression from PreDM to diabetes.Timely individualized interventions should be adopted for high-risk PreDM populations to reduce the risk of developing diabetes.

TABLE 1
Comparison of baseline characteristics between PreDM participants who did and did not develop diabetes.

TABLE 2
Logistic regression analysis of risk factors affecting the progression of PreDM patients to diabetes in three years.

FIGURE 2
(A) Nomogram of the prediction model; (B) calibration curve of the nomogram.

TABLE 3
Model performance among subgroups.