Establishment of Clinical Prediction Model Based on the Study of Risk Factors of Stroke in Patients With Type 2 Diabetes Mellitus

Purpose: Stroke is a global concern because it seriously threatens life and imposes a heavy health burden on patients, especially those with type 2 diabetes mellitus (T2DM). A risk scoring model is therefore needed to predict the risk of stroke in T2DM patients and to guide health intervention. Methods: We randomly divided 4,335 T2DM patients into a training set (n = 3,252) and a validation set (n = 1,083) at a ratio of 3:1. Characteristic variables were selected from the training set data by least absolute shrinkage and selection operator regression. Three models were established to compare predictive ability: the foundation model was composed of basic information and physical indicators, the biochemical model consisted of biochemical indexes, and the integrated model combined the two. The variables of each model were entered into logistic regression analysis to form nomogram prediction models. The C index, calibration plot, and decision curve analysis were used to test discrimination, calibration, and clinical usefulness. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were used to select the best predictive model. Results: Eleven risk factors were identified: age, duration of T2DM, estimated glomerular filtration rate, systolic blood pressure, diastolic blood pressure, low-density lipoprotein, high-density lipoprotein, triglyceride, body mass index, uric acid, and glycosylated hemoglobin A1c, all with significant P-values in the logistic regression analysis. In the training set, the areas under the curve of the three models were 0.810, 0.819, and 0.884, whereas in the validation set they were 0.836, 0.832, and 0.909. In the calibration plots, the S:P values were 0.836, 0.754, and 0.621 in the training set and 0.918, 0.682, and 0.666 in the validation set. In the decision curve analysis, the risk thresholds were 8–73%, 8–98%, and from 8% upward in the training set and 8–70%, 8–90%, and 8–95% in the validation set. According to NRI and IDI, the integrated model was the best model in both the training set and the validation set. In addition, internal validation was conducted on all the subjects in this study, and the C index was 0.890 (0.873–0.907). Conclusion: This study established a model predicting the risk of stroke for T2DM patients through a community-based survey.


INTRODUCTION
Type 2 diabetes mellitus (T2DM), accounting for ∼90% of all diabetes cases, is one of the most threatening non-communicable chronic diseases. Data from the latest IDF Diabetes Atlas showed that ∼463 million adults aged 20-79 years worldwide had diabetes in 2019. Diabetes mellitus (DM) is a large and growing public health burden in China, where its prevalence is estimated at 11.6% and that of prediabetes at ∼50.1% (1).
Stroke, as one of the macrovascular complications related to DM, results in extracranial carotid artery disease and intracranial large and small vessel diseases, with clinical characteristics ranging from asymptomatic carotid artery occlusion or cerebral small vessel disease to transient ischemic attack and hemorrhagic and ischemic stroke (2). Stroke is recognized as a major public health issue contributing to morbidity and mortality worldwide. According to the Atlas of Heart Disease and Stroke released by the World Health Organization, stroke is the third leading cause of death in the world (after myocardial infarction and cancer), and every year ∼17 million people die of cardiovascular diseases (CVDs), particularly heart attacks and strokes.
As one of the complications of DM, stroke is a distinct condition that nevertheless shares many aspects with DM (3). Nearly all types of stroke are known to be influenced by DM, including large artery stroke, lacunar stroke, intracerebral hemorrhage, and embolic stroke (4). Many prospective studies in Western populations have indicated that patients with diabetes are at a higher risk of stroke than the non-diabetic population (5)(6)(7)(8). A Chinese hospital study of 2,532 patients hospitalized with a first stroke showed that patients with diabetes had a markedly higher frequency of stroke than non-diabetics (9). The risk of stroke in people with DM is 2.5-3.6 times higher than in non-diabetics (4). In a prospective observational study of 210 acute stroke patients, patients with DM bore a heavier burden and had poorer outcomes after acute stroke than non-diabetic patients (3). According to statistics, 80% of DM patients eventually die of macrovascular complications (10). Accordingly, the risk factors of stroke in T2DM patients urgently need to be determined.
At present, many studies describe the risk factors of T2DM or of stroke separately. This study combined the two diseases and aimed to identify the risk factors of stroke in T2DM patients.
This study aimed to build a simple, convenient, and efficient prediction model based on the main risk factors affecting stroke in T2DM patients. Three nomograms were constructed, and the most predictive, accurate, and effective one was identified through net reclassification improvement (NRI) and integrated discrimination improvement (IDI). We also developed an online application, based on the nomogram, for predicting stroke in T2DM patients. The work can be used clinically to assess the risk of stroke incidence in T2DM patients.

Patients
We worked with community health center hospitals affiliated with Shanghai University of Traditional Chinese Medicine for this study. From September 2014 to September 2019, we conducted baseline and follow-up surveys of all patients in seven communities in Shanghai (Huamu, Jinyang, Sanlin, Siping, Yinhang, Daqiao, and Jiangpu) and finally included 4,335 subjects. Subjects were identified from their medical history information. Patients with T2DM and a history of stroke were eligible for this study. The questionnaire survey, physical examination, and biochemical examination supplied the values of each influencing factor, which were crucial to the results. To avoid affecting model establishment and results, we first checked the data for missing values, and patients lacking any required information were excluded. After all data had been obtained and compared across the population, subjects with an abnormal value of any influencing factor were also excluded. After excluding invalid questionnaires and those with incomplete information, we eventually included 4,335 subjects. Written informed consent was obtained from all subjects before enrollment.

Procedure
We performed a survey comprising a questionnaire, physical examination, and biochemical tests, and investigated all T2DM patients in the seven communities with support from the affiliated community health centers and central hospitals. All researchers and investigators involved in the survey were well trained and qualified, to ensure a standardized and scientifically rigorous procedure. The structured questionnaire covered social demographic characteristics, lifestyle factors, DM status, disease history, and drug history (lipid-lowering, blood pressure-lowering, aspirin, and insulin). In addition, to identify the subjects precisely, we checked the electronic medical records of all participants. Patients with T2DM were taken as the initial population. T2DM was diagnosed in accordance with the criteria defined by the World Health Organization in 1999 (14). Patients with stroke were then identified through rigorous screening of medical records to ensure validity and were finally included.
All physical indicators were measured with standard electronic devices. Systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured in the standard sitting position with OMRON blood pressure monitors. According to the Guidelines for the Prevention and Treatment of T2DM in China, body mass index (BMI) was calculated as weight (in kilograms) divided by the square of height (in meters). Biochemical indexes included fasting blood glucose (FBG), postprandial blood glucose (PBG), HbA1c, total cholesterol (TC), triglyceride (TG), HDL-C, LDL-C, blood urea nitrogen (BUN), uric acid (UA), and albumin-to-creatinine ratio (ACR). Estimated glomerular filtration rate (eGFR) was computed from serum creatinine, age, and gender according to the Modification of Diet in Renal Disease Trial. For blood tests, all participants fasted for at least 10 h and were examined at 7 in the morning. Two hours after the meal, urine was collected from participants for glycosuria measurement. All blood samples were centrifuged in situ within 30 min after collection and stored in refrigerators at −80°C for further study. After this processing, all samples were sent at once to the hematology department of Ruijin Hospital Affiliated to Shanghai Jiaotong University and to the community health centers and central hospitals affiliated to Shanghai University of Traditional Chinese Medicine for testing. Urine-related biochemical indicators were analyzed by uritest-500b (URIT, China).
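The two derived indicators described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the eGFR function assumes the widely used four-variable MDRD study equation with the IDMS-traceable constant 175 and serum creatinine in mg/dL, with the race coefficient omitted. The paper does not state which MDRD variant was applied, so the constants should be read as an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def egfr_mdrd(scr_mg_dl: float, age_years: float, female: bool,
              constant: float = 175.0) -> float:
    """Estimated GFR in mL/min per 1.73 m^2.

    ASSUMPTION: four-variable MDRD study equation,
    eGFR = constant * Scr^-1.154 * age^-0.203 * 0.742 (if female),
    with Scr in mg/dL and the race coefficient left out.
    The variant used in the study itself is not specified.
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    value = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        value *= 0.742
    return value


if __name__ == "__main__":
    print(round(bmi(70.0, 1.75), 2))
    print(round(egfr_mdrd(1.0, 60, female=False), 1))
```

Keeping the leading constant as a parameter makes it easy to switch to another published variant of the equation if the laboratory's creatinine calibration requires it.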

Statistical Analysis
Through the community survey, we enrolled 4,335 T2DM patients, including 2,504 female patients and 1,831 male patients. Using R software (version 3.6.2; https://www.R-project.org), we randomly divided patients into a training set (n = 3,252) and a validation set (n = 1,083) for external validation at a theoretical ratio of 3:1 (15). In the first step, we analyzed the training set data with the least absolute shrinkage and selection operator (LASSO) regression method. LASSO is a method applied for data dimensionality reduction. Besides, the LASSO regression model takes double standard error by constructing a penalty function. Given the characteristics of this method, we screened suitable and effective risk factors for stroke in T2DM patients in the LASSO regression analysis and selected 11 characteristic factors with non-zero coefficients. We then built three models (foundation, biochemical, and integrated), containing basic physical indicators, biochemical indicators, and both, respectively, and entered each separately into multivariate logistic regression analysis. For each variable, the logistic regression analysis reported the odds ratio (OR) with 95% confidence interval (CI) and the P-value, and all statistical significance levels were two-sided. Based on the logistic regression results, we selected risk factors with a P-value <0.05 and constructed a nomogram prediction model. In this study, all the variables were selected. To validate the three models, we calculated the C index, receiver operating characteristic (ROC) curve, and decision curve analysis (DCA) measurements on the data from the training set and the validation set (16).
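The random split and the C index described above can be illustrated with a short sketch. It is written in Python with the standard library only, whereas the study used R, so it is not the authors' code; the function names and the seed are hypothetical. For a binary outcome, the C index equals the area under the ROC curve, that is, the probability that a randomly chosen patient with stroke receives a higher predicted risk than a randomly chosen patient without stroke, with ties counted as one half.

```python
import random


def split_train_validation(ids, n_train, seed=2020):
    """Randomly split subject ids into a training and a validation list.

    The seed is an arbitrary illustrative value, not the one used in the study.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def c_index(y_true, y_score):
    """C index (AUC) for a binary outcome by direct pair counting.

    Each (case, non-case) pair scores 1 if the case has the higher
    predicted risk, 0.5 if the two are tied, and 0 otherwise.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    train, valid = split_train_validation(range(4335), n_train=3252)
    print(len(train), len(valid))
    print(c_index([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Pair counting is quadratic in the number of subjects, which is acceptable for a cohort of a few thousand patients; a rank-based formula gives the same value faster on larger data.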
We used NRI and IDI to choose the best predictive model. NRI and IDI are two complementary validation methods for comparing the accuracy and predictive ability of two prediction models, evaluating how much a new model improves on the old one. The difference between them is that the NRI only considers the improvement at a certain cutoff point, whereas the IDI inspects the overall improvement of the model. When NRI >0.1, the prediction model is improved, and if IDI >0.1, it likewise indicates that the new model is better than the old model.
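The two measures can be made concrete with a small sketch. It assumes the common two-category definition of NRI at a single cutoff and the usual definition of IDI as the change in the difference of mean predicted risks between patients with and without the event; the function name and the toy data are hypothetical, and the cutoff of 0.088 is borrowed from the training-set value reported in this study purely for illustration.

```python
def nri_idi(y, p_old, p_new, cutoff):
    """Return (NRI, IDI) for a new model compared with an old model.

    NRI (two categories split at `cutoff`):
        [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]
    IDI:
        (mean gain in predicted risk among events)
        - (mean gain in predicted risk among non-events)
    """
    events = [(o, n) for yi, o, n in zip(y, p_old, p_new) if yi == 1]
    nonevents = [(o, n) for yi, o, n in zip(y, p_old, p_new) if yi == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes are required")

    def net_up(pairs):
        # Share reclassified upward minus share reclassified downward.
        up = sum(1 for o, n in pairs if o < cutoff <= n)
        down = sum(1 for o, n in pairs if n < cutoff <= o)
        return (up - down) / len(pairs)

    def mean_gain(pairs):
        return sum(n - o for o, n in pairs) / len(pairs)

    nri = net_up(events) - net_up(nonevents)
    idi = mean_gain(events) - mean_gain(nonevents)
    return nri, idi


if __name__ == "__main__":
    y = [1, 1, 0, 0]                      # toy outcomes, not study data
    p_old = [0.05, 0.20, 0.10, 0.03]      # toy risks from the old model
    p_new = [0.15, 0.30, 0.04, 0.02]      # toy risks from the new model
    print(nri_idi(y, p_old, p_new, cutoff=0.088))
```

In this toy example the new model moves one of two events above the cutoff and one of two non-events below it, so both components of the NRI are positive.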
After selecting the best model, we applied the variables of the model to all the subjects in this study for internal validation to ensure the predictive ability of the model.

RESULTS
This study involved 4,335 T2DM patients from seven communities in Shanghai, including 1,831 (42.24%) male participants and 2,504 (57.76%) female participants. Among them, 379 patients (8.74%) had stroke and 3,956 patients (91.26%) did not. The average age of the participants was 64.54 ± 6.79 years. The mean LDL-C and HDL-C levels were 1.75 ± 0.48 and 1.47 ± 0.36 mmol/L in patients with stroke and 1.48 ± 0.46 and 1.73 ± 0.38 mmol/L in those without stroke. The median TG level was 2.00 (1.61, 2.59) mmol/L in patients with stroke and 1.24 (0.81, 1.91) mmol/L in those without stroke. The median HbA1c and FBG levels of T2DM patients with stroke were 7.30% (6.60%, 8.30%) and 7.60 (6.35, 9.10) mmol/L, respectively, whereas those of patients without stroke were 6.57% (5.87%, 7.57%) and 7 [...]. The detailed demographic and clinical characteristics are given in Table 1.
Based on the literature search and questionnaire results, 23 potential risk factors from physical examination indicators and biochemical examination indicators were included in the LASSO regression analysis (Figures 1A,B). We selected 11 characteristic variables with non-zero coefficients from the LASSO regression results: AGE, course, BMI, SBP, DBP, HbA1c, TG, LDL-C, HDL-C, UA, and eGFR (Table 2). For external validation, three models were constructed. The foundation model, composed of basic information indicators and physical indicators, included AGE, course, SBP, DBP, and BMI (Figure 2A). The biochemical model consisted of biochemical indexes, including HbA1c, TG, LDL-C, HDL-C, UA, and eGFR (Figure 2B). The integrated model contained all the variables of the above two models (Figure 2C). To illustrate the integrated model plainly, an example of a T2DM patient is demonstrated in Figure 2D. In the training set, the areas under the ROC curve of the foundation model, biochemical model, and integrated model were 0.810, 0.819, and 0.884 (Table 7), whereas in the validation set (Figure 3H) they were 0.836 (Figure 3B), 0.832 (Figure 3D), and 0.909 (Figure 3F) (Table 7). The calibration plots indicated that the S:P of the foundation model, biochemical model, and integrated model was 0.836 (Figure 4A), 0.754 (Figure 4C), and 0.621 (Figure 4E) in the training set and 0.918 (Figure 4B), 0.682 (Figure 4D), and 0.666 (Figure 4F) in the validation set. The decision curves demonstrated that the threshold probability ranges of the foundation model, biochemical model, and integrated model were 8-73%, 8-98%, and from 8% upward (Figure 5A) in the training set and 8-70%, 8-90%, and 8-95% (Figure 5B) in the validation set.
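The threshold probability ranges reported for the decision curves are the ranges over which a model yields a higher net benefit than the default strategies of intervening on every patient or on none. A minimal sketch of the net-benefit calculation follows, assuming the standard definition used in decision curve analysis; the function names and toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y, p, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    Standard decision-curve definition:
        NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    y = [1, 0, 0, 1, 0]                  # toy outcomes, not study data
    p = [0.9, 0.2, 0.6, 0.4, 0.1]        # toy predicted risks
    for t in (0.1, 0.3, 0.5):
        print(t, net_benefit(y, p, t), net_benefit_treat_all(y, t))
```

The strategy of intervening on no one has a net benefit of zero at every threshold, so a model is clinically useful at a given threshold only when its net benefit exceeds both zero and the treat-all value.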
In the NRI calculation, the cutoff in the training set was 0.088 (0.804, 0.785) (Figure 3E). The integrated model was 0.131 better than the foundation model (Figure 6A) and 0.113 better than the biochemical model (Figure 6C) (Table 8). In the validation set, the cutoff was 0.087 (0.811, 0.853) (Figure 3F). The integrated model was 0.133 better than the foundation model (Figure 6B) and 0.118 better than the biochemical model (Figure 6D) [...] variables of integrated model and the variables proved to have a fairly good ability of predicting risk of stroke among T2DM patients. The results are shown in Table 9.
Based on the results, the integrated model was confirmed to have moderate predictive ability. To better support the prevention and treatment of stroke in T2DM patients, both clinically and in the community, we developed an online application that gives a quick and direct prediction. The URL of the application is https://doctorhu.shinyapps.io/T2DM_Stroke_DynNomapp/.

Prevalence of Stroke, Differences in Clinical Characteristics, and Medication Conditions of T2DM Patients
The prevalence of stroke in T2DM patients in this study was 8.74% overall and 8.73 and 8.77% in the training set and validation set, respectively, which is consistent with previous studies. In a national observational cohort study in Sweden of 26,380 T2DM patients, 6.5% were diagnosed with stroke, with an incidence rate of 10.12 events per 1,000 person-years (17). A study with multivariate analysis conducted in Spain found atherothrombotic stroke in 41.2% and lacunar infarction in 35.1% of T2DM patients (18). Shen et al. (19) performed a retrospective cohort study of 27,113 blacks and 40,431 whites with T2DM and found that 8,496 (12.57%) participants developed stroke during a mean follow-up period of 3 years. A Chinese study of 9,374 T2DM patients, conducted to establish a risk score system, found that 11.48% of participants developed ischemic stroke over a mean follow-up of 8 years (12). Xuebing et al. (20) studied 4,639 T2DM patients in Beijing, China, and found a stroke prevalence of 5.5%.
In this study, T2DM patients with stroke generally had higher levels of clinical indicators, including LDL-C, TG, HbA1c, FBG, SBP, and DBP, than patients without stroke, whereas their HDL-C level was lower, which is consistent with other studies. A study on Chinese T2DM patients indicated that LDL-C and TG were higher and HDL-C was lower in patients with CVD than in those without CVD (21). A study exploring risk factors of ischemic stroke in 2,769 DM patients found that the mean HbA1c and FBG levels were significantly higher in patients with stroke than in patients without stroke (22). A Taiwanese study of 16,994 T2DM patients demonstrated that patients with stroke had a higher prevalence of hypertension, at 74.5%, than those without stroke (23). In this study, more than two-thirds of patients took antihypertensive drugs, and nearly a third used aspirin. A case-control study conducted in 32 countries/regions indicated that the occurrence of stroke is related to hypertension (24). A systematic review also showed that lowering blood pressure significantly reduces the risk of vascular complications across various baseline blood pressure levels (25). In our study, 2,702 T2DM patients without stroke took antihypertensive drugs, accounting for 68.30% of all patients without stroke, which suggests that antihypertensive drugs help control blood pressure and thereby reduce stroke incidence. According to a review of randomized controlled trials of aspirin therapy, aspirin is estimated to reduce the risk of myocardial infarction and stroke by ∼10% in DM patients, indicating that low-dose aspirin therapy (75-162 mg) would be reasonable for the primary prevention of stroke in DM patients (26).

Risk Factors for T2DM Patients With Stroke
We used a nomogram in this study. A nomogram is a visual tool with a user-friendly display, precise calculation, and easily understood, effective prognoses (27). It builds a graphic continuous scoring system from the incorporated factors and precisely calculates the risk probability of adverse outcomes according to individual characteristics (28). Given these advantages, the nomogram was applied to predict the risk of stroke incidence among T2DM patients for clinical evaluation, and it displayed decent predictive power in internal and external validation.
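How a nomogram turns a logistic regression model into a point scale can be sketched as follows. The sketch assumes the conventional construction in which the variable with the largest effect across its range is assigned 100 points and every other variable is scaled relative to it. All coefficients, ranges, the intercept, and the example patient below are hypothetical values chosen for illustration; they are not the fitted values of the models in this study.

```python
import math

# HYPOTHETICAL log-odds coefficients and axis ranges, for illustration only.
COEFS = {"age": 0.05, "sbp": 0.02, "hdl_c": -1.0}
RANGES = {"age": (40.0, 90.0), "sbp": (90.0, 190.0), "hdl_c": (0.5, 2.5)}
INTERCEPT = -7.0  # hypothetical


def nomogram_points(coefs, ranges, patient):
    """Points per variable, scaled so the widest effect spans 0 to 100."""
    spans = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    max_span = max(spans.values())
    points = {}
    for name, beta in coefs.items():
        low, high = ranges[name]
        # Zero points sit at the end of the axis with the lowest risk.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (patient[name] - reference) / max_span
    return points


def predicted_risk(intercept, coefs, patient):
    """Logistic model probability: 1 / (1 + exp(-linear predictor))."""
    linear = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    patient = {"age": 65.0, "sbp": 140.0, "hdl_c": 1.5}  # made-up patient
    pts = nomogram_points(COEFS, RANGES, patient)
    print(pts, sum(pts.values()))
    print(predicted_risk(INTERCEPT, COEFS, patient))
```

The total points are a linear rescaling of the model's linear predictor, which is why a printed nomogram can map them onto a risk axis without any further computation by the user.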
Eleven characteristic variables were considered risk factors affecting stroke incidence among T2DM patients in this study: age, course, SBP, DBP, HDL-C, LDL-C, BMI, TG, HbA1c, UA, and eGFR. The integrated model containing these variables showed the best predictive ability through NRI and IDI validation, which demonstrated the necessity of each of the 11 risk factors in predicting the risk of stroke among T2DM patients. A risk study on T2DM patients with stroke obtained 14 risk factors, of which four (age, disease course, blood pressure, and HbA1c level) were consistent with this study (12). According to the results, this study suggests that age and the course of diabetes are important and immutable predictive risk factors for stroke in T2DM patients.
Old age brings a decline in the function of the body's tissues and organs, which suggests that the risk of stroke in T2DM patients is affected by age (1). A study of 3,776 T2DM subjects identified age as an important risk factor (29). As T2DM patients grow older, physical function declines and the duration of diabetes lengthens, blood glucose fluctuations become obvious, exacerbating vascular endothelial damage and inflammatory stimuli and thereby accelerating the development of stroke (30). Khalid Al-Rubeaan et al. (22) studied ischemic stroke and its risk factors in a diabetic cohort in countries facing diabetes prevalence and showed that the prevalence of ischemic stroke was 4.42% and was higher in the older age group with longer diabetes duration.
The results of this study showed a positive correlation between BMI, TG, and stroke prevalence in T2DM patients. High BMI and TG indicate that patients are obese and more likely to have abnormal blood lipid status. According to the American Heart Association, the American Stroke Association, and many other global guidelines, maintaining a healthy weight is recommended as an important intervention against stroke. BMI, as an important measure of physical health, plays an important role in preventing the onset of brain disease in diabetic patients. In a cohort of patients with first-ever stroke, higher BMI was confirmed as an independent indicator of long-term survival in a randomized controlled trial-based study on the effect of interventions targeting risk factor prevention (31). Another study has shown that BMI affects stroke risk in diabetic patients (11). A study of Chinese patients with T2DM showed that TG is a risk factor for stroke in T2DM patients and that elevated TG levels are more likely to be a risk factor for stroke in females than in males (32). During the literature search, it was found that the results of some studies on BMI risk factors pointed out that the BMI of patients with type 2 diabetes was negatively related to the risk of stroke (33), which was consistent with the same results we obtained according to the available data.
Our results showed a significantly positive association between the prevalence of stroke and blood pressure in patients with DM. A previous study indicated that high blood pressure leads to increased stroke risk (34). A meta-analysis of randomized controlled trials comparing the effects of BP lowering on cardiovascular outcomes of DM patients concluded that BP-lowering treatment significantly reduces cardiovascular risk in DM patients (35). According to the Journal of the American Heart Association, different from the cutoff point (BP ≥140/90 mmHg) for the diagnosis of hypertension in the non-diabetic population, the [...] (37). Systolic blood pressure is one of the main diagnostic indicators of hypertension.
Hypertension is the basis of arteriosclerosis, which can cause endothelial hyperplasia, sclerosis, vascular stenosis, and even occlusion. It is for this reason that strokes eventually occur. A study on high blood pressure showed that SBP and DBP are related to the occurrence of stroke (38). A review summarizing evidence, mainly from randomized controlled trials, on the effect of BP management on the primary and secondary prevention of stroke determined that adequate BP lowering is of great significance and is expected to benefit stroke prevention (39). Therefore, it is necessary to control SBP and DBP among T2DM patients.
HbA1c is a glycemic parameter that indicates the mean level of blood glucose control over 2 or 3 months and is closely linked with the risk of diabetic complications (40). According to our results, glycemic control is essential as a preventive measure against stroke in T2DM patients. A prospective cohort study of 563 qualified T2DM patients showed that HbA1c could affect the development of microvascular complications (41). A study in Pakistan on the difference in HbA1c values between diabetics and non-diabetics with stroke demonstrated that the HbA1c level was higher in the diabetic group (42). In a Swedish study of 406,271 T2DM patients in total, T2DM patients without proper glycemic control, as measured by HbA1c, were shown to have a higher risk of stroke and death (17). A systematic review with meta-analysis indicated that a rising HbA1c level is associated with an elevated risk of first-ever stroke, with an average hazard ratio (95% CI) among DM cohorts of 1.17 (1.09, 1.25) per 1% increase in HbA1c (43). According to a study in Thailand of T2DM patients with and without ischemic stroke, the risk of ischemic stroke is raised 7.9-10.9 times with HbA1c of 8-8.9% and higher (44).
Both LDL-C and HDL-C were considered risk factors affecting stroke incidence in T2DM patients based on our results. A population-based retrospective cohort study of 144,271 Chinese T2DM patients found that control of LDL-C was associated with a 42% reduction in CVDs and should be given priority for treatment in primary care (45). Based on extensive clinical trials, a meta-analysis showed that the incidence rate of stroke among T2DM patients decreased by 21% for each 1 mmol/L (38.7 mg/dL) decrease in LDL-C level (46). HDL-C is known for its antithrombotic effects on platelets, endothelial cells, and the blood coagulation-fibrinolysis system (47) and as a protective factor against atherosclerosis. A meta-analysis of data from 61 studies indicated a strong association between HDL-C and high risk of CVD and death (48). In a retrospective cohort study of 67,544 T2DM patients with a mean follow-up of 3 years, a significant inverse association was found between HDL-C and the risk of total, ischemic, and hemorrhagic stroke (19). HDL-C was also an influencing factor in a Chinese retrospective cohort study that aimed to establish a predictive model of ischemic stroke among T2DM patients (12).
Uric acid was considered a risk factor affecting stroke incidence according to our results. Previous studies have shown that T2DM patients with stroke usually have a high level of serum UA. Through meta-analysis, a Chinese work showed that T2DM patients with a high level of serum UA were vulnerable to cerebral infarction and found that the UA level among T2DM patients with cerebral infarction was 29% higher than among those without it (49). A study of 1,017 non-insulin-dependent DM patients, each followed up for 7 years, demonstrated that a high UA level was considerably related to fatal and non-fatal stroke, supporting a significant association between UA and stroke among T2DM patients (50). A study exploring links between serum UA level and cardiovascular complications in T2DM patients found that the hazard ratio (95% CI) of stroke was 1.19 (1.08, 1.31) for every 59 µmol/L increase in UA level, indicating that the serum UA level is related to the risk of stroke incidence among T2DM patients (51).
Estimated glomerular filtration rate is an indicator of renal function. In this study, eGFR was shown to be a risk factor of stroke in T2DM patients. In line with the discussion above, a Roman study found that eGFR had a strong negative correlation with UA, suggesting that eGFR is associated with the risk of stroke among DM patients through the effect of UA on stroke incidence (52). A study in Poland found in a multivariate analysis that eGFR was a risk factor in both diabetic and non-diabetic patients with ischemic stroke (53). A cross-sectional study in Thailand of 30,423 T2DM patients showed an association between decreased eGFR and increased risk of ischemic stroke, especially for those with eGFR <60 mL/min per 1.73 m² (54).

Limitations
However, our study still has a few limitations. First, the number of subjects is insufficient. All subjects were T2DM patients in seven communities in Shanghai, and many patients were unable to participate because of their serious condition. Predicting the risk of stroke in T2DM patients in other regions of China still requires more data to improve the prediction model. Second, relatively few indicators were included. Some indicators of lifestyle and socioeconomic factors should also be included, such as smoking, drinking habits, education, income, and medication status (antihypertensive drugs and lipid-lowering drugs). Third, we worked on cross-sectional data without conducting subsequent related investigations. If the patients' indicators were followed up, the accuracy of this prediction model would improve to a certain extent.
At the same time, current studies on the risk of stroke in T2DM patients in China have mainly used data from hospitalized patients, and epidemiological surveys of T2DM patients in the community are insufficient. In this study, the foundation model, biochemical model, and integrated model, incorporating different risk factors, were established at the external validation step and can be used to assess the risk of stroke in T2DM patients. Based on NRI and IDI, the integrated model (model C) was finally identified as the best prediction model. That is to say, age, course, BMI, SBP, DBP, HbA1c, TG, HDL-C, LDL-C, UA, and eGFR are valuable predictors of risk. When applying the nomogram to T2DM patient evaluation, doctors must also provide health education, in the form of medical and practical guidance, to help patients develop a healthier lifestyle.

CONCLUSION
Based on a survey that collected basic information, physical data, and biochemical indicators of T2DM patients in seven communities in Shanghai, and on processing of the related data, this study established three predictive models of stroke risk for T2DM patients through risk factor analysis. To apply the prediction model effectively to T2DM patients and meet the needs of community management and clinical practice, ROC, NRI, IDI, and internal validation were used to identify the integrated model as the optimal and most accurate of the three models.

DATA AVAILABILITY STATEMENT
Considering the privacy of patients, readers who conduct similar research and want to obtain data related to the article can contact the corresponding author; the corresponding research data can be obtained with permission.

ETHICS STATEMENT
The Shanghai Medical Ethics Society Committee waived the requirement for ethical approval for this study, which received support from the Shanghai Municipal Health Commission before it began. The study was in accordance with the China Guideline for Type 2 Diabetes. All subjects were carefully informed about the protocol and provided written informed consent before their inclusion in the study. This study protected the subjects' anonymity. There is no identifiable information in this manuscript. Researchers kept all the questionnaires and signed informed consent forms.

AUTHOR CONTRIBUTIONS
RS was mainly responsible for data acquisition, including questionnaire design, recruitment and training of volunteers for the questionnaire survey, and communication with the community health center affiliated to Shanghai University of Traditional Chinese Medicine. FH was responsible for the overall framework design of the paper, including experimental ideas, writing methods, data processing, model building, and writing code in RStudio, and completed community questionnaire recovery and biochemical index test entry. TZ and HS were mainly responsible for the literature review, data interpretation, and manuscript compilation. All authors revised the manuscript and approved the current version submitted.

ACKNOWLEDGMENTS
Staff in the community health centers and central hospitals were responsible for blood and urine collection and medical testing staff were responsible for sample testing. This study was supported by Community of Huamu, Community of Jinyang, Community of Sanlin, Community of Siping, Community of Yinhang, Community of Daqiao and Community of Jiangpu for the part of questionnaire survey.