A sensitivity indicator screening and intelligent classification method for the diagnosis of T2D-CHD

Background The prevalence of Type 2 Diabetes Mellitus (T2D) and its significant role in increasing Coronary Heart Disease (CHD) risk highlights the urgent need for effective CHD screening within this population. Despite current advancements in T2D management, the complexity of cardiovascular complications persists. Our study aims to develop a comprehensive CHD screening model for T2D patients, employing multimodal data to improve early detection and management, addressing a critical gap in clinical practice. Methods We analyzed data from 699 patients, including 471 with CHD (221 of these also had T2D) and a control group of 228 without CHD. Employing strict diagnostic criteria, we conducted significance testing and multivariate analysis to identify key indicators for T2D-CHD diagnosis. This led to the creation of a neural network model using 21 indicators and a logistic regression model based on an 8-indicator subset. External validation was performed with an independent dataset from an additional 212 patients to confirm the models’ generalizability. Results The neural network model achieved an accuracy of 90.7%, recall of 90.78%, precision of 90.83%, and an F-1 score of 0.908. The logistic regression model demonstrated an accuracy of 90.13%, recall of 90.1%, precision of 90.22%, and an F-1 score of 0.9016. External validation reinforced the models’ reliability and effectiveness in broader clinical settings. Conclusion Our AI-driven diagnostic models significantly enhance early CHD detection and management in T2D patients, offering a novel, efficient approach to addressing the complex interplay between these conditions. By leveraging advanced analytics and comprehensive patient data, we present a scalable solution for improving clinical outcomes in this high-risk population, potentially setting a new standard in personalized care and preventative medicine.


Introduction
Type 2 diabetes (T2D) is a significant global health issue, currently affecting over 573 million individuals worldwide. This number is projected to increase, highlighting the urgency for effective management strategies to mitigate associated complications, including coronary heart disease (CHD) (1,2). T2D is a major risk factor for the development of CHD, primarily due to the chronic hyperglycemic state it induces. This state exacerbates oxidative stress and inflammation, leading to damage of the coronary endothelium and the subsequent development of CHD (3).
Patients with T2D are 2-4 times more likely to develop CHD than those without diabetes, and they often present with more severe manifestations of the disease (4). This increased risk underscores the importance of early and accurate screening for CHD in the diabetic population. Despite the advancements in managing T2D, patients continue to face significant risks of cardiovascular complications. The American College of Cardiology notes that diabetic patients with acute coronary syndrome (ACS) are at a higher risk of severe post-percutaneous coronary intervention (PCI) complications, such as target lesion revascularization, myocardial infarction, and cardiovascular death, despite aggressive glucose-lowering strategies (5).
Moreover, large-scale clinical trials like ACCORD, ADVANCE, and VADT have highlighted the complexity of managing cardiovascular risks in patients with T2D, revealing that while strict glycemic control can reduce microvascular complications, it may not necessarily translate to a reduced risk of macrovascular events, such as CHD (6). This paradox further emphasizes the need for a proactive approach in screening for CHD among patients with T2D.
The diagnosis of CHD largely depends on coronary angiograms, a vital tool for visualizing the heart's blood vessels. However, in patients with both CHD and T2D, CHD symptoms may be masked by the neuropathic effects of high blood glucose levels, complicating the clinical presentation (7). Our research focuses on developing a comprehensive screening model for CHD among individuals with T2D. This model aims to utilize a wide array of multimodal data, including detailed patient histories, clinical examinations, and laboratory tests, to facilitate early detection and intervention for CHD in this high-risk population. By enhancing early detection efforts, we aspire to significantly improve patient outcomes and navigate the complexities associated with the co-management of T2D and CHD, thereby addressing a critical gap in current clinical practice.

Patient data collection
This research is grounded in a retrospective analysis, leveraging a comprehensive patient dataset from the Department of Cardiovascular Medicine at Xuzhou Medical University Affiliated Hospital. The dataset spans records over the last four years, including individuals diagnosed with Coronary Heart Disease (CHD) and Type 2 Diabetes Mellitus (T2D), aiming to develop an intelligent diagnostic model. This model capitalizes on a diverse array of clinical and biochemical indicators for its analysis.

Diagnostic criteria
CHD was confirmed via coronary angiography, indicating stenosis exceeding 50% in at least one major coronary artery or over 70% in its primary branches.
T2D diagnosis followed the American Diabetes Association criteria, characterized by fasting plasma glucose levels of at least 7.0 mmol/L or plasma glucose concentrations of at least 11.1 mmol/L two hours after an oral glucose tolerance test (OGTT) (4).

Exclusion criteria
Patients with severe valvular heart disease or non-ischemic myocardial disease were excluded from the study.

Clinical and biochemical data inclusion
The study categorized examination indicators into clinical symptoms, biochemical markers, and ECG indicators, encompassing variables such as age, gender, duration of diabetes, history of hypertension, lipid profiles, and various ECG abnormalities.
For the biochemical indicators and ECG features, the numerical ranges vary greatly between indicators, so min-max normalization was applied. If the original data are distributed in the interval $[\min, \max]$, they are transformed to the interval $[\min_0, \max_0]$ by the following formula:
$$x' = \frac{x - \min}{\max - \min}\,(\max_0 - \min_0) + \min_0$$
For clinical status information, the collected indicator information can be structured and quantified to form a standardized numerical vector.
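The min-max rescaling described above can be illustrated with a minimal Python sketch. The function name `min_max_scale`, the default target interval of [0, 1], and the sample values are assumptions made for illustration; the paper does not state the target interval or the software used for this step.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical heart-rate readings (beats per minute), for illustration only.
heart_rates = [60.0, 75.0, 90.0, 120.0]
scaled = min_max_scale(heart_rates)
print(scaled)
```

In practice the minimum and maximum would be taken from the training data only and then reused for validation and external data, so that no information leaks from the held-out sets.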

Ethical considerations
The study adhered to stringent ethical guidelines, with approval from the Medical Ethics Committee of Xuzhou Medical University Affiliated Hospital (Ethical Approval Number: XYFY-2022-0217) and Xuzhou Central Hospital (Ethical Approval Number: XCH-20240202). All patient data were anonymized and analyzed with utmost confidentiality to protect privacy.

Sensitive indicator screening based on significance testing

Univariate analysis for preliminary screening
The initial patient pool included 471 individuals with CHD, among which 221 were also diagnosed with T2D (T2D-CHD), and 250 had CHD exclusively (CHD-only). Additionally, 228 individuals with T2D but without CHD (T2D-only) were incorporated for comparative analysis. A thorough univariate examination of each clinical and biochemical indicator was conducted to narrow down the extensive list of potential biomarkers to a more focused subset.

Sensitive indicator combination screening
In this step, we integrated indicators into a discriminant model based on their individual significance. This process was refined through iterative testing to identify an optimal set of indicators for disease diagnosis. To evaluate the impact of these indicator combinations on diagnostic outcomes, we employed several statistical measures:
(1) Logistic regression coefficient (B): this metric quantifies the relationship between each independent variable and the outcome, with its sign indicating the direction of the correlation. The coefficients are determined from the logistic model:
$$\ln\frac{p}{1 - p} = B_0 + \sum_{i=1}^{n} B_i x_i$$
where $p$ is the probability of the outcome and $x_i$ are the indicators.
(2) Odds ratio (OR): a statistical measure of the strength of association between two variables. OR can be calculated as:
$$\mathrm{OR} = e^{B}$$
(3) Wald: a chi-square statistic used to assess the effect of the independent variables on the dependent variable in a multiple-indicator regression model. Wald can be calculated as:
$$\mathrm{Wald} = \left(\frac{B}{SE_B}\right)^{2}$$
where $SE_B$ is the standard error of $B$.
(4) P-value: a statistic used to evaluate the results of the Wald test. The smaller the P-value, the more significant the association between the independent and dependent variables. The P-value can be calculated as:
$$P = 2\left[1 - \Phi\left(\sqrt{\mathrm{Wald}}\right)\right]$$
where $\Phi(\cdot)$ denotes the standard normal distribution function.
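As a worked illustration of these four quantities, the sketch below derives the odds ratio, the Wald statistic, and a two-sided P-value from a single coefficient and its standard error. It assumes the P-value is taken from the standard normal distribution applied to the square root of the Wald statistic, which is equivalent to a chi-square test with one degree of freedom. The function names and the input values are hypothetical and do not come from the study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coefficient_summary(b, se_b):
    """Return OR, Wald, and the two-sided P-value for one logistic coefficient."""
    odds_ratio = math.exp(b)                 # OR = e^B
    wald = (b / se_b) ** 2                   # Wald = (B / SE_B)^2
    p_value = 2.0 * (1.0 - normal_cdf(math.sqrt(wald)))
    return {"B": b, "OR": odds_ratio, "Wald": wald, "P": p_value}


# Hypothetical coefficient and standard error, for illustration only.
summary = coefficient_summary(b=0.693, se_b=0.25)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A positive B gives an OR above 1 (the indicator raises the odds of the outcome), while a negative B gives an OR below 1.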

Intelligent diagnostic model development and initial verification

Neural network classification model
The neural network classification model uses a fully connected architecture with two hidden layers, which provides strong self-organizing and nonlinear mapping capabilities (Figure 1). The dataset comprised 699 samples (221 T2D-CHD, 250 CHD-only, and 228 T2D-only), of which 68.2% were used for training and the remainder for validation (Table 1). The model uses 21 indicators to distinguish between the patient groups.
To optimize performance and mitigate overfitting, we applied Dropout regularization at a rate of 25% in the second hidden layer and selected cross-entropy loss, which is well suited to classification because it directly measures the discrepancy between outputs and labels (Table 2). The Mini-Batch Gradient Descent (MBGD) algorithm was used for training, minimizing the cross-entropy loss to improve accuracy (Table 3).
Hyperparameter tuning followed a phased strategy, starting with broad adjustments and then narrowing the search for the optimal hyperparameter set. Each candidate parameter set was evaluated for its performance on both the training and validation datasets.
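To make the roles of the cross-entropy loss and Dropout concrete, here is a minimal sketch in plain Python. It assumes a softmax output over the three patient groups and the common "inverted" form of Dropout, in which surviving activations are rescaled; the paper does not specify either detail. All names and numbers are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw output scores into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample: -log of the probability of the true class."""
    return -math.log(probs[true_class])


def dropout(activations, rate=0.25, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


CLASSES = ["T2D-CHD", "CHD-only", "T2D-only"]
logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for one patient
probs = softmax(logits)
loss = cross_entropy(probs, CLASSES.index("T2D-CHD"))
print(probs, loss)

rng = random.Random(0)  # fixed seed so the mask is reproducible
hidden = [1.0] * 8      # hypothetical second-hidden-layer activations
dropped = dropout(hidden, rate=0.25, rng=rng)
print(dropped)
```

Dropout is applied only during training; at inference time all units are kept, which is why the surviving activations are rescaled here.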

Logistic regression classification model
The logistic regression classification model, designed to diagnose T2D-CHD, incorporates L2 regularization within a generalized linear regression framework to mitigate overfitting. This model estimates the probability of T2D-CHD occurrence by applying the logistic function to linear combinations of the selected indicators, optimizing its coefficients and bias vector through supervised learning. The basic model is:
$$p = L(W \cdot X + B) = \frac{1}{1 + e^{-(W \cdot X + B)}}$$
where the coefficient matrix $W$ and the bias vector $B$ are the parameters to be solved, which are determined by supervised learning on the sample set. Logistic regression uses the logistic function $L$ to map $W \cdot X + B$ to the probability $p$ of the occurrence of a hidden state, and then determines the value of the dependent variable by comparing $p$ with $1 - p$, that is, whether the case is T2D-CHD.
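The decision rule described above, together with an L2-penalized loss, can be sketched as follows for the binary case (T2D-CHD versus not). The weights, the bias, the penalty strength `lam`, and the two-feature inputs are hypothetical values chosen for illustration; they are not the fitted parameters of the study's 8-indicator model.

```python
import math


def sigmoid(z):
    """Logistic function L(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability p = L(W . X + B) for one sample."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def classify(weights, bias, x):
    """Return 1 (T2D-CHD) if p > 1 - p, otherwise 0."""
    p = predict_proba(weights, bias, x)
    return 1 if p > 1.0 - p else 0


def l2_penalized_loss(weights, bias, samples, labels, lam=0.1):
    """Mean negative log-likelihood plus an L2 penalty on the weights."""
    nll = 0.0
    for x, y in zip(samples, labels):
        p = predict_proba(weights, bias, x)
        nll -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return nll / len(samples) + lam * sum(w * w for w in weights)


# Hypothetical parameters and normalized indicator values, for illustration only.
weights, bias = [1.5, -2.0], 0.25
samples = [[0.8, 0.1], [0.1, 0.9]]
labels = [1, 0]
predictions = [classify(weights, bias, x) for x in samples]
loss_value = l2_penalized_loss(weights, bias, samples, labels)
print(predictions, loss_value)
```

The L2 term grows with the squared size of the weights, so minimizing this loss discourages large coefficients, which is how the regularization limits overfitting.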

Evaluation metrics
The logistic regression model's evaluation also encompasses Accuracy, Recall, Precision, and the F-1 Score. These metrics, indispensable for evaluating the model's efficacy in accurately diagnosing T2D-CHD cases, provide a more detailed view of the model's overall diagnostic performance.
(1) Accuracy represents the proportion of true results (both true positives and true negatives) among the total number of cases examined. It is the most intuitive performance measure and gives the overall effectiveness of the model. Accuracy is calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.
(2) Recall, also known as sensitivity, measures the proportion of actual positive cases that were correctly identified. It is particularly important in medical diagnostics, where failing to identify a condition (false negative) can be more critical than incorrectly diagnosing it (false positive). Recall is defined as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(3) Precision assesses the proportion of positive identifications that were actually correct. It is crucial in situations where the cost of false positives is high. Precision is given by:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
(4) The F-1 Score is the harmonic mean of Precision and Recall, providing a balance between the two metrics. It is especially useful when the class distribution is imbalanced. The F-1 Score is calculated as:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
FIGURE 1 The architecture of the neural network classification diagnostic model. The model features a fully connected double-hidden-layer network structure, designed to enhance self-organizing learning and provide strong nonlinear mapping capabilities.
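For a three-class problem such as this one, the four metrics can be computed from a confusion matrix by treating each class in turn as the positive class. The sketch below assumes macro-averaging (an unweighted mean over classes); the paper does not state which averaging scheme was used, and the confusion matrix shown is hypothetical rather than taken from the study's tables.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged recall, precision, and F-1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    recalls, precisions, f1s = [], [], []
    for i in range(k):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                        # true class i, predicted otherwise
        fp = sum(confusion[r][i] for r in range(k)) - tp   # predicted i, true class otherwise
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "recall": sum(recalls) / k,
        "precision": sum(precisions) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical confusion matrix; rows and columns: T2D-CHD, CHD-only, T2D-only.
confusion = [
    [45, 3, 2],
    [4, 44, 2],
    [1, 2, 47],
]
metrics = macro_metrics(confusion)
print(metrics)
```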

External validation
To enhance our statistical analysis and model validation, we expanded our dataset through collaboration with Xuzhou Central Hospital, incorporating data from an additional 73 patients with both T2D and CHD (T2D-CHD), 69 patients with CHD but without T2D (CHD-only), and 70 individuals with T2D but without CHD (T2D-only). This effort was aimed at increasing the dataset's diversity and improving the generalizability of our findings.
The external dataset underwent the same preprocessing and evaluation protocols as the initial dataset, ensuring a consistent and rigorous assessment.

Statistical analysis
Data analysis in our study was conducted using IBM SPSS Statistics 26, covering both descriptive and inferential statistics to address our research questions comprehensively. After preprocessing the data for quality, we employed descriptive statistics to summarize the data distribution. Inferential analysis followed, with one-way ANOVA and the Kruskal-Wallis H test applied to continuous variables to compare group means or medians based on normality checks. For categorical variables, the chi-square test evaluated the significance of distributions across groups. These methods collectively enabled the identification of significant disease indicators and their interactions, underpinning the development of our intelligent diagnostic model.
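The study ran these tests in SPSS. Purely as an illustration of what the chi-square test computes for a categorical indicator across the three groups, the sketch below calculates the Pearson chi-square statistic for a contingency table. The counts are hypothetical, and the closed-form P-value used here holds only for exactly two degrees of freedom, as in a 3 x 2 table; it also assumes every row and column total is nonzero.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and indicator.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts of an ECG change (present, absent) in each group.
# Rows: T2D-CHD, CHD-only, T2D-only.
table = [
    [30, 20],
    [20, 30],
    [10, 40],
]
stat, dof = chi_square_statistic(table)
# With exactly 2 degrees of freedom, the upper-tail probability is exp(-stat / 2).
p_value = math.exp(-stat / 2.0) if dof == 2 else None
print(stat, dof, p_value)
```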

Univariate analysis
Univariate analysis identified crucial indicators for diagnosing T2D in conjunction with CHD. Analysis results, as outlined in Table 4, indicated significant disparities in key indicators among three groups: those with T2D and CHD, those with CHD alone, and those with T2D only. Notable indicators included age, heart rate, HbA1c levels, fasting blood glucose, total cholesterol (TC), HDL-C, LDL-C, troponin I, creatinine, uric acid, albumin, γ-glutamyl transferase (GGT), total bilirubin, apolipoprotein A1, apolipoprotein B, T-wave changes, and ST-segment changes. The history of hypertension, aspartate aminotransferase (AST), and alanine aminotransferase (ALT) showed no significant variance, suggesting their limited value in differentiation.

Selection results of significant indicator combinations
A combination of 8 significant indicators was pinpointed through multifactorial association and significance analysis, demonstrating strong correlation with T2D and CHD: HbA1c, fasting blood glucose, HDL-C, apolipoprotein B, total bilirubin, T-wave change, ST-segment change, and heart rate, detailed in Table 5. These findings form a solid foundation for our diagnostic models.

Intelligent model typing diagnostic results
Employing 21 sensitive indicators from the univariate analysis and the 8 selected from multifactorial analysis, two diagnostic models were developed: a neural network and a logistic regression model, aimed at classifying T2D with CHD cases.

Neural network diagnostic results
The neural network model was assessed using a test set, with its classification performance depicted in Figure 2. The model distinguished T2D-CHD, CHD-only, and T2D-only groups with notable accuracy, though some outliers were observed. Figure 3 presents ROC curves for each classification indicator, showcasing discriminative capabilities with AUC values of 0.961, 0.964, and 0.977, respectively. Performance metrics, listed in Tables 6 and 7, include an accuracy of 90.7%, recall of 90.78%, precision of 90.83%, and an F-1 score of 0.908, highlighting its diagnostic precision.

Logistic regression diagnostic results
The logistic regression model, applying the 8 chosen indicators, showed proficiency in disease classification, as detailed in Tables 8 and 9. It accurately classified the majority of cases across the three categories, achieving an accuracy of 90.13%, recall of 90.1%, precision of 90.22%, and an F-1 score of 0.9016, demonstrating reliable diagnostic capability.

External validation of intelligent model typing diagnostic results
External validation assessed the diagnostic models' generalization capability using data from an independent hospital cohort, and confirmed their effectiveness across the T2D-CHD, CHD-only, and T2D-only patient categories.

Neural network model external validation
The external validation of the neural network model, shown in its classification confusion matrix and summarized in Tables 10 and 11, confirmed its ability to differentiate among the three patient categories. The model achieved an external validation accuracy of 90.57%, with closely aligned recall, precision, and F-1 score of 90.62%, 90.55%, and 0.9058, respectively, supporting its robust performance and applicability beyond the original cohort.

Logistic regression model external validation
In parallel, the logistic regression model's external validation, detailed in Tables 12 and 13, showed consistent accuracy in disease state classification. The model achieved an overall accuracy of 89.15%, a recall of 89.23%, a precision of 89.12%, and an F-1 score of 0.8917, further supporting the diagnostic models' reliability and effectiveness in a broader clinical context.

Discussion
The necessity of managing cardiovascular risk in patients with Type 2 Diabetes Mellitus (T2D) has become increasingly evident, prompting a shift in the approach to cardiovascular disease (CVD) prevention and management (8). The development of risk assessment tools by authoritative bodies like the American College of Cardiology (ACC) and the American Heart Association (AHA) (9), alongside recommendations from the European Society of Cardiology (ESC) (10), signifies a strategic move towards early detection and primary prevention. These tools and stratifications aim to tailor preventive measures and treatments by evaluating individual risk factors such as age, lifestyle, comorbidities, and disease duration. Yet, the variation in methodologies underscores the complexity of accurately assessing CVD risk in T2D patients, highlighting a critical need for consensus and a more personalized approach to care.
The traditional approach for diagnosing Coronary Heart Disease (CHD) in patients with Type 2 Diabetes Mellitus (T2D) primarily utilizes electrocardiograms (ECG) and coronary angiography. ECGs, as a non-invasive tool, are crucial for detecting heart irregularities and ischemic conditions but may not always accurately capture the onset of CHD or may have ischemic changes masked by other diseases or confounding factors (11). Coronary CT angiography (CTA) represents a noninvasive early screening method, yet it suffers from a higher false-negative rate, making it challenging to widely implement in primary care settings. Meanwhile, invasive coronary angiography, despite offering detailed artery visualization, carries procedural risks, rendering it less suitable for patients in early or asymptomatic stages of CHD (12). Critically, the neuropathic effects of diabetes can conceal the typical pain associated with heart conditions, often leading diabetic patients to delay seeking medical attention and hindering effective detection and diagnosis (13). This underscores an urgent need for advancements in diagnostic methods that are less invasive, highly sensitive, and customized to individual patients, aiming to enhance the early detection and efficient management of CHD in those with T2D.
In response, our study introduces a pioneering sensitivity index screening method and a rapid diagnostic model for T2D-CHD, leveraging the latest advancements in machine learning (ML) and artificial intelligence (AI) to transcend the barriers posed by current diagnostic methodologies. This approach, drawing upon a comprehensive array of medical data, including patient history, comorbidities, laboratory tests, and ECGs, aims to facilitate early detection and risk stratification of CHD among T2D patients. The inspiration for this approach stems from recent breakthroughs in AI/ML, which have demonstrated significant potential in revolutionizing the diagnosis and management of cardiovascular diseases and diabetes.
For instance, the work of Alimova et al., employing ML algorithms to predict diastolic dysfunction in cardiovascular and diabetic patients, highlights the precision and effectiveness these technologies bring to medical diagnostics (14). This is further supported by research from Saeed and Hama, who explored cardiac disease prediction using AI algorithms (15), and from Chinmayi et al. and Barbieri et al., who delved into AI's role in disease risk prediction and the utilization of advanced imaging techniques for enhanced diagnostic accuracy (16,17). These studies underscore the adaptability and depth of AI/ML in capturing complex cardiovascular and metabolic interrelations, setting a foundation for our methodology.

Limitations and future directions
While our study presents a promising direction for the use of AI in T2D-CHD diagnostics, we acknowledge limitations such as the reliance on limited training sets and potential selection biases inherent in electronic medical records. Future research could extend our findings through multi-center external dataset validation or prospective cohort studies, further refining the diagnostic models and expanding their applicability.

Conclusion
Our work contributes to the evolving evidence base supporting the integration of AI and ML in diagnosing complex diseases like T2D-CHD. Our results, in line with recent advancements, advocate for the potential of AI-based diagnostic models to significantly improve disease screening and management, particularly valuable in primary care settings where early detection can dramatically influence patient care and outcomes.

FIGURE 2 Box plot of neural network classification results. This figure displays the classification performance of the neural network model, represented through a box plot. The color blue indicates predicted T2D-CHD cases, red denotes NDM-CHD cases, and green symbolizes Non-CHD cases.

FIGURE 3 ROC curves for classification indicators in neural network model. This figure displays the ROC (Receiver Operating Characteristic) curves for each classification indicator within the neural network model. The areas under the curves for DM, NDM, and NN are 0.973, 0.968, and 0.982 respectively.

TABLE 2
Model parameter settings.

TABLE 3
Training parameter settings.

TABLE 5
Results of multivariate correlation analysis.

TABLE 4
Results of univariate analysis.

TABLE 6
Neural network four-fold cross-validation results.

TABLE 12
Logistic classification confusion matrix.

TABLE 9
Logistic classification results.