Artificial intelligence-enabled electrocardiographic screening for left ventricular systolic dysfunction and mortality risk prediction

Background: Left ventricular systolic dysfunction (LVSD), characterized by a reduced left ventricular ejection fraction (LVEF), is associated with adverse patient outcomes. We aimed to build a deep neural network (DNN)-based model using the standard 12-lead electrocardiogram (ECG) to screen for LVSD and stratify patient prognosis.
Methods: This retrospective chart review study was conducted using data from consecutive adults who underwent ECG examinations at Chang Gung Memorial Hospital in Taiwan between October 2007 and December 2019. DNN models were developed to recognize LVSD, defined as LVEF <40%, using original ECG signals or transformed images from 190,359 patients with paired ECG and echocardiogram within 14 days. The 190,359 patients were divided into a training set of 133,225 and a validation set of 57,134. The accuracy of recognizing LVSD and subsequent mortality predictions were tested using ECGs from 190,316 patients with paired data. Of these 190,316 patients, we further selected 49,564 patients with multiple echocardiographic data to predict LVSD incidence. We additionally used data from 1,194,982 patients who underwent ECG only to assess mortality prognostication. External validation was performed using data of 91,425 patients from Tri-Service General Hospital, Taiwan.
Results: The mean age of patients in the testing dataset was 63.7 ± 16.3 years (46.3% women), and 8,216 patients (4.3%) had LVSD. The median follow-up period was 3.9 years (interquartile range 1.5–7.9 years). The area under the receiver-operating characteristic curve (AUROC), sensitivity, and specificity of the signal-based DNN (DNN-signal) to identify LVSD were 0.95, 0.91, and 0.86, respectively. DNN signal-predicted LVSD was associated with age- and sex-adjusted hazard ratios (HRs) of 2.57 (95% confidence interval [CI], 2.53–2.62) for all-cause mortality and 6.09 (5.83–6.37) for cardiovascular mortality. In patients with multiple echocardiograms, a positive DNN prediction in patients with preserved LVEF was associated with an adjusted HR (95% CI) of 8.33 (7.71–9.00) for incident LVSD. Signal- and image-based DNNs performed equally well in the primary and additional datasets.
Conclusion: Using DNNs, ECG becomes a low-cost, clinically feasible tool to screen for LVSD and facilitate accurate prognostication.


Introduction
Heart failure (HF) is a major health issue affecting over 26 million people worldwide. It causes a significant increase in both morbidity and mortality and imposes a financial burden on society (1). Echouffo-Tcheugui et al. have classified left ventricular dysfunction into two categories: left ventricular systolic dysfunction (LVSD) and left ventricular diastolic dysfunction. LVSD is characterized by a reduced left ventricular ejection fraction (LVEF) and is associated with three times the risk of developing overt HF (2). Early identification of individuals with asymptomatic LVSD can lead to effective interventions, such as lifestyle changes, and medications, including angiotensin-converting enzyme inhibitors, angiotensin II receptor blockers, mineralocorticoid receptor antagonists, and beta-blockers (3)(4)(5)(6)(7), which can delay the onset of HF, reduce the rate of cardiac events, and improve survival (8)(9)(10).
The most commonly used method to assess LVSD is the transthoracic echocardiogram (TTE), but its limitations, including portability, cost, and operator dependency, restrict its use as a screening tool. To address this, there is a need for more accurate and accessible screening tools to identify LVSD in asymptomatic patients, such as a weighted scoring model incorporating clinical characteristics and plasma natriuretic peptides. However, these tools lack the specificity to predict LVSD in asymptomatic populations (11,12).
The electrocardiogram (ECG) is an inexpensive and widely available method that measures the collective electrical activity of the heart and may contain information related to LVSD. While ECG recording is a standardized process, the accuracy and consistency of human interpretation can vary widely based on the experience and expertise of the interpreter. In addition, subtle ECG features that are invisible to the human eye may be useful for LVSD detection and prognostication. To overcome these challenges, the use of deep neural networks (DNNs) is proposed.
In recent years, DNNs have been applied successfully in the healthcare industry, including image analysis (13), predictive modeling (14), natural language processing (15), and drug discovery (16). They are superior to traditional pattern recognition methods (17) and form the foundation of clinical applications such as fracture detection (18), retinopathy grading (19), and lung nodule identification (20). DNN tools can interpret ECGs with similar accuracy to experienced physicians. Attia et al. developed a DNN-based ECG screening tool to identify individuals with LVEF ≤35% (21). A subsequent pragmatic clinical trial showed that a DNN-based intervention increased the likelihood of identifying patients with low LVEF during routine primary care (22). However, the effectiveness of DNN-based models in predicting incident LVSD and mortality has not been studied in a large clinical setting.
With data from approximately 1.7 million individuals, we conducted this study to evaluate the feasibility of using DNN-based ECG interpretation as a screening tool for LVSD and to assess its utility in risk assessment. The primary outcome was the ability of the DNN model to accurately identify individuals with LVSD (defined as LVEF <40%) based solely on the ECG. The secondary outcome was the ability of the DNN model to identify individuals at increased risk of death and at increased risk of developing LVSD.

Data sources and study population
This study was conducted at Chang Gung Memorial Hospital (CGMH), the largest private hospital system in Taiwan. The study population included consecutive adult patients (age ≥ 18 years) who underwent standard 12-lead ECG at CGMH between October 2007 and December 2019 (1,777,039 individuals, 5,148,718 ECG tracings). ECGs with poor recording quality or unavailable leads were excluded. The ECG data were linked to the Chang Gung Research Database (CGRD), which included the electronic health records of all patients who visited any one of the following seven hospitals: Keelung, Taipei, Linkou (headquarters), Taoyuan, Yunlin, Chiayi, and Kaohsiung.
The patients' survival status was confirmed by linking the CGRD to the National Death Registry. Valid internal patient record linkage was achieved by using unique patient identifiers, and these were encrypted before the data were released to researchers to protect patient confidentiality. This study was approved by the Institutional Review Board of CGMH and Tri-Service General Hospital. This study used anonymous and nontraceable data, so the need for patient consent was waived.

Collection of data
Standard 12-lead ECGs with 10-s voltage-time traces were acquired at a sampling rate of 500 Hz using a MAC 5000, MAC 5500, or MAC 5500 HD ECG machine (GE Healthcare, Chicago, IL, United States) and stored using the Marquette Universal System for Electrocardiography (MUSE). Each standard 12-lead ECG was stored as a 12 × 5,000 matrix (12 leads × 5,000 samples, i.e., 10 s at 500 Hz). Both the raw ECG signal data and processed ECG images at a 400 × 600-pixel resolution were obtained. Transthoracic echocardiograms were performed and interpreted in accordance with the guidelines set forth by the American Society of Echocardiography and the American College of Cardiology/American Heart Association. Comprehensive two-dimensional (2D) or three-dimensional (3D) Doppler echocardiographic profiles and quantitative measurements were recorded in Chang Gung's health information system. For this study, we extracted only LVEF values for analysis. LVEF was routinely measured using standardized methodologies. If different methods were used to measure LVEF in a report, the order of data preference was as follows: 3D echocardiogram, the Simpson biplane method, the 2D method, and linear measurement using M-mode. If multiple LVEF values were obtained using one method, the mean value was used for analysis.
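The LVEF selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the method labels (`"3d"`, `"simpson_biplane"`, `"2d"`, `"m_mode"`) and the function name `select_lvef` are hypothetical, since the text does not describe how the reports are encoded.

```python
from statistics import mean

# Order of preference described in the text, most preferred first.
# The string labels are hypothetical; the source does not name its fields.
METHOD_PREFERENCE = ["3d", "simpson_biplane", "2d", "m_mode"]


def select_lvef(measurements):
    """Return one LVEF value (%) for a report.

    `measurements` is a list of (method, lvef_percent) tuples. The most
    preferred method present in the report is used; if that method has
    several values, their mean is returned. Returns None if no known
    method is present.
    """
    for method in METHOD_PREFERENCE:
        values = [lvef for m, lvef in measurements if m == method]
        if values:
            return mean(values)
    return None


# Simpson biplane outranks M-mode, so the two biplane values are averaged.
report = [("m_mode", 52.0), ("simpson_biplane", 38.0), ("simpson_biplane", 40.0)]
print(select_lvef(report))
```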
To achieve proper correlation between ECG and TTE data, only TTEs obtained within 2 weeks of the index ECG were used for DNN model creation.
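As a rough sketch of this pairing step, the function below keeps only TTEs within 14 days of an index ECG and returns the closest one. It assumes the window applies on either side of the ECG date, which the text does not state explicitly; the function name and the example dates are hypothetical.

```python
from datetime import date

MAX_GAP_DAYS = 14  # pairing window stated in the text


def pair_ecg_with_tte(ecg_date, tte_dates):
    """Return the TTE date closest to the ECG within the window, or None.

    Assumption: a TTE up to 14 days before or after the ECG qualifies.
    Ties are resolved in favour of the TTE listed first.
    """
    candidates = [d for d in tte_dates if abs((d - ecg_date).days) <= MAX_GAP_DAYS]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs((d - ecg_date).days))


ecg = date(2015, 3, 10)
ttes = [date(2015, 2, 1), date(2015, 3, 20), date(2015, 3, 12)]
print(pair_ecg_with_tte(ecg, ttes))
```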

Development of DNN models for identification of LVSD
In this study, we implemented two types of DNNs using the PyTorch framework and Python 3.6. All training was performed on an NVIDIA DGX-1 platform with 8 V100 GPUs and 32 GB of RAM per GPU. For the DNN that used signal inputs (DNN-signal), we used the deep residual network (ResNet) (23) modified to fit the signal input (Supplementary Figure 1). We used a wider kernel for the first convolution layer than in the original ResNet framework designed for images. This architecture used skip connections, which allowed information to pass directly to the next layer to avoid the degradation caused by deeper neural networks. The network consisted of a convolution layer followed by eight residual blocks. Each residual block contained two convolution layers. The output of the last block was fed into hybrid pooling because combining max- and average-pooling methods improved the generalization ability while reducing dimensionality (24,25). The output of hybrid pooling was subsequently sent to a fully connected layer to perform the final classification. The output of each convolutional layer was followed by batch normalization for distribution normalization and fed into a rectified linear activation unit (26). Cross-entropy loss with an Adam optimizer (27) was used in the model. Dropout was applied to reduce overfitting by breaking up co-adaptation on the training data (28).
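To make the hybrid pooling step concrete, the following plain-Python sketch pools a small feature map both ways and joins the results. It assumes that the max-pooled and average-pooled vectors are concatenated and that pooling is global over the time axis; the text does not specify either detail, and the function name `hybrid_pool` is hypothetical. It is not the PyTorch implementation used in the study.

```python
from statistics import mean


def hybrid_pool(feature_map):
    """Global max- and average-pool each channel, then concatenate.

    `feature_map` is a list of channels, each a list of activations over
    time. The result has 2 * n_channels values: all maxima, then all means.
    """
    maxima = [max(channel) for channel in feature_map]
    averages = [mean(channel) for channel in feature_map]
    return maxima + averages


features = [
    [0.0, 2.0, 1.0, 1.0],   # channel 0
    [3.0, 3.0, 0.0, -2.0],  # channel 1
]
print(hybrid_pool(features))
```

Whatever the length of the time axis, the pooled vector has a fixed size, which is what allows it to feed a fully connected classification layer.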
For the DNN using the image inputs (DNN-image), we prepared a 400 × 600-pixel image similar to standard 12-lead ECG images (Supplementary Figure 2) using the signal data (12 × 5,000 matrix). The resolution was determined by a series of experiments using different image resolutions. The images were fed to ResNet-18 (23), and the output layer had two classes (Softmax function). The validation set was used to optimize the network architecture and network hyperparameters. The DNN-signal and DNN-image used the same training and validation sets for model building and were tested on the same testing set. A receiver operating characteristic (ROC) curve was plotted to assess the performance. The model with the highest area under the ROC curve (AUROC) was selected as the final model. We used the validation dataset ROC to select the optimal threshold for the probability of LVSD by applying the Youden index (J) method.
We further assessed the network performance in different age, sex, and comorbidity strata. The odds ratio (OR), sensitivity, and specificity were calculated for each stratum.
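The Youden index is J = sensitivity + specificity − 1, and the chosen threshold is the one that maximizes J on the validation set. A minimal sketch follows, assuming a case is called positive when its score is at or above the threshold and that both classes are present; the labels and scores are invented for illustration and are not study data.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    `labels` are 0/1 (1 = LVSD on echo); `scores` are predicted
    probabilities. A case is called positive when score >= threshold.
    Every distinct score is tried as a candidate threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Invented validation data, for illustration only.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
threshold, j = youden_threshold(labels, scores)
print(threshold, round(j, 3))
```

The quadratic scan is kept for readability; a production version would sort once and sweep the thresholds in a single pass.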

Division of dataset
Among 1,684,298 adult patients with ECG tracings, 380,675 had at least one TTE within 2 weeks of the index ECG during the study period (Figure 1). For patients with multiple ECG-TTE pairs, the earliest pair with the shortest ECG-TTE interval was selected for model development. A total of 380,675 ECG-TTE pairs were used for the primary analysis. These ECG-TTE pairs were randomly allocated to a training, validation, or testing set using simple random sampling without replacement, in which each pair had an equal probability of selection. The final DNN development cohort included 133,225 patients in the training set, 57,134 in the validation set, and 190,316 in the testing set. No patient was allocated to more than one group (Figure 1).
We further conducted an external validation using paired ECG-TTE data from the Tri-Service General Hospital. The external validation cohort included 91,425 consecutive adults between April 2010 and September 2021. The criteria of patient selection and echocardiographic performance methodology were the same as for the derivation cohort. Unlike the ECG machines used at CGMH, ECGs from Tri-Service General Hospital were obtained using the Philips system.
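A patient-level random split of this kind can be sketched as a shuffle followed by slicing, which guarantees that the three sets are disjoint. The set sizes below are those reported in the text; the function name, the integer patient IDs, and the random seed are illustrative assumptions, not details from the study.

```python
import random

# Set sizes reported in the text; together they cover all 380,675 pairs.
SET_SIZES = {"training": 133_225, "validation": 57_134, "testing": 190_316}


def split_patients(patient_ids, set_sizes, seed=0):
    """Shuffle patient IDs and cut them into disjoint sets of the given sizes."""
    shuffled = list(patient_ids)
    if len(shuffled) != sum(set_sizes.values()):
        raise ValueError("set sizes must add up to the number of patients")
    random.Random(seed).shuffle(shuffled)  # the seed is an illustrative assumption
    splits, start = {}, 0
    for name, size in set_sizes.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits


splits = split_patients(range(380_675), SET_SIZES)
print({name: len(ids) for name, ids in splits.items()})
```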

Performance evaluation of the DNN models in predicting mortality
The ability of the DNN to predict all-cause and cardiovascular mortality was assessed. According to the agreement between the echocardiographic measurements and the DNN predictions, we defined the following groups: (i) a 'true positive' DNN prediction represents both DNN-predicted and echo-measured LVEF <40%; (ii) a 'true negative' DNN prediction represents both DNN-predicted and echo-measured LVEF ≥40%; (iii) a 'false positive' DNN prediction represents DNN-predicted LVEF <40% and contemporaneous echo-measured LVEF ≥40%; and (iv) a 'false negative' DNN prediction represents DNN-predicted LVEF ≥40% and contemporaneous echo-measured LVEF <40%. The associations of these groups with all-cause or cardiovascular mortality were also assessed. The National Death Registry was linked to the study dataset. In Taiwan, it is mandatory for physicians to report deaths and causes of death to the Department of Health and Welfare. Therefore, death records within the National Death Registry are considered complete and accurate. A previous validation study estimated the effect of the misrecorded causes of death in the National Death Registry on cardiovascular mortality rates. The effect was less than 4%, suggesting accurate cause-of-death coding in Taiwan (29).
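The four groups defined above map directly onto a small labelling function. The sketch below assumes the DNN output has already been binarized at the selected threshold; the function and constant names are hypothetical.

```python
LVSD_CUTOFF = 40.0  # LVEF (%) below this value defines LVSD in the text


def prediction_group(dnn_positive, echo_lvef):
    """Label one ECG-TTE pair as a true/false positive/negative prediction.

    `dnn_positive` is True when the DNN predicts LVEF < 40%;
    `echo_lvef` is the contemporaneous echo-measured LVEF in percent.
    """
    echo_positive = echo_lvef < LVSD_CUTOFF
    if dnn_positive and echo_positive:
        return "true positive"
    if dnn_positive:
        return "false positive"
    if echo_positive:
        return "false negative"
    return "true negative"


print(prediction_group(True, 55.0))
```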

Sensitivity analyses
We conducted sensitivity analyses in patients who were not included in the primary analysis. These patients were included in the following sub-analyses (Figure 1): (i) among patients with multiple TTE examinations in the original testing dataset (dataset A1, n = 49,564), the incidence of LVSD and mortality were compared in patients with 'false-positive' versus 'true-negative' predictions of LVSD; (ii) among patients who underwent TTE more than 2 weeks after the index ECG (dataset B), the incidence of LVSD and mortality were compared in patients with positive versus negative predictions of LVSD; and (iii) among patients without echocardiographic data (dataset C), the mortality rate was compared in patients with positive versus negative predictions of LVSD. Age- and sex-weighted Kaplan-Meier analysis was used to determine the incidence of LVSD or mortality. Cox proportional hazard regression was used to estimate the age- and sex-adjusted hazard ratios (HR; 95% confidence intervals [CI]) for LVSD and mortality.
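For readers unfamiliar with the Kaplan-Meier estimator, a plain, unweighted version is sketched below. It omits the age and sex weighting used in the study (which was run in SAS), so it illustrates the product-limit idea only; the follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    `times` are follow-up durations; `events` are 1 for an observed event
    (death or incident LVSD) and 0 for censoring. Subjects censored at a
    given time are counted as still at risk at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented follow-up data (years), for illustration only.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```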

Statistical methods
Only the testing datasets were evaluated for performance measures. The model's diagnostic performance was evaluated by calculating the AUROC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The F1 score, the harmonic mean of the PPV and sensitivity, was also computed at the selected threshold. Continuous variables are expressed as means ± standard deviation (SD). Categorical variables are expressed as numbers and percentages. Adjusted odds ratios (OR; 95% CI) were calculated. For comparisons of population characteristics, the chi-square test was used for categorical variables and the unpaired Student's t-test for continuous variables. Cox proportional hazards models were used to estimate hazard ratios (HR; 95% CI) for LVSD, all-cause mortality, and cardiovascular mortality. A value of p < 0.05 was considered statistically significant. Statistical analyses were conducted using SAS 9.4 software.
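The threshold-based measures listed above all follow from the four cells of a 2 × 2 table. The sketch below shows the arithmetic; the counts are made up for illustration and are not results from this study, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute screening metrics from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # flagged share of echo-confirmed LVSD
    specificity = tn / (tn + fp)  # cleared share of patients without LVSD
    ppv = tp / (tp + fp)          # probability of LVSD given a positive call
    npv = tn / (tn + fn)          # probability of no LVSD given a negative call
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Made-up counts for illustration only; these are not study results.
metrics = diagnostic_metrics(tp=90, fp=140, tn=860, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```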

Results
The testing dataset contained 190,316 patients (46.3% females), and 8,216 patients (4.3%) had LVSD. The mean age was 63.7 ± 16.3 years. The median follow-up time was 3.9 years (interquartile range 1.5-7.9 years) for the testing dataset. Table 1 shows the characteristics of the patients in the training, validation, and testing sets. There were no significant differences between groups.

Performance of the DNN models in identifying LVSD
The AUROC values of DNN-signal and DNN-image for identifying LVSD in the testing dataset were 0.95 and 0.94, respectively (Supplementary Figure 3). When the threshold maximizing Youden's index was selected, the overall accuracy of DNN-signal was 0.86, with a [...]
[...] Tables 1, 2 show the patient characteristics and the performance of DNN-signal using data from Tri-Service General Hospital.

Performance of the DNN models in predicting mortality
Age- and sex-weighted Kaplan-Meier curves for mortality of patients with DNN signal-predicted LVSD and echo-derived LVSD are shown in Figure 3. A total of 8,216 LVSD patients were identified using echocardiographic data, and 33,535 LVSD patients were identified using DNN-signal. DNN signal-predicted LVSD was associated with age- and sex-adjusted HRs (95% CI) of 2.57 (2.53-2.62) for all-cause mortality and 6.09 (5.83-6.37) for cardiovascular mortality at a median follow-up of 3.9 years. Echo-derived LVSD was associated with age- and sex-adjusted HRs (95% CI) of 2.68 (2.60-2.76) for all-cause mortality and 7.79 (7.39-8.22) for cardiovascular mortality. The DNN-image performed similarly to DNN-signal, with age- and sex-adjusted HRs (95% CI) of 2.70 (2.66-2.75) for all-cause mortality and 6.47 (6.19-6.77) for cardiovascular mortality (Supplementary Figure 4).
[Figure caption] Deep neural network sensitivity, specificity, and odds ratio for detecting LVSD across different subgroups. The neural network's sensitivity and specificity for detecting LVSD are tabulated across subgroups. The odds ratio (OR), which is the ratio of the positive ratio [...]
[...] (6.51-7.14) for cardiovascular mortality. Supplementary Figures 5-12 show Kaplan-Meier curves for incident LVSD, all-cause mortality, and cardiovascular mortality for subsets A1, B, and C.

Discussion
The prevalence of LVSD ranges from 2 to 8% in adults depending on the study population and cut-off value used (8-10). In both symptomatic and asymptomatic cases, LVSD is associated with increased morbidity and mortality. The Framingham cohort study showed that individuals with asymptomatic LVSD (LVEF <40%) have around eight-fold increased risk of developing HF (30). The combination of definite treatment and primary prevention of incident HF can reduce the disease burden. One such strategy is to screen for asymptomatic LVSD; however, the best method for this is unclear (11,31,32). Our study demonstrated the potential of DNNs for screening asymptomatic LVSD. In addition, comprehensive real-world testing demonstrated the robustness of DNN to identify LVSD and patients at risk of future LVSD and mortality. Furthermore, we constructed DNN models based on both raw ECG signals and transformed images. In clinical settings in which raw ECG signals are not available, this method can digest ECG image tracing and provide similar performance. Consequently, the applicability of DNN-enabled ECG is broadened.
ECG is a ubiquitous and economical point-of-care diagnostic tool in cardiology. Previous research has demonstrated that LVSD might be characterized by specific ECG changes, such as Q-waves (33, 34), left bundle branch block (35), and wide QRS duration (>120 ms) (36). However, no single feature had high enough predictive value to offer clinical utility. These various features seemed to interact in a non-linear fashion that could not be accounted for by traditional statistical methods or algorithmic approaches. DNNs afford the ability to consider complex datasets in the context of all of the contained data rather than preselected discrete data elements. Identifying these features may offer novel findings that can provide new diagnostic approaches or therapeutic targets. Finding ways to understand what drives the network's interpretation is also a direction for future efforts. We used DNN algorithms to perform binary classification of LVEF in a hospital-based population, with excellent performance (AUROC, 0.95) superior to known screening tests (e.g., natriuretic peptides) (11). The DNN performed well across all age, sex, and comorbidity groups (Figure 2). In addition, the model performance was validated externally using data from the Philips system, suggesting its robustness across different machine types. The diagnostic performance was characterized by a high NPV, which helps exclude LVSD with high confidence. The 'false positive' rates were high. However, we further demonstrated that 'false positive' DNN predictions were associated with an eight-fold increased risk of incident LVSD (confirmed by TTE), a two-fold increased risk of all-cause mortality, and a five-fold increased risk of cardiovascular mortality compared to 'true negative' DNN predictions. This suggests that the DNN may detect early, subclinical electrical or structural abnormalities shown on the ECG.
These abnormalities may include cardiac arrhythmias, left ventricular deformation, valvular heart disease, or metabolic derangements and thus increase the risk of LVSD incidence and death. In this case, DNN-enabled ECG is an effective screening tool to identify patients at risk.
Several studies have demonstrated the potential of AI in turning ECGs into functional screening and diagnostic tools for various heart disorders. For instance, Mayo Clinic researchers have applied AI to automatically detect LVSD and even to identify atrial fibrillation from ECGs recorded in sinus rhythm. Compared with prior studies (21, 37), we not only verified the diagnostic effectiveness of AI-assisted ECG reading for LVSD screening, but also explored the use of ECGs as an outcome prediction tool with the assistance of AI. A positive DNN prediction was associated with a two-fold increased risk of all-cause mortality and a six-fold increased risk of cardiovascular mortality at a median follow-up of 3.9 years. This finding suggests that some subtle electrical abnormalities due to metabolic or myocardial disturbances may precede LVSD. We speculate that some of these disturbances might be irreversible or progressive, eventually causing long-term adverse effects.
While this study reveals that DNN-enabled ECG interpretation is a reliable method of detecting LVSD, the selection of target populations for screening remains to be addressed. Galasko et al. evaluated a variety of LVSD screening strategies and demonstrated that LVSD screening is more cost-effective in high-risk subjects than in the general population (38). High-risk subjects were defined as those with hypertension, diabetes, atherosclerotic cardiovascular disease, and heavy alcohol consumption (39). Our research included individuals who visited the hospital for various reasons, not just for known heart disease. This hospital-based population did have a high prevalence of diabetes mellitus (28.2%), hypertension (53.6%), and coronary heart disease (7.6%), which fits the definition of a high-risk group.
Based on this study, we propose a prototype approach for in-hospital LVSD screening.
Step one involves ECG screening using the DNN-enabled classification of individuals who will undergo high-risk invasive treatment or those with pre-existing cardiovascular risk.
Step two involves TTE evaluation of individuals identified as abnormal by DNN models. This DNN-enabled screening strategy offers an advantage, as ECG machines and internet services are widely available in modern hospitals, and the strategy is also financially sustainable. This DNN model also provides a potential complementary care approach to plasma natriuretic peptide measurement for primary LVSD screening. Further studies are needed to assess the impacts of the proposed DNN-enabled screening strategy on the incidence and prognosis of in-hospital HF-associated adverse events. Furthermore, a comprehensive analysis may be conducted to examine the cost-effectiveness of the proposed strategy.
In summary, DNN-enabled ECG is a valuable tool to screen for LVSD and predict outcomes. Given the low cost of DNN-enabled ECG, serial screening is possible, which also helps optimize screening strategy for LVSD without using invasive laboratory testing, particularly in settings with limited medical resources.

Limitations of the study
There are several limitations to this study. First, some of the LVEF data used for analysis were measured using the M-mode method. The major limitation of M-mode is its one-dimensional nature and lack of direct spatial information. When regional LV deformation exists, the M-mode-derived LVEF is not reliable. Although most operators choose the 2D or 3D methods when performing LVEF measurements in patients with structural heart disease, we cannot completely rule out this potential bias. Second, echocardiographic parameters other than LVEF, such as left ventricular diameter, left ventricular diastolic function, right ventricular function, or valvular heart disease, also affect mortality risk. However, the present study did not include these parameters or evaluate their impact on prognosis. Further research should be conducted to assess the differences between clinical characteristics of patients with DNN-predicted LVSD compared to those without DNN-predicted LVSD. Third, the study was conducted in an academic medical center in patients with more complex diseases. The primary analysis consisted of patients with a higher prevalence of HF and other cardiovascular comorbidities, whom clinicians identified as needing a TTE evaluation. Considering these cohort characteristics, the findings may not be generalizable to relatively healthy and truly asymptomatic populations. To verify the generalizability of our DNN models, we conducted multiple additional analyses in more than 1 million patients with different clinical characteristics. In addition, the stratified analysis of patients without known comorbidities showed a similar performance of the models. Finally, although the sensitivity and specificity were both satisfactory in our study, we observed a relatively low PPV. The PPV depends strongly on the proportion of positive subjects in the testing group. The low prevalence of LVSD (4.3%) in the testing dataset resulted in a low PPV.
Despite this, adequate sensitivity is more critical when applying ECG as an LVSD screening tool. The purpose of this screening tool is to detect all potential subjects who are at risk of developing LVSD so that they can be referred for subsequent echocardiographic examination.
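The dependence of PPV on prevalence follows from Bayes' rule: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). The sketch below plugs in the sensitivity (0.91) and specificity (0.86) reported for DNN-signal together with the 4.3% prevalence of the testing dataset; the two higher prevalences are hypothetical, and the resulting PPVs are illustrative calculations rather than values reported by the study.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# 0.043 is the LVSD prevalence in the testing dataset; 0.10 and 0.30 are
# hypothetical higher-risk settings added for comparison.
for prevalence in (0.043, 0.10, 0.30):
    ppv = ppv_from_prevalence(0.91, 0.86, prevalence)
    print(f"prevalence {prevalence:.3f} -> PPV {ppv:.3f}")
```

With sensitivity and specificity held fixed, the PPV is roughly 0.23 at 4.3% prevalence and rises steeply as prevalence increases, which is why the same model would yield fewer false alarms per true case in a higher-risk population.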

Conclusion
The established DNN algorithms in this study enable rapid LVSD detection and represent an essential step in transforming the ECG into an effective, real-time screening tool. Their ability to predict LVSD incidence and long-term mortality may help stratify patient risk and initiate relevant interventions. With good accuracy and accessibility, DNN-enabled ECG has the potential to optimize the screening process for LVSD among at-risk populations and to advance HF care significantly. Frontiers in Cardiovascular Medicine 10 frontiersin.org

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by the Chang Gung Medical Foundation-Institutional Review Board. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.
[...] Chang Gung Memorial Hospital (grant numbers CLRPG3H0013, CORPG3L0161, and CORPG3L0461). We also received methodological assistance from the University of Nottingham.