Automated identification of atrial fibrillation from single-lead ECGs using multi-branching ResNet

Introduction: Atrial fibrillation (AF) is the most common cardiac arrhythmia, clinically characterized by an irregular and rapid heart rhythm. AF puts a patient at risk of forming blood clots, which can eventually lead to heart failure, stroke, or even sudden death. Electrocardiography (ECG), which involves acquiring bioelectrical signals from the body surface to reflect heart activity, is a standard procedure for detecting AF. However, the occurrence of AF is often intermittent, costing a significant amount of time and effort from medical doctors to identify AF episodes. Moreover, human error is inevitable, as even experienced medical professionals can overlook or misinterpret subtle signs of AF. As such, it is of critical importance to develop an advanced analytical model that can automatically interpret ECG signals and provide decision support for AF diagnostics. Methods: In this paper, we propose an innovative deep-learning method for automated AF identification using single-lead ECGs. We first extract time-frequency features from ECG signals using the continuous wavelet transform (CWT). Second, convolutional neural networks enhanced with residual learning (ResNet) are employed as the functional approximator to interpret the time-frequency features extracted by CWT. Third, we propose to incorporate a multi-branching structure into the ResNet to address the issue of class imbalance, where normal ECGs significantly outnumber instances of AF in ECG datasets. Results and Discussion: We evaluate the proposed multi-branching ResNet with CWT (CWT-MB-ResNet) on two ECG datasets, i.e., PhysioNet/CinC challenge 2017 and ECGs obtained from the University of Oklahoma Health Sciences Center (OUHSC). The proposed CWT-MB-ResNet demonstrates robust prediction performance, achieving an F1 score of 0.8865 on the PhysioNet dataset and 0.7369 on the OUHSC dataset.
The experimental results signify the model’s superior capability in balancing precision and recall, which is a desired attribute for ensuring reliable medical diagnoses.


Introduction
Cardiovascular diseases have been the leading cause of mortality globally. The World Health Organization (WHO) states that about 17.9 million people perish due to cardiovascular disease each year, contributing 32% to the worldwide death toll. Atrial fibrillation (AF) is the most common cardiac arrhythmia, caused by uncoordinated electrical activities in the atria. Although AF itself does not lead to a lethal condition, it substantially increases the risk of catastrophic diseases. The performance of the proposed framework is evaluated on two real-world datasets: PhysioNet/CinC challenge 2017 [8,9] and ECG data obtained from the University of Oklahoma Health Sciences Center (OUHSC). Experimental results show that our CWT-MB-ResNet significantly outperforms existing methods commonly used in current practice.
The rest of this paper is organized as follows: Section 2 presents the literature review of existing data-driven methods for AF detection. Section 3 introduces the data processing details and the proposed prediction method. Section 4 shows the experimental results in AF identification. Section 5 concludes the present investigation.

Research Background
Traditional machine learning approaches focus on the extraction of ECG morphological features [10] and heart rate variability information [11] to identify AF conditions. Those methods mostly rely on two aspects of AF-altered ECG characteristics: (1) the absence of the P wave, or fibrillatory P waves presented as low-amplitude oscillations around the baseline [12]; (2) irregular R-R intervals [13]. Multiple feature-based automation techniques have been proposed to classify AF-altered ECGs, such as linear discriminant analysis [10], support vector machines [14,15], and independent component analysis [16]. In the presence of a high level of noise or faulty beat detection, the performance of AF identification methods that solely study the P wave deteriorates significantly due to the chaotic signal baseline introduced by the noise [17]. Most R-R interval-based methods [18,19] usually require long ECG segments to detect AF episodes, and become ineffective for short ECG signals (less than 60 s) or in the presence of significant sinus arrhythmia or frequent premature atrial contractions [20]. Moreover, traditional methods require a separate feature extraction process before feeding the data into the classifier, as well as manually established detection rules and thresholds. This can be computationally expensive and may not generalize well when applied to a larger population.
In the past few decades, deep learning, or deep neural networks (DNNs), has emerged as a powerful tool for pattern recognition that can learn abstracted features from complex data and yield state-of-the-art predictions [21,22,23,24,25]. As opposed to traditional machine learning, deep learning presents strong robustness and fault tolerance to uncertain factors, which makes it suitable for beat and rhythm classification from ECG data [26]. Moreover, existing research has indicated that deep learning methods demonstrate more efficient and more potent predictive power than classical machine learning methods for AF identification [27,28]. Various neural network structures have been proposed to address heart disease detection problems, such as recurrent neural networks (RNNs) [29], long short-term memory networks (LSTMs) [30], and autoencoders (AEs) [31].
Among them, CNN models have wide applications in AF detection. CNNs commonly excel at 2D data processing such as image-based classification. Recent literature has also shown that 1D-CNNs generate superior predictive results compared to traditional DNNs and RNNs when processing 1D ECG signals [32,26,33]. The 1D-CNN stands as an effective tool to extract morphological features and learn subtle temporal variations in time series data [26,34]. However, existing literature [35,36] has demonstrated that the predictive accuracies produced by 1D CNNs are lower than those of their structure-alike 2D counterparts in ECG classification, due to the more comprehensive information in the 2D input data and the superior capability of 2D CNNs in feature extraction and interpretation.
Owing to its outstanding performance and strong ability in pattern recognition, the 2D CNN has been explored for ECG classification by virtue of its capacity to suppress measurement noise and extract pertinent feature maps using convolutional and pooling layers [37]. For example, Izci et al. [38] engaged a 2D CNN model to investigate ECG signals for arrhythmia identification. They segmented the ECG signals into heartbeats and directly converted each heartbeat into grayscale images, which served as the input of the 2D CNN model. Similarly, Jun et al. [39] proposed to combine a 2D CNN and data augmentation with different image cropping techniques to classify 2D grayscale images of ECG beats. However, these end-to-end 2D CNNs are directly fed with original ECG beat segments without considering possible noise contamination. Moreover, the 2D input data were created by directly plotting each ECG beat as a grayscale image, with unavoidable redundant information residing in the image background. This procedure requires extra storage space for training data and increases the computational burden without extracting critical features inherent in the ECG beats.
ECG signals generally consist of various frequency components, which can be used to identify disease-altered cardiac conditions. The wavelet transform (WT) [40,41,42] has been proven to be a useful technique for extracting critical time-frequency information pertinent to disease-altered ECG patterns [43,44]. As such, WT is favored as a feature-preprocessing procedure that converts 1D ECG signals into 2D images containing time-frequency features. The resulting 2D feature images then serve as the input of CNNs for ECG classification instead of the original 2D ECG plots. For instance, Xia et al. [20] engaged the short-time Fourier transform (STFT) and the stationary wavelet transform to convert ECG segments into 2D matrices, which were then fed into a three-layer CNN for AF detection. Wang et al. [45] combined the time-frequency features extracted by the continuous wavelet transform (CWT) and R-interval features to train a 2D CNN model for ECG signal classification. Wu et al. [46] built a 2D CNN based on the time-frequency features of short-time single-lead ECGs extracted with three methods, i.e., STFT, CWT, and the pseudo Wigner-Ville distribution, to detect arrhythmias. Huang et al. [37] developed an ECG classification model by transforming ECG signals into time-frequency spectrograms using STFT and feeding them into a three-layer 2D CNN. Li et al. [47] included three different types of wavelets (i.e., the Morlet wavelet, the Paul wavelet, and the Gaussian derivative) to create 2D time-frequency images as the input data to the 2D CNN-based ECG classifier.
In addition to effective information extraction from ECG time series, the realization of the full data potential is heavily reliant on advanced analytical models. Although the abovementioned works have validated the superiority of 2D CNN-based approaches, the shallow network structures with a limited number of layers can potentially hinder the extraction of deeper features. Naturally, the learning capacity of a neural network is enhanced by increasing the number of layers.
However, a deeper network structure can result in the gradient dissipation problem, which impedes convergence during network training, leading to suboptimal prediction performance. To cope with this issue, the residual neural network (ResNet) has been developed with an important modification, i.e., identity mapping induced by the skip connection technique [48], which has wide applications in classifying ECG signals. For example, Jing et al. [49] developed an improved ResNet with 18 layers for single heartbeat classification. Park et al. [50] used a squeeze-and-excitation ResNet with 152 layers and compared the model performance trained by ECGs from a 12-lead ECG system and single-lead ECG data. Guan et al. [51] proposed a hidden attention ResNet to capture deep spatiotemporal features using 2D images converted from ECG signals.
Automated ECG classification also suffers from the long-standing issue of imbalanced data in machine learning. Diverse sampling and synthetic strategies have been proposed to address the imbalanced data issue, which focus on creating a balanced training dataset from the original imbalanced data to mitigate the potential bias introduced by the imbalanced data distribution during model training [52]. Frequently employed techniques consist of random over-sampling and under-sampling, informed adaptive under-sampling, and the synthetic minority over-sampling technique (SMOTE) [53,7,54]. For example, Luo et al. [55] engaged SMOTE to synthesize minority samples and create a balanced training dataset for automated arrhythmia classification. Ramaraj et al. [56] incorporated an adaptive synthetic sampling process into the training of deep learning models built with gated recurrent units to address the class imbalance problem for ECG pattern recognition.
Nurmaini et al. [34] compared the sampling schemes of SMOTE and random over-sampling with an RNN and concluded that the balanced dataset created by SMOTE significantly improved the classification performance. In addition to fabricating balanced ECG datasets, Gao et al. [53] and Petmezas et al. [57] proposed to engage a dynamically-scaled focal loss function to suppress the weight of the loss corresponding to the majority class, so that its contribution to the total loss is reduced to alleviate the class imbalance problem. However, this method requires the preassumption of a focusing parameter to modulate the effect of the majority class on the total loss. Existing methods mainly focus on using sampling and synthetic strategies or modifying the loss function; little has been done to create new network structures that cope with the imbalanced data issue in AF identification from ECG signals without extra assumptions or feature engineering.

Dataset
In this study, two AF databases from different sources, i.e., ECG recordings from the PhysioNet/CinC challenge 2017 and ECG PDFs from OUHSC, are used to evaluate the performance of data-driven detection methods. Both databases consist of short single-lead ECG recordings from AF and non-AF patients. PhysioNet/CinC Challenge 2017 is an open database including 8528 single-lead ECG signals and their annotations. Among them, 5050 ECG recordings are labeled as normal sinus rhythm while 738 signals are annotated as AF. The sampling frequency of the recordings is 300 Hz and the duration of the ECG signals varies from 9 s to 30 s. The OUHSC database contains ECG signals in PDF format with 33 recordings from AF subjects and 227 normal samples, which are annotated by cardiologists from OUHSC. Each recording has a duration of about 30 s with a sampling frequency of 60 Hz. We use 80% of the total data for training and the remaining 20% for testing for both databases.

ECG Signal Preprocessing
Note that the original ECG recordings from OUHSC are in PDF format, as shown in Fig. 1(a).
It is necessary to accurately extract the numerical ECG readings from the PDF files for further data preprocessing and analysis, which is achieved by the following procedure:
• Transforming PDF files into gray-scale images represented by 2D pixel matrices: We discretize the 2D image into a pixel matrix. Then, each pixel is converted to a fixed number of bits to represent the gray-scale intensity of the corresponding point in the image. As shown in Fig. 1(a), the ECG signals are displayed in the darkest color on the plot with a color intensity of 1, i.e., h(m, n) = 1, while the grid lines appear in a lighter color, i.e., 0 < h(m, n) < 1, where h(m, n) denotes the color intensity of the pixel at column m and row n. Note that the background color intensity is 0.
• Removing grid lines from the ECG plot: We replace the pixel shade values of the grid lines with the background color value, i.e., h(m, n) ← 0 for all (m, n) with h(m, n) < 1. This allows the ECG signals to distinguishably stand out, as illustrated in Fig. 1(b). The quantized image is thus encoded into a binary digital format, i.e., black as "1" and white as "0". As such, the entire ECG image is transformed into a binary digital matrix without the grid lines.
• Extracting the digital ECG time series: The positions of the black pixels (i.e., the ECG signal) in the binary matrix are further extracted, which are represented mathematically as a set of (m, n) pairs: S = {(m, n) | h(m, n) = 1}. The resulting set S is then used to reconstruct the digital ECG time series, where m stands for the time course and n corresponds to the magnitude of the ECG signal. As such, we are able to convert the ECG recordings from PDFs into digitalized ECG time series signals (Fig. 1(c)), which will be used for further processing and model training.
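The procedure above can be sketched as follows. This is a minimal illustration on a synthetic pixel matrix; the exact intensity threshold and the column-wise averaging rule used to collapse multiple black pixels per column are simplifying assumptions, not the exact OUHSC pipeline.

```python
import numpy as np

def extract_ecg_trace(gray):
    """Recover a 1D ECG trace from a gray-scale pixel matrix.

    gray[n, m] holds the color intensity at row n, column m:
    1 = signal (darkest), 0 < value < 1 = grid lines, 0 = background.
    """
    # Remove grid lines: set every pixel below full intensity to background.
    binary = np.where(gray < 1.0, 0, 1)

    # Collect black-pixel positions S = {(m, n) | h(m, n) = 1}.
    rows, cols = np.nonzero(binary)

    # For each time step m (column), average the row indices of the black
    # pixels to obtain a single amplitude value (simplifying assumption).
    n_cols = gray.shape[1]
    trace = np.full(n_cols, np.nan)
    for m in range(n_cols):
        hits = rows[cols == m]
        if hits.size:
            # Flip the row axis so larger amplitudes point upward.
            trace[m] = gray.shape[0] - 1 - hits.mean()
    return trace

# Tiny synthetic example: a 4x3 image with one signal pixel per column
# and one grid-line pixel (intensity 0.5) that must be removed.
img = np.array([[0.0, 0.0, 1.0],
                [0.0, 1.0, 0.5],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
print(extract_ecg_trace(img))  # -> [1. 2. 3.]
```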
Raw ECG recordings are often contaminated by noise, such as baseline wandering, electromyography disturbance, and power-line interference [58], which negatively impacts information extraction and model performance. In this work, we engage BioSPPy, a toolbox for biosignal processing written in Python, for ECG signal denoising. The BioSPPy library provides comprehensive functions for processing ECG signals, including functions for importing ECGs, filtering out interfering components, and correcting baseline wandering [59]. Specifically, after loading the ECG data, a high-pass filter is applied to remove the low-frequency noise (e.g., baseline wandering), a notch filter to remove the power-line interference, and a low-pass filter to filter out the high-frequency noise.
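The baseline-wander removal step can be illustrated with a minimal numpy stand-in (not the BioSPPy implementation): subtracting a moving-average estimate of the baseline acts as a crude high-pass filter. The 0.6 s window length is an assumption chosen to be longer than one heartbeat, so the average tracks slow drift rather than the QRS complexes.

```python
import numpy as np

def remove_baseline_wander(signal, fs, window_s=0.6):
    """Subtract a moving-average baseline estimate (high-pass effect).

    A minimal sketch of baseline-wander correction; BioSPPy uses proper
    digital filters instead of this moving average.
    """
    win = max(1, int(window_s * fs))
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode="same")
    return signal - baseline

fs = 300  # Hz, the PhysioNet sampling frequency
t = np.arange(0, 3, 1 / fs)
drift = 0.5 * np.sin(2 * np.pi * 0.3 * t)   # slow baseline wander
ecg_like = np.sin(2 * np.pi * 8 * t)        # fast, ECG-scale component
cleaned = remove_baseline_wander(ecg_like + drift, fs)
# The slow drift is strongly attenuated while the fast component survives.
```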

Continuous Wavelet Transform
ECG signals encompass multiple feature components in both the time and frequency domains.
In this study, we engage the continuous wavelet transform (CWT) to extract time-frequency features from ECGs due to its excellent performance in the analysis of transient and non-stationary time series signals [60]. CWT is the most popular tool for time-frequency analysis, reflecting how the frequency components of data change with time. CWT has been verified to outperform the traditional STFT due to its ability to provide multi-resolution decompositions of the signal, which allows for a trade-off between time and frequency resolution, i.e., higher time resolution for signals with sharp transients and higher frequency resolution for signals with slowly-varying frequency content [61].
Additionally, compared to discrete wavelet transform (DWT), CWT remedies non-stationarity and coarse time-frequency resolution defects and supports the extraction of arbitrarily high-resolution features in the time-frequency domain [62].
The CWT of the ECG time-series signal x(t) is computed as

T(a, b) = (1/√a) ∫ x(t) ψ*((t − b)/a) dt,

where T(a, b) stands for the intensity of the transformed signal, ψ(·) is the wavelet basis (also known as the mother wavelet), the superscript * denotes complex conjugation, a is the scale factor quantifying the compressed or stretched degree of the wavelet, and b is the time shift parameter defining the location of the wavelet. The scale can be used to derive the characteristic frequency of the wavelet as [46]:

F = F_c · f_s / a,

where F_c is the center frequency of the mother wavelet and f_s is the sampling frequency of the signal. This relationship shows that smaller (larger) values of a correspond to higher (lower) frequency components. In CWT, the mother wavelet plays a critical role in time-frequency analysis, the choice of which depends on its similarity with the original signal [63]. Here, the Mexican hat wavelet (mexh) is chosen to serve as the mother wavelet because its shape is similar to the QRS waves and it is commonly used in ECG signal analysis [45]. Specifically, the mexh is the second derivative of a Gaussian function [62], which is defined as

ψ(t) = (2 / (√3 · π^(1/4))) (1 − t²) e^(−t²/2).
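A minimal numpy sketch of these formulas follows. The scale grid, the wavelet window width, and the mexh center frequency F_c ≈ 0.25 are illustrative assumptions; libraries such as PyWavelets provide equivalent, optimized routines.

```python
import numpy as np

def mexh(t):
    """Mexican hat wavelet: second derivative of a Gaussian."""
    return (2 / (np.sqrt(3) * np.pi ** 0.25)) * (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt_mexh(x, scales):
    """Continuous wavelet transform with the mexh mother wavelet.

    Implements T(a, b) = (1/sqrt(a)) * sum_t x[t] * psi((t - b)/a),
    with the scale a expressed in samples; mexh is symmetric, so
    convolution coincides with correlation.
    """
    coeffs = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        half = int(np.ceil(5 * a))          # mexh decays fast beyond |t| > 5
        tau = np.arange(-half, half + 1)
        kernel = mexh(tau / a) / np.sqrt(a)
        coeffs[i] = np.convolve(x, kernel, mode="same")
    return coeffs

fs = 300                                    # Hz
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)              # 10 Hz test tone
scales = np.arange(2, 40)
scalogram = np.abs(cwt_mexh(x, scales))

# Scale-to-frequency relation F = Fc * fs / a, with Fc ~ 0.25 for mexh:
# the scale with the strongest response should map to roughly 10 Hz.
freqs = 0.25 * fs / scales
peak_freq = freqs[np.argmax(scalogram.mean(axis=1))]
```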

Convolutional Neural Network
We engage a CNN to build a data-driven classifier for differentiating AF samples from normal ECG samples. The CNN is a type of neural network architecture specifically designed to process data that has a grid-like structure, such as images [64]. As opposed to traditional multilayer perceptron networks (MLPs), where the input of each neuron consists of the outputs of all the neurons from the previous layer, a neuron in a CNN only receives its input from a localized region of the previous layer, known as its receptive field. The main building blocks of a CNN are convolutional layers, pooling layers, and fully connected layers. Convolutional layers are responsible for performing a convolution operation on the input data, using a set of filters to extract local features in the data and producing a feature map that summarizes such local information. Let θ and X denote the filter (also known as the kernel) and the input. The convolution operation works as follows:

(X ∗ θ)(i, j) = Σ_{u=1}^{s₁} Σ_{v=1}^{s₂} X(i + u, j + v) θ(u, v),

where s₁ and s₂ denote the size of the 2D kernel, and (i, j) denotes the location on the 2D input (e.g., image). After applying the activation function, the feature map of the input is obtained as [65,49]:

X_q^l = σ( Σ_p X_p^{l−1} ∗ θ_{p,q}^l + b_q ),

where X_q^l is the q-th feature map at layer l, X_p^{l−1} is the p-th input feature map of the previous (l − 1)-th layer, θ_{p,q}^l is the corresponding kernel, σ denotes the activation function to induce the non-linearity in the functional mapping, and b_q represents the bias. This procedure is repeated by applying multiple filters to generate an arbitrary number of feature maps that capture different characteristics of the input. Note that kernels are shared across all input positions, which is also called weight sharing, the key feature of CNNs.
The weight-sharing technique guarantees the extracted local patterns are translation invariant and increases computational efficiency by reducing the model parameters to learn compared with fully connected neural networks.
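The convolution and activation steps above can be sketched directly in numpy (a toy valid-mode example; the kernel values and zero bias are arbitrary assumptions for illustration):

```python
import numpy as np

def conv2d(x, theta):
    """Valid 2D convolution as written in the text:
    out[i, j] = sum_{u,v} x[i + u, j + v] * theta[u, v]
    (i.e., cross-correlation, the convention used by deep-learning frameworks).
    """
    s1, s2 = theta.shape
    h = x.shape[0] - s1 + 1
    w = x.shape[1] - s2 + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + s1, j:j + s2] * theta)
    return out

def relu(z):
    return np.maximum(z, 0)

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input
theta = np.array([[1.0, 0.0], [0.0, -1.0]])    # 2x2 difference kernel
feature_map = relu(conv2d(x, theta) + 0.0)     # bias b_q = 0
# On this monotone ramp every raw response is -5, so ReLU zeroes the map.
print(feature_map.shape)  # -> (3, 3)
```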
The pooling layer mimics the human visual system by combining the outputs of multiple neurons (i.e., clusters) into a single neuron in the next layer, effectively creating a condensed representation of the input. Pooling significantly reduces the spatial resolution and only retains the prominent patterns of the feature maps, making the network more robust to small translations and distortions in the input data [20]. Popular pooling techniques include maximum pooling, average pooling, stochastic pooling, and adaptive pooling, which are typically performed on the values in a sub-region of the feature map [66].
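Maximum pooling over non-overlapping sub-regions, the most common variant, can be sketched in a few lines (the 2×2 window and the divisibility assumption are simplifications):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k maximum pooling (assumes dims divisible by k)."""
    h, w = x.shape
    # Reshape each k x k sub-region onto its own axes, then reduce over them.
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 6.],
                 [2., 2., 7., 8.]])
pooled = max_pool2d(fmap)   # keeps the maximum of each 2x2 block
print(pooled)               # -> [[4. 2.]
                            #     [2. 8.]]
```

Note how a small translation of a prominent value within its 2×2 block leaves the pooled output unchanged, which is the robustness property described above.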
The fully connected layers form a dense network that can learn complex non-linear relationships between the inputs and outputs. They take the output of the previous layer, which is typically a high-dimensional tensor containing discriminant features extracted by the convolutional and pooling layers, and flatten it into a one-dimensional vector. This vector then serves as the input to the fully connected layer. The fully connected layer is similar to an MLP in that every neuron in one layer is connected to every neuron in the next layer. By using a proper activation function, the neural network is able to produce classification decisions [34]. By stacking these building blocks (convolutional layers, pooling layers, and fully connected layers) in various combinations, CNNs are able to learn complex features in the input data, allowing them to effectively solve a wide range of image and signal processing tasks [67].

2D CNN with ResNet
We propose to engage a 2D CNN to investigate the 2D time-frequency scalograms converted from denoised ECG signals by CWT for AF identification. It has been demonstrated that substantial depth of the convolutional network is beneficial to network performance [68]. However, as the number of convolutional layers increases, the training loss stops decreasing and becomes saturated because of the gradient dissipation issue. As such, a CNN with a deeper architecture, counterintuitively, sometimes incurs a larger training error compared to its shallower counterpart upon convergence [48]. In order to solve such network degradation and gradient vanishing problems, the residual network (ResNet) has been developed to improve the accuracy of CNNs with considerably increased depth.
The core of ResNet is the residual learning technique [48]. Specifically, instead of using the stacked convolutional layers to directly fit the underlying mapping from the input to the output, ResNet focuses on fitting a residual mapping. Fig. 3 shows a ResNet building block with input X and its corresponding output mapping Y. The residual block engages a shortcut connection that bypasses one or more convolutional layers and allows the information to flow directly from the input to the output. As such, the input X is added to the output of the block F(X) (enclosed by the dashed circle in Fig. 3), allowing the network to learn the residual mapping Y = F(X) + X instead of the direct mapping Y = F(X). This design mitigates the gradient vanishing problem and allows deeper networks to be trained effectively.
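The key property of Y = F(X) + X can be demonstrated with a toy numpy residual block, where two linear maps with a ReLU stand in for the convolutional layers of Fig. 3 (the weight shapes and magnitudes are illustrative assumptions). When the residual weights are zero, the block reduces exactly to the identity mapping, which is why adding residual blocks can never make it harder for the network to at least copy its input.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, w1, w2):
    """Y = F(X) + X with F(X) = W2 @ relu(W1 @ X): a toy residual block
    (linear maps stand in for the convolutional layers)."""
    f = w2 @ np.maximum(w1 @ x, 0)
    return f + x                  # shortcut connection adds the input back

x = rng.normal(size=4)
w1 = rng.normal(size=(4, 4)) * 0.1
w2 = rng.normal(size=(4, 4)) * 0.1
y = residual_block(x, w1, w2)     # a small perturbation of x

# With zero residual weights, the block is exactly the identity mapping.
y_id = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))
assert np.allclose(y_id, x)
```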
In our study, we engage the ResNet with 18 layers (ResNet18) to build the AF classifier, because ResNet18 has been proven to generate comparable results with faster convergence compared to its deeper counterparts [48]. Fig. 4 shows the detailed structure of ResNet18. Note that the notation 2DConv(n_input, n_output, n_f_dim1 × n_f_dim2) denotes that the current 2D convolutional layer has n_input input channels and n_output output channels (i.e., number of filters), with a 2D filter size of n_f_dim1 × n_f_dim2. For example, (64, 128, 3 × 3) indicates that this convolutional layer is composed of 128 filters with a filter size of 3 × 3 applied to input data with 64 channels.

Multi-Branching Convolutional Network
Data-driven identification of AF from ECG recordings generally suffers from imbalanced data issues. The obtained ECG signals contain far more normal samples than AF samples in both the PhysioNet and OUHSC datasets, with imbalanced ratios of approximately 7:1. Here, we propose to incorporate a multi-branching (MB) technique into ResNet18 (MB-ResNet) [7] to relieve the imbalanced data issue. In the current investigation, we aim to identify AF samples from normal ECG samples. The neural network is expected to produce high probabilities (close to 1) for AF samples and low probabilities (close to 0) for normal ECG samples. We choose the binary cross-entropy as the loss function for MB-ResNet, which is defined as:

L(ω) = −(1/N_d) Σ_{j=1}^{N_d} Σ_{i=1}^{N_b} I(X_j ∈ D_i) [ y_j log P_i(ω; X_j) + (1 − y_j) log(1 − P_i(ω; X_j)) ],

where ω denotes the neural network parameters, X_j and y_j stand for one input sample and its corresponding true label respectively, I(·) denotes the indicator function selecting the balanced sub-dataset D_i assigned to the i-th branch, N_d is the total number of training samples, N_b is the number of branches, and P_i(ω; X_j) represents the predicted probability of AF at the i-th branching output given the input signal X_j.
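A numpy sketch of this loss follows, under the assumption (mirroring the multi-branching design) that each branch only contributes loss terms for its own balanced sub-dataset, encoded as a 0/1 mask; the toy probabilities and mask assignment are illustrative.

```python
import numpy as np

def mb_bce_loss(probs, labels, masks):
    """Multi-branching binary cross-entropy.

    probs  : (N_b, N_d) predicted AF probabilities, one row per branch
    labels : (N_d,)     true labels, 1 = AF, 0 = normal
    masks  : (N_b, N_d) indicator I(X_j in D_i): branch i only sees its
                        own balanced sub-dataset
    """
    eps = 1e-12  # numerical guard against log(0)
    ll = labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps)
    return -np.sum(masks * ll) / labels.size

labels = np.array([1, 0, 1, 0])
probs = np.array([[0.9, 0.2, 0.8, 0.1],    # branch 1
                  [0.7, 0.3, 0.6, 0.2]])   # branch 2
masks = np.array([[1, 1, 0, 0],            # branch 1 trains on samples 1-2
                  [0, 0, 1, 1]])           # branch 2 trains on samples 3-4
loss = mb_bce_loss(probs, labels, masks)   # scalar training loss
```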

The adaptive momentum method (Adam) [69] is adopted to minimize the loss function and update the neural network parameters. In the inference stage, the MB network generates N_b predictions for the AF probability, corresponding to the N_b branching outputs. The final predicted probability of AF, P̄, is determined by taking the average of the N_b outputs:

P̄ = (1/N_b) Σ_{i=1}^{N_b} P_i,

where P_i is the predicted probability of the i-th branching output.
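The averaged inference step is straightforward; a small self-contained sketch with made-up branch outputs:

```python
import numpy as np

def mb_predict(branch_probs):
    """Average the N_b branching outputs into the final AF probability."""
    return np.mean(branch_probs, axis=0)

# Three branches, two test samples: each row holds one branch's output.
branch_probs = np.array([[0.90, 0.10],
                         [0.80, 0.20],
                         [0.70, 0.30]])
p_final = mb_predict(branch_probs)
print(p_final)                             # -> [0.8 0.2]
pred_label = (p_final >= 0.5).astype(int)  # 1 = AF, 0 = normal
```

Averaging the branches acts as a simple ensemble over the balanced sub-datasets, which smooths out the bias any single branch picks up from its own subset.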

Experimental Design
We validate and evaluate the performance of the proposed CWT-MB-ResNet framework using both the PhysioNet Challenge and OUHSC datasets. The experimental design is shown in Fig. 6.
We compare the performance of our CWT-MB-ResNet with three benchmark models, i.e., 1D-CNN, 1D-MB-CNN, and CWT-ResNet. The prediction performance is evaluated using the receiver operating characteristic (ROC) curve, the precision-recall curve (PRC), and the F1 score. An effective model pushes the PR curve towards the top-right corner; a higher area under the PRC (AUPRC) value suggests a more effective model. The F1 score quantifies the equilibrium between a model's precision and recall for a binary classifier by computing their harmonic mean, which is defined as

F1 = 2 · Precision · Recall / (Precision + Recall).

Note that the F1 score ranges from 0 to 1, where a score of 1 indicates a perfect balance between precision and recall.
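The harmonic mean penalizes an imbalance between precision and recall, as a small pure-Python computation shows (the confusion-matrix counts are made-up numbers for illustration):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# High precision (0.9) but low recall (0.5): the F1 score (~0.64) falls
# well below the arithmetic mean (0.7), penalizing the imbalance.
print(round(f1_score(tp=45, fp=5, fn=45), 4))  # -> 0.6429
```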

Conclusions
In this paper, we develop a novel framework based on the continuous wavelet transform (CWT) and a multi-branching ResNet for AF identification. We first transform the 1D ECG time series into

Figure 1 :
Figure 1: An example of (a) a raw image recording of an ECG segment in PDF format, (b) the ECG image that filters out the grid background, (c) the digitalized ECG time series signal.
Fig. 2(a-b) and (c-d) show healthy and AF examples of the raw ECG signals obtained from PhysioNet and their 2D time-frequency patterns after the CWT transformation with the mexh wavelet, respectively. The colors in the scalogram indicate the energy density of the signal component at the corresponding frequency and time [62,44]. According to Fig. 2(a) and (c), two general differences can be observed: 1) the AF ECG signal lacks a distinct P wave, while it shows a fast and chaotic F wave due to the atrial fluttering (Fig. 2(c)), in comparison to a normal ECG signal (Fig. 2(a)); 2) irregular RR intervals are observed in the AF ECG (Fig. 2(c)), caused by a non-synchronized ventricular response to the abnormal atrial excitation [44]. The discriminative information in the time domain can also be captured by the CWT scalograms shown in Fig. 2(b) and (d). By using a 2D CNN to analyze the visual representation of the 2D time-frequency scalograms, we can better understand the features that distinguish AF from normal heart rhythms and make more accurate predictions.

Figure 2 :
Figure 2: (a) The raw ECG signal from PhysioNet labeled as normal and (b) its corresponding 2D CWT scalogram. (c) The raw ECG signal from PhysioNet labeled as AF and (d) its corresponding 2D CWT scalogram. Note that the RR intervals differ across beats in the AF sample and irregular F waves (circled) appear in (c).

Figure 3 :
Figure 3: A building block of the ResNet.

Figure 5 :
Figure 5: Illustration of the multi-branching architecture.

Figure 8:
Fig. 8 displays the ROC and PR curves of all four models using the OUHSC dataset. The 2D ResNet models (i.e., CWT-ResNet and CWT-MB-ResNet), which use 2D scalograms transformed from ECG signals as the input, produce a larger area under the curves (both ROC and PRC) compared to their 1D counterparts (i.e., 1D-CNN and 1D-MB-CNN). This demonstrates the efficacy of using the CWT to extract time-frequency features in ECG signal analysis. Additionally, the models with an MB architecture (i.e., 1D-MB-CNN and CWT-MB-ResNet) produce a larger area under both the ROC and PR curves compared to the models without MB outputs (i.e., 1D-CNN and CWT-ResNet).

Fig. 9 further shows the ROC and PRC analysis for the PhysioNet/CinC 2017 challenge dataset. Similar to the results from the OUHSC dataset, the 2D ResNet models (CWT-ResNet and CWT-MB-ResNet), with 2D scalograms of ECG signals as the input, outperform their 1D counterparts (1D-CNN and 1D-MB-CNN) in terms of the area under the ROC and PR curves. Furthermore, the MB-based models (1D-MB-CNN and CWT-MB-ResNet) effectively account for the imbalanced data issues, exhibiting better performance compared to the non-MB-based models (1D-CNN and CWT-ResNet). Table 2 demonstrates the comparison of the AUROC, AUPRC, and F1 scores provided by the four models using the PhysioNet dataset.

Figure 9 :
Figure 9: The comparison of (a) ROC and (b) PRC between different models using data from the PhysioNet/CinC 2017 challenge.
2D time-frequency scalograms to avoid the aliasing of multiple frequency components, which can serve as the input to the 2D CNN-based classifier. Second, we leverage the ResNet architecture to cope with the possible gradient dissipation problems in deep 2D CNNs and increase the effectiveness of network training. Moreover, a multi-branching architecture is incorporated into the ResNet to mitigate the possible prediction bias caused by the imbalanced data issue. Finally, we implement the proposed CWT-MB-ResNet to predict AF using the ECG recordings from PhysioNet/CinC Challenge 2017 and the ECG PDFs from OUHSC. Experimental results show that the proposed CWT-MB-ResNet achieves the best prediction performance for both datasets in AF detection. The CWT-MB-ResNet framework has great potential to be applied in clinical practice to improve the accuracy of ECG-based diagnosis of heart disease.

Table 1 shows the AUROC, AUPRC, and F1 scores generated from the four methods using the OUHSC dataset. Our CWT-MB-ResNet method generates the best AUROC, AUPRC, and F1 scores, with values of 97.15%, 86.73%, and 0.8155, respectively. Note that the MB technique demonstrates its effectiveness on both 1D-CNN and CWT-ResNet, as the AUROC, AUPRC, and F1 scores provided by the MB-based neural network models are higher than those of their non-MB-based counterparts.

Table 1 :
The comparison of AUROC, AUPRC, and F1 scores generated from 1D-CNN, 1D-MB-CNN, CWT-ResNet, and CWT-MB-ResNet using OUHSC data.

For the PhysioNet dataset, our CWT-MB-ResNet again generates the best AUROC, AUPRC, and F1 scores, with values of 97.41%, 93.53%, and 0.8865, respectively. In particular, our CWT-MB-ResNet model improves the F1 score by 46.2% compared to the pure 1D-CNN with no CWT transform or MB structure. We further compare the proposed CWT-MB-ResNet model with existing studies in the literature that also used deep learning models for AF classification on the dataset from the PhysioNet/CinC challenge 2017. Table 3 summarizes the comparison results in terms of the F1 score. Our proposed CWT-MB-ResNet demonstrates the best F1 score compared with the other framework settings in the existing literature.

Table 3 :
The comparison of F1 scores between the proposed CWT-MB-ResNet method and existing literature using data from the PhysioNet/CinC 2017 challenge.

This is due to the fact that our CWT-MB-ResNet not only leverages CWT to capture comprehensive 2D time-frequency features from ECG signals but also incorporates the MB structure to effectively cope with the imbalanced data issues.