ORIGINAL RESEARCH article

Front. Physiol., 08 May 2026

Sec. Computational Physiology and Medicine

Volume 17 - 2026 | https://doi.org/10.3389/fphys.2026.1800941

HPRNet: a hierarchical pyramidal residual network for ECG arrhythmia classification

  • Jiayan Huang 1,2

  • Miaomiao Huang 2

  • Hanling Zheng 2

  • Yongyi Xiao 3

  • Yolanda Bolea 1

  • Antoni Grau 1

  • Shaoye Luo 2*

  • 1. Department of Systems Engineering, Automation and Industrial Informatics, Polytechnic University of Catalonia, Barcelona, Spain

  • 2. College of Computer and Data Science, Putian University, Putian, China

  • 3. College of Artificial Intelligence, Putian University, Putian, China

Abstract

Accurate classification of electrocardiogram (ECG) signals plays a critical role in the automated diagnosis of cardiac arrhythmias. However, ECG recordings are often non-stationary and susceptible to various types of noise, which makes robust feature extraction challenging for many existing deep learning models. To address these challenges, this paper proposes a hierarchical pyramidal residual network (HPRNet) for ECG arrhythmia classification. HPRNet incorporates a hierarchical pyramidal backbone built from residual extraction blocks (REBs), referred to as the REB-based backbone (HRB), to capture multi-scale morphological characteristics of ECG signals. In the HRB, the temporal resolution is progressively reduced while the number of feature channels is gradually increased, allowing the network to learn multi-scale ECG representations. Furthermore, a multi-level pruning optimization (MLPO) strategy, comprising network-level and block-level pruning, is incorporated to reduce redundant parameters and improve computational efficiency while preserving classification capability. Experiments on two public benchmark datasets show that HPRNet outperforms five representative methods on MIT-BIH, reaching an F1-score of 92.05%, and obtains an F1-score of 91.98% on the INCART binary classification task, with an average inference latency of 0.031 s. Moreover, visualization analysis highlights the intrinsic difficulty of distinguishing challenging beat classes, and ablation studies confirm the effectiveness of the proposed HRB and MLPO. These findings support the robustness of HPRNet for automated arrhythmia classification. The source code is publicly available at: https://github.com/jyanhuang/HPRNet-ECG.

1 Introduction

As a widely adopted non-invasive diagnostic modality, the electrocardiogram (ECG) provides rapid, low-cost, and reproducible measurements of cardiac electrical activity Berkaya et al. (2018). Its waveform patterns reflect instantaneous cardiac electrophysiological states, making ECG indispensable for early detection, risk stratification, and longitudinal monitoring of cardiovascular diseases (CVDs), particularly arrhythmias Wang et al. (2022a). However, early interpretation of ECG signals largely relied on manual examination and physician clinical experience, which could introduce subjectivity in waveform assessment and inter-observer variability. Moreover, ECG waveforms associated with different arrhythmia types often exhibit subtle morphological differences and are susceptible to various noise sources Wang et al. (2022b); Li et al. (2025b), posing significant challenges for reliable and accurate arrhythmia classification. Consequently, automated ECG signal classification has attracted extensive attention as a promising approach to reduce subjectivity in waveform interpretation and improve analysis consistency Ansari et al. (2023); Peng et al. (2024).

Existing ECG-based arrhythmia classification methods can be broadly categorized into traditional machine learning (ML)-based and deep learning (DL)-based methods. Representative ML-based methods include Random Forest (RF) Kropf et al. (2017) and Support Vector Machine (SVM) Adilakshmi (2024). These approaches typically rely on manual feature extraction and selection, which require substantial domain expertise to design effective features, thereby limiting their scalability and classification performance Marzog and Abd (2022). In contrast, DL-based methods, including convolutional neural networks (CNNs) Hannun et al. (2019), recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) Li et al. (2025a), and Transformer-based models Zhang et al. (2022), have demonstrated strong capabilities in ECG analysis. For instance, Zhu et al. (2020) proposed a deep learning framework for real-time multi-label diagnosis of heart rhythm and conduction abnormalities, while Jin et al. (2024) developed an interpretable DL-based model for detecting six common arrhythmias. Despite their promising performance, further improvements largely depend on enhancing the quality of deep feature representations rather than simply increasing network depth or model complexity. In this context, residual learning Choi et al. (2024) provides an effective mechanism for stabilizing deep network training by facilitating gradient propagation and alleviating the degradation problem, thereby improving feature extraction capability when the network architecture is properly designed. However, conventional residual architectures typically rely on straightforward stacking of residual blocks, which may result in excessive parameter growth and redundant computational overhead when applied to ECG signal analysis.

Therefore, to achieve stable training and effective feature extraction under complex ECG conditions, we propose a hierarchical pyramidal residual network (HPRNet) for ECG arrhythmia classification. Unlike conventional residual architectures, HPRNet adopts a modular and hierarchical residual design tailored to ECG signals, enabling the progressive extraction of discriminative temporal and morphological features directly from raw ECG data without requiring explicit denoising preprocessing. Furthermore, to alleviate the computational burden typically associated with hierarchical residual models, a Multi-Level Pruning Optimization (MLPO) strategy is introduced to control model complexity while maintaining classification capability, thereby enabling efficient and practical ECG-based arrhythmia analysis.

The main contributions of this paper are summarized as follows:

  • 1. We propose a hierarchical pyramidal residual network (termed HPRNet) for multi-class ECG arrhythmia classification. HPRNet integrates convolutional feature extraction with a modular hierarchical residual architecture, enabling end-to-end representation learning directly from raw ECG signals without requiring explicit denoising preprocessing.

  • 2. A hierarchical REB-based backbone (HRB) is designed using residual extraction blocks (REBs). Through channel expansion and same-channel downsampling transitions, the HRB progressively enlarges the temporal receptive field while maintaining stable feature propagation, enabling effective modeling of ECG waveform morphology and long-range rhythm dependencies.

  • 3. To improve computational efficiency, a multi-level pruning optimization (MLPO) strategy is introduced. This strategy jointly performs network-level pruning to remove redundant parameters and block-level pruning to control the complexity of REBs, thereby reducing model size and computational overhead.

  • 4. Extensive experiments on two public ECG benchmark datasets (MIT-BIH and INCART) demonstrate the effectiveness of the proposed HPRNet, achieving competitive classification performance while maintaining favorable computational efficiency. Furthermore, visualization analyses are performed to interpret the hierarchical feature representation behavior of HPRNet and to further analyze the challenging characteristics of S and F heartbeat categories.

The remainder of this paper is organized as follows. Section 2 reviews related work on ECG signal classification. Section 3 introduces the proposed method. Section 4 presents experimental results and ablation studies, as well as the limitations of the proposed method. Section 5 concludes the paper and discusses future research directions.

2 Related works

2.1 ECG signal concept and arrhythmia categories

The electrocardiogram (ECG) reflects the physiological and electrical status of the heart and provides essential information for cardiac assessment Abdulla and Al-Ani (2020). On the one hand, temporal intervals derived from ECG waveforms can assist clinicians in determining whether cardiac electrical activity is regular or irregular, as well as abnormally fast or slow. On the other hand, the amplitude and morphology of ECG signals can help indicate whether specific cardiac chambers are enlarged or under excessive workload. As shown in Figure 1, a normal ECG heartbeat consists of several characteristic components, including atrial depolarization (P wave), ventricular depolarization (QRS complex), and ventricular repolarization (T wave).

Figure 1

Abnormal cardiac electrical activity involving disturbed impulse generation or propagation is commonly referred to as arrhythmia and can be reflected in ECG signals. Representative arrhythmia types and abnormal ECG patterns commonly analyzed in ECG studies include supraventricular arrhythmias [e.g., Atrial Fibrillation (AF) and Atrial Flutter (AFL)], ventricular arrhythmias [e.g., Ventricular Fibrillation (VF) and Ventricular Flutter (VFL)], conduction abnormalities [e.g., Left Bundle Branch Block (LBBB) and Right Bundle Branch Block (RBBB)], Premature Atrial Contractions (PAC), Premature Ventricular Contractions (PVC), Ventricular Escape Beats (VEB), and Paced Beats (PB). These ECG patterns are explicitly annotated in benchmark ECG datasets and are widely adopted in ECG classification research Abdulla and Al-Ani (2020); Özal Yildirim (2018).

2.2 Residual networks for ECG signal classification

Residual learning has been widely adopted for ECG classification and arrhythmia detection, particularly when deeper convolutional architectures are required to capture subtle morphological variations and multi-scale temporal patterns in ECG signals. Zhang et al. (2023) presented a multi-class arrhythmia classification method based on a converted multi-scale residual neural network combined with multi-channel data fusion. In their approach, extracted ECG features were transformed into a two-dimensional image representation for subsequent processing; however, such a conversion from one-dimensional signals to two-dimensional representations may introduce potential information loss. Khan et al. (2023) proposed an ECG classification model based on residual networks and demonstrated that ResNet outperforms conventional CNNs in deep feature extraction. Qi et al. (2024) proposed a Hybrid Residual Network (Hybrid ResNet) for ECG arrhythmia detection and classification, in which multiple convolutional operations are integrated, potentially increasing the architectural complexity of the network. Liu et al. (2025) proposed a spatio-temporal attention residual network that integrates residual learning with spatio-temporal attention and LSTM modules to model complex dependencies in ECG signals for multi-label classification tasks. Although residual learning effectively facilitates the training of deep networks, most existing studies primarily focus on improving feature representation capability, while comparatively less attention is paid to controlling parameter redundancy within residual blocks. Motivated by this observation, the proposed HPRNet incorporates a pruning strategy directly into the residual extraction blocks to reduce parameter redundancy while preserving the strong feature extraction ability of residual learning.

2.3 Model pruning strategies

The goal of ECG classification algorithms is practical clinical deployment, which imposes strict constraints on model size and computational complexity. Pruning can lower computational cost by eliminating redundant weights from the model. Han et al. (2015) adopted magnitude-based pruning by removing weights below a predefined threshold, followed by fine-tuning and regularization using L1 and L2 norms to recover performance. However, such approaches typically rely on post-pruning retraining, which may increase computational cost and limit scalability. Ashkboos et al. (2024) proposed SliceGPT, which reduces model complexity by removing rows and columns in weight matrices to shrink the embedding dimension while maintaining performance. In addition, Ma et al. (2023) investigated a gradient-based structured pruning strategy that selectively removes non-critical coupled structures, achieving effective acceleration while preserving the core functionality of the model. In this work, an unstructured pruning strategy is applied both at the network level and within residual extraction blocks (block level) to control parameter redundancy while maintaining classification performance.

3 The proposed method

3.1 Overall structure of HPRNet

ECG signals contain both local morphological patterns and long-term rhythm variations, requiring arrhythmia classification models to capture both local waveform characteristics and global rhythm information Liu et al. (2025). Therefore, we propose a hierarchical pyramidal residual network (HPRNet) for multi-class ECG arrhythmia classification. Specifically, along the depth direction, HPRNet progressively learns hierarchical ECG representations through multiple residual layers. Each residual layer is composed of several residual extraction blocks (REBs). This structure facilitates stable information and gradient propagation while enabling the model to capture ECG features at multiple temporal scales. However, as the network depth increases, the parameter size grows rapidly, leading to higher training complexity and computational cost. To address this issue, HPRNet introduces a multi-level pruning strategy that compresses redundant connections at both the network-level and the block-level, thereby reducing model complexity while preserving representational capacity.

Specifically, as shown in Figure 2 and Table 1, let the input ECG signal be $x \in \mathbb{R}^{1 \times L}$, where $L$ denotes the length of the ECG segment. HPRNet first extracts initial temporal features using two one-dimensional convolutions Conv(·), batch normalization BN(·), and the ReLU activation function δ(·), which can be formulated as Equation 1:

$$f_0 = \delta\big(\mathrm{BN}\big(\mathrm{Conv}(\mathrm{Conv}(x))\big)\big) \tag{1}$$

Figure 2

Table 1

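| Stage | Layer/Module | Kernel size | Stride | Channels (In→Out) | Output size |
|---|---|---|---|---|---|
| Input | ECG beat input | - | - | 1 | 1 × 128 |
| Front-end | Conv1D | 17 | 1 | 1 → 32 | 32 × 128 |
| Front-end | Conv1D | 17 | 1 | 32 → 32 | 32 × 128 |
| Front-end | BatchNorm1D | - | - | 32 | 32 × 128 |
| Front-end | ReLU | - | - | 32 | 32 × 128 |
| HRB | ResLayer0 | 17 | 1 | 32 → 64 | 64 × 128 |
| HRB | ResLayer1 | 17 | 2 | 64 → 64 | 64 × 64 |
| HRB | ResLayer2 | 17 | 2 | 64 → 128 | 128 × 32 |
| HRB | ResLayer2_ | 17 | 2 | 128 → 128 | 128 × 16 |
| HRB | ResLayer3 | 17 | 2 | 128 → 256 | 256 × 8 |
| HRB | ResLayer3_ | 17 | 2 | 256 → 256 | 256 × 4 |
| HRB | ResLayer4 | 17 | 2 | 256 → 512 | 512 × 2 |
| HRB | ResLayer4_ | 17 | 2 | 512 → 512 | 512 × 1 |
| HRB | ResLayer5 | 17 | 2 | 512 → 1024 | 1024 × 1 |
| Classifier | Dropout | - | - | 1024 | 1024 × 1 |
| Classifier | Global AvgPool1D | - | - | 1024 | 1024 × 1 |
| Classifier | Fully Connected | - | - | 1024 → C | C |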

Layer-wise architecture of the proposed HPRNet network.
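The output sizes in Table 1 can be checked with the standard output-length formula for a one-dimensional convolution, L_out = floor((L_in + 2p - k) / s) + 1. The following is a minimal sketch, assuming a kernel size of 17 with padding 8 and the per-stage strides listed in Table 1; the function and variable names are illustrative and are not taken from the released code.

```python
def conv1d_out_len(length, kernel=17, stride=1, padding=8):
    """Output length of a 1D convolution (standard floor formula)."""
    return (length + 2 * padding - kernel) // stride + 1


# (stage name, stride, output channels) as listed in Table 1
HRB_STAGES = [
    ("ResLayer0", 1, 64),
    ("ResLayer1", 2, 64),
    ("ResLayer2", 2, 128),
    ("ResLayer2_", 2, 128),
    ("ResLayer3", 2, 256),
    ("ResLayer3_", 2, 256),
    ("ResLayer4", 2, 512),
    ("ResLayer4_", 2, 512),
    ("ResLayer5", 2, 1024),
]


def trace_shapes(input_len=128):
    """Return (name, channels, length) after each HRB stage."""
    shapes, length = [], input_len
    for name, stride, channels in HRB_STAGES:
        length = conv1d_out_len(length, stride=stride)
        shapes.append((name, channels, length))
    return shapes


for name, channels, length in trace_shapes():
    print(f"{name:<11} {channels:>4} x {length}")
```

Under these assumptions the temporal length is halved at each strided stage (128, 64, 32, ...) until it reaches 1, which agrees with the last column of Table 1.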

The initial feature $f_0$ is then fed into a Hierarchical REB-Based Backbone (HRB) network to progressively learn hierarchical ECG representations, which can be defined as Equation 2:

$$f_{\mathrm{HRB}} = \mathrm{HRB}(f_0) \tag{2}$$

After obtaining the deep feature map $f_{\mathrm{HRB}}$, dropout is applied to mitigate overfitting. Global average pooling (GAP) is then used to aggregate temporal features, followed by a fully connected layer to generate the classification logits $z$, as shown in Equation 3:

$$z = W_f \cdot \mathrm{GAP}\big(\mathrm{Dropout}(f_{\mathrm{HRB}})\big) + b_f \tag{3}$$

where $W_f$ and $b_f$ denote the weight and bias of the fully connected layer, respectively. Finally, the predicted class probabilities are obtained using the Softmax function according to Equation 4:

$$\hat{y}_c = \frac{\exp(z_c)}{\sum_{j=1}^{C} \exp(z_j)}, \quad c = 1, \dots, C \tag{4}$$

where $C$ denotes the number of arrhythmia classes.
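The classification head described above (global average pooling, a fully connected layer, and Softmax) can be illustrated with a few lines of plain Python. This is only a sketch on toy data: the helper names and all numeric values are hypothetical, and dropout is treated as the identity, as is usual at inference time.

```python
import math


def global_avg_pool(feature_map):
    """Average each channel over time: (channels x length) -> (channels,)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def linear(x, weight, bias):
    """Fully connected layer; weight has shape (classes x channels)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable Softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_map, weight, bias):
    pooled = global_avg_pool(feature_map)  # dropout is the identity at inference
    return softmax(linear(pooled, weight, bias))


# Toy example: 3 channels, 2 time steps, C = 2 classes (hypothetical values).
features = [[1.0, 3.0], [0.0, 2.0], [4.0, 4.0]]
W = [[0.5, -1.0, 0.25], [-0.5, 1.0, 0.0]]
b = [0.0, 0.1]
probs = classify(features, W, b)
print(probs)
```

Subtracting the maximum logit before exponentiation does not change the result of the Softmax but avoids overflow for large logits.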

3.2 Hierarchical REB-based backbone

To progressively extract discriminative ECG representations, we design a hierarchical pyramidal REB-based backbone (HRB) for HPRNet. As shown in Figure 2, the HRB backbone is organized as a hierarchical stack of nine residual layers (ResLayers), each composed of multiple residual extraction blocks (REBs). The numbers of REBs in the nine ResLayers are [3, 4, 4, 4, 4, 4, 4, 4, 3], where the shallow and final stages contain fewer blocks, while the intermediate stages adopt more REBs to strengthen feature representation. Between adjacent ResLayers, strided convolutions are used to progressively reduce the temporal resolution while expanding the channel dimension, forming a hierarchical pyramidal representation. In addition, several intermediate stages employ same-channel downsampling layers to further enlarge the receptive field without increasing model complexity. This design allows the backbone to progressively capture high-level semantic information and long-range temporal dependencies in ECG signals.

Specifically, at the inter-layer level, the HRB propagates features across residual layers. Let $f^{(0)} = f_0$ denote the input feature to the backbone. The output of the $l$-th ResLayer is defined as Equation 5:

$$f^{(l)} = \mathrm{ResLayer}_l\big(f^{(l-1)}\big), \quad l = 1, \dots, 9 \tag{5}$$

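At the intra-layer level, each residual layer refines features through a cascade of residual extraction blocks (REBs), which can be expressed as Equation 6:

$$\mathrm{ResLayer}_l(\cdot) = \mathrm{REB}^{(l)}_{N_l} \circ \cdots \circ \mathrm{REB}^{(l)}_{2} \circ \mathrm{REB}^{(l)}_{1}(\cdot) \tag{6}$$

where $\mathrm{REB}^{(l)}_{k}$ denotes the $k$-th REB in the $l$-th residual layer and $N_l$ represents the number of REBs in that layer.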

3.3 Residual extraction blocks

In this work, multiple residual extraction blocks (REBs) are employed to construct the HRB for capturing subtle discriminative patterns in ECG signals. As illustrated in Figure 2, each REB consists of a main branch and a skip branch. The main branch adopts a pre-activation structure and performs two BN-ReLU-Conv1D operations to progressively extract deeper temporal features. Large-kernel convolution with a kernel size of 17 and padding of 8 is used to maintain temporal resolution while ensuring dimensional consistency for residual addition. Such a kernel size is well aligned with the temporal scale of key ECG morphological components, such as the QRS complex and short-term rhythm variations. For the skip branch, a strided 1×1 convolution is first applied to downsample the input feature and match the channel dimension of the main branch output, followed by a max-pooling operation to preserve salient temporal responses. Compared with traditional residual blocks using identity or linear projection shortcuts, the proposed pooling-based skip connection introduces a nonlinear selection mechanism that suppresses low-activation responses while retaining important temporal features.

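Let the initial input be $x^{(l)}_{0} = f^{(l-1)}$. The output of the $i$-th REB in the $l$-th ResLayer can be expressed as Equation 7:

$$x^{(l)}_{i} = \tilde{\mathcal{F}}\big(x^{(l)}_{i-1}\big) + \mathrm{MP}\big(\mathcal{D}\big(x^{(l)}_{i-1}\big)\big), \quad i = 1, \dots, N_l \tag{7}$$

where $\tilde{\mathcal{F}}(\cdot)$ denotes the pruned residual mapping, $\mathcal{D}(\cdot)$ represents the downsampling operation (1 × 1 convolution), and $\mathrm{MP}(\cdot)$ denotes the max pooling operation. This design enables the REB to simultaneously enhance temporal feature extraction while reducing parameter redundancy, thereby improving the efficiency and representational capability of the backbone network.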
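The two-branch structure of an REB can be sketched in plain Python on a single-channel signal. This is a simplified illustration and not the released implementation. It rests on several assumptions that the text does not specify: batch normalization and pruning are omitted, the stride is applied in the first convolution of the main branch and in the 1×1 convolution of the skip branch, and the max pooling uses a window of 3 with stride 1. All function names are hypothetical.

```python
def relu(x):
    return [max(0.0, v) for v in x]


def conv1d_same(x, kernel, stride=1):
    """Single-channel 1D convolution with zero padding of len(kernel) // 2."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(0, len(x), stride)
    ]


def max_pool_same(x, window=3):
    """Stride-1 max pooling that keeps the sequence length (assumed window)."""
    half = window // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def reb_forward(x, k1, k2, w_skip, stride=1):
    # Main branch: two pre-activation (ReLU -> Conv1D) steps; BN omitted here.
    main = conv1d_same(relu(x), k1, stride=stride)
    main = conv1d_same(relu(main), k2, stride=1)
    # Skip branch: strided 1x1 convolution (a scalar weight), then max pooling.
    skip = max_pool_same([w_skip * v for v in x[::stride]])
    # Residual addition requires both branches to have the same length.
    return [m + s for m, s in zip(main, skip)]


signal = [float(i % 7) for i in range(128)]   # toy input of length 128
kernel = [1.0 / 17] * 17                      # kernel size 17, as in the text
out = reb_forward(signal, kernel, kernel, w_skip=1.0, stride=2)
print(len(out))
```

With a kernel size of 17 and padding of 8, both branches produce sequences of equal length for any stride, which is what makes the element-wise addition well defined.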

3.4 Multi-level pruning optimization

Although stacked convolutions are capable of learning rich feature representations, they may also introduce parameter redundancy and increase training complexity in deep hierarchical architectures. To this end, a multi-level pruning optimization (MLPO) strategy is proposed to remove redundant weights while preserving discriminative ECG features.

Specifically, network-level pruning is applied to the first convolutional layer and the final fully connected layer of HPRNet to remove low-contribution weights in the feature extraction and classification stages. Block-level pruning is embedded into each residual extraction block (REB) by pruning the weights of the first convolution in the main branch, which suppresses redundant intermediate representations introduced by stacked convolutions. The pruning process follows magnitude-based pruning, where weights with small L1-norm magnitudes are removed according to a predefined pruning ratio.
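Magnitude-based pruning with a fixed ratio can be illustrated as follows. This is a minimal sketch on a flat list of weights; the function name and the example ratio of 0.5 are illustrative assumptions, since the pruning ratio is not stated in this section.

```python
def magnitude_prune(weights, ratio):
    """Zero out the `ratio` fraction of weights with the smallest |w|."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_prune = int(len(weights) * ratio)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(weights, ratio=0.5))  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In PyTorch, `torch.nn.utils.prune.l1_unstructured` applies the same criterion to a named parameter of a module; whether the authors used that utility is not stated here.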

4 Experimental results

This section first introduces the datasets, evaluation metrics, and experimental setup. Then, the classification performance of the proposed method is compared with that of five representative ECG classification methods. Finally, visualization analyses are provided to interpret the hierarchical feature representation behavior of the proposed model, and ablation studies are conducted to investigate the effects of the key designs in HPRNet, including HRB, REBs, and MLPO.

4.1 ECG signal datasets

We used the MIT-BIH Moody and Mark (2001) and INCART Goldberger et al. (2000) datasets to train the network and comprehensively evaluate the classification performance.

MIT-BIH: The MIT-BIH Arrhythmia Database is a widely used public ECG dataset for arrhythmia analysis. It contains 48 half-hour two-lead ECG recordings collected from 47 subjects, comprising more than 110,000 annotated heartbeats labeled by cardiologists. The ECG signals are sampled at 360 Hz with 11-bit resolution and include 15 types of arrhythmia annotations.

INCART: The INCART Arrhythmia Database contains 12-lead ECG recordings with various types of arrhythmias. The dataset provides more diverse and complex ECG patterns, making it suitable for evaluating the generalization ability of classification models. The ECG signals are sampled at 257 Hz.

Figure 3 illustrates the category distributions of the MIT-BIH and INCART datasets. It can be observed that the INCART dataset exhibits a more imbalanced distribution than MIT-BIH. In addition, Figures 4 and 5 present representative ECG signal samples from the MIT-BIH and INCART datasets, respectively. For both datasets, only the MLII lead was used in this study.

Figure 3

Figure 4

Figure 5

4.2 Evaluation indicators

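To comprehensively evaluate the performance of different methods, Accuracy, Precision, Recall, and F1-score were used to measure the classification performance for each class Opitz (2024). In addition, confusion matrices were provided to analyze the class-wise prediction behavior. These metrics are defined as Equations 8-11:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$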

where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives.
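These definitions extend to the multi-class setting by treating each class one-vs-rest in the confusion matrix. The sketch below assumes unweighted (macro) averaging over classes and an overall accuracy equal to the fraction of correctly classified beats; the function names and the toy confusion matrix are illustrative only.

```python
def per_class_counts(confusion, c):
    """One-vs-rest TP, FP, FN, TN for class c; rows = true, columns = predicted."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[c][c]
    fp = sum(confusion[r][c] for r in range(n)) - tp
    fn = sum(confusion[c]) - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


def safe_div(num, den):
    return num / den if den else 0.0


def macro_metrics(confusion):
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp, fp, fn, _ = per_class_counts(confusion, c)
        p = safe_div(tp, tp + fp)
        r = safe_div(tp, tp + fn)
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))
    total = sum(sum(row) for row in confusion)
    return {
        "accuracy": sum(confusion[c][c] for c in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


confusion = [[50, 2], [5, 43]]  # hypothetical two-class example
metrics = macro_metrics(confusion)
print(metrics)
```

Macro averaging gives every class the same weight regardless of its size, so minority classes with low recall pull the averaged recall and F1-score down even when the overall accuracy is high.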

Furthermore, the number of parameters (Params), average inference time (Time), and million floating-point operations (MFLOPs) were used to evaluate the model complexity and computational cost. Specifically, Params reflects the overall model size, Time denotes the average inference latency per ECG segment under identical experimental settings, and MFLOPs quantify the number of floating-point operations required for a single inference.
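As a rough illustration of how such efficiency figures arise, the sketch below counts the parameters and multiply-accumulate operations (MACs) of a single Conv1D layer and measures an average per-segment latency. It is not the measurement protocol used in this work: whether bias terms are counted and whether one MAC is counted as one or two floating-point operations are conventions that are not specified here, and all function names are hypothetical.

```python
import time


def conv1d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a Conv1D layer."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def conv1d_macs(c_in, c_out, kernel, out_len):
    """Multiply-accumulate operations for one forward pass of a Conv1D layer."""
    return c_in * c_out * kernel * out_len


def average_latency(predict, segments, warmup=2):
    """Mean wall-clock seconds per segment for a callable `predict`."""
    for seg in segments[:warmup]:  # warm-up calls are not timed
        predict(seg)
    start = time.perf_counter()
    for seg in segments:
        predict(seg)
    return (time.perf_counter() - start) / len(segments)


# First front-end convolution: 1 -> 32 channels, kernel 17, output length 128.
print(conv1d_params(1, 32, 17))     # 576
print(conv1d_macs(1, 32, 17, 128))  # 69632
```

Summing these quantities over all layers gives the model-level totals; the latency helper accepts any callable, so the same loop can wrap a trained model's inference function.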

4.3 Experimental setup

To comprehensively evaluate the model performance, multiple classification tasks were designed on the MIT-BIH and INCART datasets. For the MIT-BIH dataset, a four-class classification task was constructed according to the AAMI standard, as shown in Table 2. For the INCART dataset, both normal-abnormal binary classification and AAMI-like three-class classification tasks were designed to analyze the model performance under different task difficulties. During training, we used five-fold cross-validation to evaluate the classification performance.
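A five-fold split can be generated as in the following sketch. It is a plain shuffled split with an arbitrary seed, given only as an assumption for illustration; the text does not state whether the folds were stratified by class or separated by patient.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


for fold_id, (train_idx, test_idx) in enumerate(k_fold_indices(23), start=1):
    print(fold_id, len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so the five test sets together cover the whole dataset once.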

Table 2

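| AAMI (-like) | Type description | MIT-BIH symbols | Number of beats | INCART symbols | Number of beats |
|---|---|---|---|---|---|
| N | Normal beat | N, L, R, e, j | 90,631 | N, R, j, n, B | 153,709 |
| S | Supraventricular ectopic beat | A, a, J, S | 2,781 | A, S | 1,960 |
| V | Ventricular ectopic beat | V, E | 7,236 | V | 20,232 |
| F | Fusion beat | F | 803 | N/A | N/A |
| Q | Unknown beat | /, f, Q | 8,043 | Q | 6 |
| Total | - | N, L, R, e, j, A, a, J, S, V, E, F, /, f, Q | 109,494 | N, L, R, j, n, B, A, S, V, F, Q | 175,907 |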

AAMI standard on MIT-BIH and AAMI-like standard on INCART, as well as the corresponding number of heart beats.
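The MIT-BIH column of Table 2 translates directly into a lookup table from annotation symbols to AAMI classes. The sketch below is illustrative: the names are hypothetical, and the default of keeping only N, S, V, and F is an assumption based on the four-class task described above.

```python
# Annotation symbol -> AAMI class, following the MIT-BIH column of Table 2.
MITBIH_TO_AAMI = {
    "N": "N", "L": "N", "R": "N", "e": "N", "j": "N",
    "A": "S", "a": "S", "J": "S", "S": "S",
    "V": "V", "E": "V",
    "F": "F",
    "/": "Q", "f": "Q", "Q": "Q",
}


def to_aami(symbols, mapping=MITBIH_TO_AAMI, keep=("N", "S", "V", "F")):
    """Map beat symbols to AAMI classes, dropping unmapped or excluded beats."""
    labels = (mapping.get(s) for s in symbols)
    return [label for label in labels if label in keep]


print(to_aami(["N", "L", "V", "/", "A", "F", "x"]))  # ['N', 'N', 'V', 'S', 'F']
```

Symbols that are absent from the mapping, such as non-beat annotations, are silently discarded in this sketch.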

All experiments were conducted on a workstation equipped with an Intel Core i9-14900HX CPU, 32 GB RAM, and an NVIDIA RTX 5070 Ti GPU. The proposed HPRNet model was implemented using Python 3.10 and the PyTorch deep learning framework. During training, the Adam optimizer was adopted with an initial learning rate of 0.001 and a batch size of 128. The maximum number of training epochs was set to 30, and the learning rate was adaptively adjusted during training.

4.4 Classification results

4.4.1 Results on MIT-BIH

To comprehensively evaluate the proposed HPRNet, we compare it with several representative ECG heartbeat classification methods, including Ensemble_SVM Mondéjar-Guerra et al. (2019), SE-ECGNet Chen et al. (2020), ECGTransForm Alamr and Artoli (2023), SRT Wu et al. (2024), and LightweightCNN Thota et al. (2025). For a fair comparison, all baseline models were implemented following the original hyperparameter settings reported in their respective publications. The detailed experimental configurations are summarized in Table 3.

Table 3

| Methods | Framework | Optimizer | Epochs | Batch size | Learning rate (initial) |
|---|---|---|---|---|---|
| Ensemble_SVM Mondéjar-Guerra et al. (2019) | Scikit-learn | Adam | 60 | 128 | 1e-3 |
| SE-ECGNet Chen et al. (2020) | PyTorch | Adam | 256 | 64 | 1e-3 |
| ECGTransForm Alamr and Artoli (2023) | PyTorch | Adam | 60 | 128 | 1e-3 |
| SRT Wu et al. (2024) | N/A | N/A | N/A | N/A | N/A |
| LightweightCNN Thota et al. (2025) | Keras | Adam | 30 | 32 | 1e-3 |
| Ours | PyTorch | Adam | 30 | 128 | 1e-3 |

The hyperparameter settings of different methods.

Specifically, five-fold cross-validation was conducted on the MIT-BIH dataset to evaluate the proposed HPRNet model. As shown in Table 4, HPRNet achieved an average accuracy of 98.97%, reflecting its strong overall classification capability. Meanwhile, the average precision, recall, and F1-score reached 94.63%, 89.89%, and 92.05%, respectively, demonstrating its balanced performance in terms of sensitivity and overall classification effectiveness. For class-wise results, F1N and F1V reached 99.53% and 97.58%, respectively, indicating that the model can effectively identify N and V heartbeats. By comparison, the lower values of F1S and F1F suggest that the classification of S and F heartbeats is more challenging. This phenomenon may be attributed to class imbalance, as the numbers of S and F samples are relatively small, leading to insufficient learning of minority-class features. In addition, these two types of heartbeats share certain morphological similarities with other categories, which further increases the difficulty of classification. This observation is also reflected in the confusion matrices shown in Figure 6, where most samples are concentrated along the main diagonal across all five folds, while misclassifications are more likely to occur in the S and F categories. Nevertheless, the small performance fluctuations across the five folds demonstrate that the proposed HPRNet has good classification performance, robustness, and stability.

Table 4

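| Fold ID | Accuracy (%) | Precision (%) | Recall (%) | F1_N (%) | F1_S (%) | F1_V (%) | F1_F (%) | F1_avg (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 99.04 | 93.11 | 91.70 | 99.64 | 90.43 | 98.42 | 81.01 | 92.38 |
| 2 | 98.94 | 93.41 | 91.38 | 99.59 | 90.61 | 97.77 | 81.48 | 92.36 |
| 3 | 98.89 | 95.96 | 88.08 | 99.45 | 84.52 | 96.61 | 86.08 | 91.67 |
| 4 | 98.88 | 95.50 | 86.84 | 99.46 | 84.15 | 96.94 | 82.05 | 90.65 |
| 5 | 99.12 | 95.16 | 91.43 | 99.53 | 87.64 | 98.14 | 87.50 | 93.20 |
| Average | 98.97 | 94.63 | 89.89 | 99.53 | 87.47 | 97.58 | 83.62 | 92.05 |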

The results of HPRNet using five-fold cross-validation on MIT-BIH dataset.
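The averaged F1-score in Table 4 is numerically consistent with an unweighted mean of the four class-wise F1-scores, which can be checked with the short sketch below. The values are copied from Table 4; the macro-averaging interpretation is inferred from the numbers rather than stated in the text.

```python
# (F1_N, F1_S, F1_V, F1_F, reported F1_avg) for each fold, from Table 4.
FOLD_F1 = {
    1: (99.64, 90.43, 98.42, 81.01, 92.38),
    2: (99.59, 90.61, 97.77, 81.48, 92.36),
    3: (99.45, 84.52, 96.61, 86.08, 91.67),
    4: (99.46, 84.15, 96.94, 82.05, 90.65),
    5: (99.53, 87.64, 98.14, 87.50, 93.20),
}


def macro_f1(class_f1):
    """Unweighted mean of class-wise F1-scores."""
    return sum(class_f1) / len(class_f1)


for fold, (*class_f1, reported) in FOLD_F1.items():
    print(f"fold {fold}: macro F1 = {macro_f1(class_f1):.3f}, reported = {reported:.2f}")
```

The recomputed means agree with the reported values to within rounding, which shows that the low F1-scores of the S and F classes carry the same weight as the majority classes in the averaged figure.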

Figure 6

Table 5 further compares the proposed HPRNet with several representative ECG heartbeat classification methods on the MIT-BIH dataset in terms of both classification performance and computational efficiency. As shown in the table, HPRNet achieves the best overall classification performance, with an accuracy of 98.97%, precision of 94.63%, recall of 89.89%, and an F1-score of 92.05%. In terms of efficiency, although HPRNet does not have the smallest parameter count or MFLOPs, it achieved the lowest inference latency (0.0312 s). This suggests that the proposed model provides a favorable balance between classification accuracy and practical inference efficiency.

Table 5

| Models | Accuracy (%) ↑ | Precision (%) ↑ | Recall (%) ↑ | F1-score (%) ↑ | Params (#) ↓ | Latency (s) ↓ | MFLOPs ↓ |
|---|---|---|---|---|---|---|---|
| Ensemble_SVM Mondéjar-Guerra et al. (2019) | 94.50 | 66.40 | 70.30 | 68.29 | 8,253,360 | 0.0362 | 1.774 |
| SE-ECGNet Chen et al. (2020) | 91.67 | 83.33 | 60.60 | 70.18 | 16,496,035 | 1.3504 | 1483.304 |
| ECGTransForm Alamr and Artoli (2023) | 96.34 | 88.29 | 89.53 | 88.84 | 637,431 | 0.0800 | 7.710 |
| SRT Wu et al. (2024) | 95.70 | 78.60 | 88.10 | 82.60 | - | - | - |
| LightweightCNN Thota et al. (2025) | 90.72 | 79.00 | 77.00 | 77.99 | 940,069 | 0.1641 | 193.917 |
| Ours | 98.97 | 94.63 | 89.89 | 92.05 | 19,110,641 | 0.0312 | 713.142 |

The classification and efficiency comparisons of different methods on MIT-BIH dataset.

↑ denotes that higher is better, and ↓ denotes that lower is better.

The bold values indicate the best results among the compared methods.

4.4.2 Results on INCART

The experimental results of different classification tasks on the INCART dataset are summarized in Table 6, and the corresponding confusion matrices are illustrated in Figure 7. For the normal-abnormal binary classification task, the proposed model achieves an accuracy of 98.41%, with precision, recall, and F1-score of 94.51%, 89.59%, and 91.98%, respectively. As shown in Figure 7A, the model correctly identifies 99.50% of normal beats, while 79.68% of abnormal beats are accurately detected, demonstrating a good capability in distinguishing normal and abnormal heartbeats.

Table 6

| Task types | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| Binary classification | 98.41 | 94.51 | 89.59 | 91.98 |
| AAMI-like three-class classification | 98.32 | 84.15 | 74.60 | 79.09 |

The results of different classification tasks on INCART dataset.

Figure 7

For the AAMI-like three-class classification task, the proposed method also achieves competitive performance, with an overall accuracy of 98.32%. As shown in Figure 7B, the model performs very well in identifying normal beats (98.98%) and ventricular ectopic beats (90.51%). In contrast, the recognition of S remains challenging, with a recall of only 34.33%. This limitation is mainly attributed to the severe class imbalance in the INCART dataset, where S beats are significantly underrepresented, as well as their morphological similarity to normal beats. Consequently, a large proportion of S beats are misclassified as normal beats, which degrades the overall performance in this category. Nevertheless, the proposed model still demonstrates discriminative ability and stable performance across different classification tasks on the INCART dataset.

4.5 Visualization analysis

To further investigate the hierarchical representation behavior of HPRNet, Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to visualize the feature responses of different backbone layers.

4.5.1 Hierarchical representation analysis

Figure 8 presents the Grad-CAM responses of representative normal (N) and ventricular ectopic (V) beats at different layers of the proposed HPRNet. In the shallow layer, the network mainly focuses on prominent local waveform structures, particularly the QRS complex. As the depth increases, the receptive field gradually expands, and the model begins to capture broader morphological characteristics, including waveform regions related to the P wave and T wave. In deeper layers, the network learns more global heartbeat representations, highlighting discriminative regions associated with ventricular ectopic beats, such as abnormal ST segments and waveform distortions. These observations indicate that the proposed hierarchical pyramidal backbone effectively learns multi-level ECG representations, enabling the model to progressively extract discriminative features for arrhythmia classification.

Figure 8

4.5.2 Analysis of challenging S and F categories

Figure 9 presents the Grad-CAM responses of different backbone layers for representative misclassified heartbeat samples. For the S→N case, the shallow layer mainly focuses on the QRS complex, while deeper layers gradually incorporate broader waveform context. However, due to the morphological similarity between S and N, the discriminative features remain insufficiently distinctive, resulting in misclassification. For the F→N and F→V cases, the activation maps show that the model progressively attends to abnormal waveform regions such as the ST segment and T wave. Nevertheless, the hybrid morphology of fusion beats shares characteristics with both normal and ventricular beats, thereby increasing the classification ambiguity. These observations indicate that the proposed hierarchical backbone captures multi-scale ECG representations, while also revealing the intrinsic difficulty in distinguishing S and F heartbeat categories.

Figure 9

4.6 Ablation studies

4.6.1 Discussion on noise robustness

The proposed HPRNet adopts a hierarchical pyramidal residual architecture that facilitates stable feature learning while preserving essential ECG signal characteristics. Owing to this structural advantage, explicit signal denoising is not required as a mandatory preprocessing step. To further investigate this aspect, an additional experiment was conducted to evaluate the impact of ECG denoising on classification performance. Specifically, a representative wavelet-based denoising method was applied using the sym8 wavelet with five decomposition levels and soft-thresholding to suppress high-frequency noise (Ádám et al., 2025). The corresponding classification results are summarized in Table 7. Only marginal differences are observed between the denoised and raw ECG inputs across all evaluation metrics: the non-denoised setting achieves slightly higher accuracy, precision, and F1-score, while its recall is marginally lower (89.89% versus 90.17%). These results suggest that the proposed HPRNet can inherently learn noise-robust representations from raw ECG signals, thereby maintaining stable classification performance without additional denoising preprocessing.

Table 7

| Denoising preprocessing | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
| --- | --- | --- | --- | --- |
| ✓ | 98.86 | 92.55 | 90.17 | 91.15 |
| × | 98.97 | 94.63 | 89.89 | 92.05 |

Comparison of classification results with and without denoising preprocessing on the MIT-BIH dataset.
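The soft-thresholding step of the wavelet denoising experiment can be sketched as follows. This is only the shrinkage rule applied to a list of detail coefficients: the sym8 decomposition and reconstruction are not implemented here and would be handled by a wavelet library. The function name `soft_threshold`, the sample coefficients, and the threshold of 0.2 are illustrative assumptions, since the threshold selection rule is not stated in the text.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude does not exceed the threshold become 0;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical detail coefficients from one decomposition level.
detail = [0.05, -0.8, 0.02, 1.5, -0.1]
shrunk = soft_threshold(detail, 0.2)
print(shrunk)
```

Small coefficients, which mostly carry high-frequency noise, are set to zero, while large coefficients associated with sharp waveform features are kept with reduced magnitude.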

4.6.2 Discussion on the effect of HRB

To further evaluate the effectiveness of the proposed HRB in feature extraction, ablation experiments were conducted by replacing the HRB with standard convolutional layers and by removing the residual skip-connections in the REBs. As shown in Table 8, the standard CNN achieves an accuracy of only 74.79%, indicating that simple convolutional stacking is insufficient to capture discriminative ECG features. The HRB variant without residual skip-connections reaches 94.25%, which is 4.72 percentage points below the full model and highlights the importance of residual feature propagation for deep representation learning, consistent with previous findings on residual networks (Ranganathan et al., 2025). With the skip-connections in place, the complete HRB built from the proposed residual extraction blocks (REBs) achieves 98.97% accuracy with the same number of parameters (19,110,641), demonstrating the effectiveness of REBs in enhancing feature representation and stabilizing deep network training.

Table 8

| Backbone configuration | Accuracy (%) | Params (#) |
| --- | --- | --- |
| std. CNN | 74.79 | 19,005,998 |
| w/o skip-connections | 94.25 | 19,110,641 |
| HRB (ours) | 98.97 | 19,110,641 |

Impact of the HRB on the HPRNet Backbone.

The bold values indicate the best results among the compared configurations.

4.6.3 Impact of the number of REBs

In this work, residual extraction blocks (REBs) are adopted as the basic building units of the HRB backbone to facilitate the extraction of deeper and more discriminative signal features. However, increasing the number of REBs may lead to a substantial growth in model parameters. Therefore, it is necessary to balance classification performance with model complexity. To investigate the impact of the number of REBs, a series of experiments were conducted using different REB configurations. Table 9 reports the classification accuracy and parameter numbers of HPRNet with varying numbers of REBs in each ResLayer. As shown in Table 9, the configuration [3,4,4,4,4,4,4,4,3] achieves the highest classification accuracy while maintaining a moderate number of parameters, demonstrating a favorable trade-off between model performance and computational complexity.

Table 9

| REBs number for each ResLayer | Accuracy (%) | Params (#) |
| --- | --- | --- |
| [2,3,3,3,3,3,3,3,2] | 98.63 | 13,166,219 |
| [3,4,4,4,4,4,4,4,3] | 98.97 | 19,110,641 |
| [4,5,5,5,5,5,5,5,4] | 98.84 | 25,055,063 |
| [5,6,6,6,6,6,6,6,5] | 98.67 | 30,999,485 |
| [6,7,7,7,7,7,7,7,6] | 98.64 | 36,943,907 |

Trade-off between classification accuracy and parameter number of different REB configurations.

The bold values indicate the best results among the compared configurations.
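The trade-off in Table 9 can be checked with a few lines of arithmetic. The sketch below copies the reported accuracies and parameter counts, selects the most accurate configuration, and computes the parameter increase between consecutive rows. The variable names are illustrative.

```python
# (REBs per ResLayer, accuracy in %, parameter count) as reported in Table 9.
table9 = [
    ([2, 3, 3, 3, 3, 3, 3, 3, 2], 98.63, 13_166_219),
    ([3, 4, 4, 4, 4, 4, 4, 4, 3], 98.97, 19_110_641),
    ([4, 5, 5, 5, 5, 5, 5, 5, 4], 98.84, 25_055_063),
    ([5, 6, 6, 6, 6, 6, 6, 6, 5], 98.67, 30_999_485),
    ([6, 7, 7, 7, 7, 7, 7, 7, 6], 98.64, 36_943_907),
]

# Configuration with the highest reported accuracy.
best_config, best_acc, best_params = max(table9, key=lambda row: row[1])

# Total number of REBs per configuration and parameter growth between rows.
total_rebs = [sum(cfg) for cfg, _, _ in table9]
increments = [b[2] - a[2] for a, b in zip(table9, table9[1:])]

print(best_config, best_acc, best_params)
print(total_rebs)
print(increments)
```

In the reported counts, every step that adds one REB to each ResLayer (nine REBs in total) costs the same 5,944,422 parameters, whereas accuracy peaks at the second configuration and declines afterwards.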

4.6.4 Impact of the MLPO strategy

In this work, a multi-level pruning optimization (MLPO) strategy is adopted to compress HPRNet by jointly applying network-level pruning to the first convolutional layer and the fully connected layers, and block-level pruning to the REBs. To evaluate the effectiveness of different pruning strategies, ablation experiments were conducted by comparing models without pruning, with network-level pruning, with block-level pruning, and with the complete MLPO strategy. As shown in Table 10, block-level pruning accounts for most of the parameter reduction (from 190,738,856 to 19,118,504) but lowers the F1-score to 86.02% when used alone, whereas network-level pruning removes comparatively few parameters but raises the F1-score from 89.38% to 91.63%. Their effects are therefore complementary. The complete MLPO strategy achieves the highest accuracy, recall, and F1-score with the lowest model complexity, demonstrating that multi-level pruning can effectively improve computational efficiency while preserving robust arrhythmia classification capability.

Table 10

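| Pruning configuration | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Params (#) | Time (s) | MFLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| w/o pruning | 98.78 | 93.69 | 85.44 | 89.38 | 190,738,856 | 0.0545 | 7142.88 |
| network-level (Conv+FC) | 98.96 | 94.65 | 88.79 | 91.63 | 190,730,981 | 0.0511 | 7130.43 |
| block-level (REBs) | 98.12 | 92.60 | 80.31 | 86.02 | 19,118,504 | 0.0346 | 714.15 |
| MLPO (ours) | 98.97 | 94.63 | 89.89 | 92.05 | 19,110,641 | 0.0312 | 713.14 |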

Performance comparisons of different pruning configurations for HPRNet.

The bold values indicate the best results among the compared pruning configurations.
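To make the compression reported in Table 10 concrete, the following sketch computes the parameter reduction factor and the change in F1-score between the unpruned model and the complete MLPO configuration. The numbers are copied from the table, and the variable names are illustrative.

```python
# Values reported in Table 10.
params_unpruned = 190_738_856   # w/o pruning
params_mlpo = 19_110_641        # MLPO (ours)
f1_unpruned, f1_mlpo = 89.38, 92.05

# How many times smaller the pruned model is, and the share of weights removed.
compression = params_unpruned / params_mlpo
reduction_pct = 100.0 * (1.0 - params_mlpo / params_unpruned)

# Change in F1-score in percentage points.
f1_gain = f1_mlpo - f1_unpruned

print(f"{compression:.2f}x fewer parameters ({reduction_pct:.2f}% removed)")
print(f"F1-score change: {f1_gain:+.2f} points")
```

The pruned model keeps roughly one tenth of the original parameters, which matches the pruning ratio of 0.9 used for the final configuration.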

In addition, an appropriate pruning ratio can improve model compactness while preserving classification performance. Therefore, additional experiments were conducted to analyze the impact of different pruning ratios on model accuracy and parameter size. As shown in Table 11, the classification accuracy remains relatively stable as the pruning ratio increases, while the number of parameters decreases significantly. This phenomenon can be attributed to the removal of redundant low-magnitude weights and the presence of residual connections, which facilitate effective feature reuse and stable information propagation. Among the evaluated configurations, a pruning ratio of 0.9 achieves the best trade-off between classification accuracy and model compactness for HPRNet.

Table 11

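| Pruning ratio | r = 0.1 | r = 0.2 | r = 0.3 | r = 0.4 | r = 0.5 | r = 0.6 | r = 0.7 | r = 0.8 | r = 0.9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy (%) | 98.78 | 98.96 | 98.88 | 98.90 | 98.82 | 98.86 | 98.84 | 98.86 | 98.97 |
| Params (#) | 171,669,055 | 152,599,254 | 133,529,450 | 114,459,649 | 95,389,848 | 76,320,047 | 57,250,246 | 38,180,442 | 19,110,641 |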

Effect of different pruning ratios (r) of the L1-norm-based unstructured pruning strategy on the model's classification accuracy and parameter number.
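L1-norm-based unstructured pruning removes individual weights with the smallest absolute values. The following sketch illustrates the idea on a flat list of weights. It is not the authors' implementation: the function name `magnitude_prune` and the example weights are hypothetical, and a real model would apply the same rule tensor by tensor, typically through a mask.

```python
def magnitude_prune(weights, ratio):
    """Zero out the `ratio` fraction of weights with the smallest |w|."""
    n_prune = int(round(ratio * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer with ten weights, pruned at r = 0.9.
weights = [0.5, -0.01, 0.2, 0.03, -0.9, 0.002, 0.07, -0.3, 0.004, 0.15]
pruned = magnitude_prune(weights, ratio=0.9)
remaining = sum(1 for w in pruned if w != 0.0)
print(pruned, remaining)
```

With r = 0.9, only the single largest-magnitude weight survives in this example, which mirrors the roughly tenfold drop in parameter number between the unpruned model and the r = 0.9 column of Table 11.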

To further illustrate the effect of MLPO, Figures 10A, B present the weight distributions of HPRNet before and after pruning with a ratio of 0.9. Before pruning, a large proportion of weights have small absolute values and contribute marginally to classification performance. After pruning, the weight distribution becomes significantly sparser, indicating that many redundant weights have been removed. These results further confirm that the proposed MLPO strategy effectively produces a more compact model while preserving critical weight structures for accurate arrhythmia classification.

Figure 10

4.7 Limitations

While the proposed HPRNet framework demonstrates encouraging classification performance, several limitations remain. First, this study does not explicitly incorporate specialized mechanisms designed for morphologically similar ECG categories, whose subtle differences may increase classification difficulty. Future work will explore more discriminative feature learning or targeted optimization strategies, such as attention mechanisms and contrastive representation learning (Ma et al., 2026), to improve recognition of similar categories. Second, heartbeat categories in ECG datasets are often highly imbalanced (Zhao et al., 2025, 2026). Future work will explore strategies such as data augmentation, class-balanced training, or cost-sensitive learning to further improve the recognition of underrepresented categories. In addition, this study primarily emphasizes empirical evaluation of the proposed deep learning architecture. A deeper theoretical investigation of the model’s convergence behavior and generalization properties remains an interesting direction for future work.

5 Conclusion

This paper proposed HPRNet, a hierarchical pyramidal residual network for multi-class ECG arrhythmia classification. The proposed architecture progressively learns hierarchical ECG representations through stacked residual layers composed of Residual Extraction Blocks (REBs), enabling effective modeling of both waveform morphology and long-term rhythm characteristics. To alleviate parameter redundancy and computational overhead introduced by deep residual structures, a Multi-Level Pruning Optimization (MLPO) strategy was incorporated at both the network and block levels, which effectively compresses the model while preserving its discriminative capability. Experimental results on the MIT-BIH and INCART datasets demonstrate that HPRNet achieves competitive classification performance with favorable computational efficiency. Visualization analyses further reveal that the network progressively captures ECG features from local waveform morphology to broader temporal context. Meanwhile, the results also highlight the intrinsic difficulty of distinguishing challenging heartbeat categories, such as S and F. Additional studies on noise robustness and architectural components further validate the effectiveness and stability of the proposed framework. In particular, the HRB and the MLPO strategy contribute to improved feature representation and computational efficiency. Future work will investigate class-balanced learning strategies and inter-patient evaluation protocols to further improve the robustness and generalization capability of the proposed model.

Statements

Data availability statement

Publicly available datasets were analyzed in this study. Both datasets are available from PhysioNet: the MIT-BIH Arrhythmia Database at https://physionet.org/content/mitdb/ and the INCART Database at https://physionet.org/content/incartdb/.

Author contributions

JH: Conceptualization, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. MH: Data curation, Funding acquisition, Investigation, Writing – original draft. HZ: Validation, Visualization, Writing – original draft. YX: Funding acquisition, Investigation, Validation, Writing – review & editing. YB: Investigation, Writing – review & editing. AG: Project administration, Supervision, Writing – review & editing. SL: Funding acquisition, Supervision, Writing – review & editing, Project administration.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Putian Science and Technology Plan Project (Nos. 2024GZ2001PTXY18 and 2024SZ3001PTXY14), the Startup Fund for Advanced Talents of Putian University (No. 2025028), the Putian University Horizontal Project (No. 2023AHX036(L)), and the China Scholarship Council (CSC) (No. 202508350016).

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. Generative AI tools were used solely for language editing and grammar refinement to improve the clarity and readability of the manuscript. The authors take full responsibility for the content, accuracy, and integrity of the work, and confirm that no data, results, figures, or scientific conclusions were generated by artificial intelligence.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Abdulla L. A., Al-Ani M. S. (2020). A review study for electrocardiogram signal classification. UHD J. Sci. Technol. 4, 103–117. doi: 10.21928/uhdjst.v4n1y2020.pp103-117

2. Ádám N., Val’ko D., Balogh Z., Madoš B., Hurtuk J. (2025). Comparative evaluation of filtration techniques for ecg signal denoising with emphasis on stationary wavelet transform. Sci. Rep. 15, 42514. doi: 10.1038/s41598-025-26476-1

3. Adilakshmi M. (2024). A machine learning-based optimized framework for detection of arrhythmia from ecg data. J. Theor. Appl. Inf. Technol. 102 (13), 5280–5305.

4. Alamr A., Artoli A. (2023). Unsupervised transformer-based anomaly detection in ecg signals. Algorithms 16, 152. doi: 10.3390/a16030152

5. Ansari Y., Mourad O., Qaraqe K., Serpedin E. (2023). Deep learning for ecg arrhythmia detection and classification: an overview of progress for period 2017–2023. Front. Physiol. 14, 1246746. doi: 10.3389/fphys.2023.1246746

6. Ashkboos S., Croci M. L., Nascimento M. G., Hoefler T., Hensman J. (2024). Slicegpt: Compress large language models by deleting rows and columns. arXiv preprint arXiv:2401.15024.

7. Berkaya S. K., Uysal A. K., Gunal E. S., Ergin S., Gunal S., Gulmezoglu M. B. (2018). A survey on ecg analysis. Biomed. Signal Process. Control 43, 216–235. doi: 10.1016/j.bspc.2018.03.003

8. Chen J., Chen T., Xiao B., Bi X., Wang Y., Duan H., et al. (2020). Se-ecgnet: multi-scale se-net for multi-lead ecg data, in 2020 Computing in Cardiology (Rimini, Italy: IEEE), 1–4. Available online at: https://ieeexplore.ieee.org/abstract/document/9344162.

9. Choi H., Park J., Lee J., Sim D. (2024). Review on spiking neural network-based ecg classification methods for low-power environments. Biomed. Eng. Lett. 14, 917–941. doi: 10.1007/s13534-024-00391-2

10. Goldberger A. L., Amaral L. A., Glass L., Hausdorff J. M., Ivanov P. C., Mark R. G., et al. (2000). Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220. doi: 10.1016/j.jelectrocard.2003.09.038

11. Han S., Pool J., Tran J., Dally W. (2015). Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 28, 1135–1143. doi: 10.5555/2969239.2969366

12. Hannun A. Y., Rajpurkar P., Haghpanahi M., Tison G. H., Bourn C., Turakhia M. P., et al. (2019). Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 25, 65–69. doi: 10.1038/s41591-018-0268-3

13. Jin Y., Li Z., Wang M., Liu J., Tian Y., Liu Y., et al. (2024). Cardiologist-level interpretable knowledge-fused deep neural network for automatic arrhythmia diagnosis. Commun. Med. 4, 31. doi: 10.1038/s43856-024-00464-4

14. Khan F., Yu X., Yuan Z., Rehman A. U. (2023). Ecg classification using 1-d convolutional deep residual neural network. PloS One 18, e0284791. doi: 10.1371/journal.pone.0284791

15. Kropf M., Hayn D., Schreier G. (2017). Ecg classification based on time and frequency domain features using random forests, in 2017 Computing in Cardiology (CinC) (Rennes, France: IEEE), 1–4. Available online at: https://ieeexplore.ieee.org/abstract/document/8331577.

16. Li Z., Tian Y., Jin Y., Wei X., Wang M., Liu J., et al. (2025a). An early warning method for arrhythmias in long-term ecgs based on self-supervised learning and lstm. Knowledge-Based Syst. 327, 114137. doi: 10.1016/j.knosys.2025.114137

17. Li Z., Tian Y., Jin Y., Wei X., Wang M., Liu J., et al. (2025b). Eddm: A novel ecg denoising method using dual-path diffusion model. IEEE Trans. Instrum. Meas. 74, 1–15. doi: 10.1109/TIM.2025.3542875

18. Liu C.-L., Xiao B., Tsai C.-F. (2025). Ecg-star: Spatio-temporal attention residual networks for multi-label ecg abnormality classification. Inf. Sci. 717, 122273. doi: 10.1016/j.ins.2025.122273

19. Ma X., Fang G., Wang X. (2023). Llm-pruner: On the structural pruning of large language models. Adv. Neural Inf. Process. Syst. 36, 21702–21720. doi: 10.52202/075280-0950

20. Ma K., Zhang T., Zhang H., Huang W. (2026). Self-supervised contrastive learning achieves 12-lead ecg classification. Biomed. Signal Process. Control 112, 108420. doi: 10.1016/j.bspc.2025.108420

21. Marzog H. A., Abd H. J. (2022). Ecg-signal classification using efficient machine learning approach, in 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA) (Ankara, Turkey: IEEE), 1–7. Available online at: https://ieeexplore.ieee.org/abstract/document/9800092.

22. Mondéjar-Guerra V., Novo J., Rouco J., Penedo M. G., Ortega M. (2019). Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers. Biomed. Signal Process. Control 47, 41–48. doi: 10.1016/j.bspc.2018.08.007

23. Moody G. B., Mark R. G. (2001). The impact of the mit-bih arrhythmia database. IEEE Eng. Med. Biol. Mag. 20, 45–50. doi: 10.1109/51.932724

24. Opitz J. (2024). A closer look at classification evaluation metrics and a critical reflection of common evaluation practice. Trans. Assoc. For. Comput. Linguistics 12, 820–836. doi: 10.1162/tacl_a_00675

25. Yildirim Ö. (2018). A novel wavelet sequence based on deep bidirectional lstm network model for ecg signal classification. Comput. Biol. Med. 96, 189–202. doi: 10.1016/j.compbiomed.2018.03.016

26. Peng H., Chang X., Yao Z., Shi D., Chen Y. (2024). A deep learning framework for ecg denoising and classification. Biomed. Signal Process. Control 94, 106441. doi: 10.1016/j.bspc.2024.106441

27. Qi T., Zhang H., Zhao H., Shen C., Liu X. (2024). Research on ecg signal classification based on hybrid residual network. Appl. Sci. 14, 11202. doi: 10.3390/app142311202

28. Ranganathan V. A., Shailesh T., N A., T P. P. (2025). Deep learning based hybrid residual attention and echo state network for high-accuracy heart disease prediction. F1000Research 14, 650. doi: 10.12688/f1000research.165575.2

29. Thota V., Prajapati H., Joshi Y., Rathi S. (2025). A lightweight cnn-attention-bilstm architecture for multi-class arrhythmia classification on standard and wearable ecgs. arXiv preprint arXiv:2511.08650. doi: 10.48550/arXiv.2511.08650

30. Wang X., Chen B., Zeng M., Wang Y., Liu H., Liu R., et al. (2022b). An ecg signal denoising method using conditional generative adversarial net. IEEE J. Biomed. Health Inf. 26, 2929–2940. doi: 10.1109/jbhi.2022.3169325

31. Wang L.-H., Zhang Z.-H., Tsai W.-P., Huang P.-C., Abu P. A. R. (2022a). Low-power multi-lead wearable ecg system with sensor data compression. IEEE Sens. J. 22, 18045–18055. doi: 10.1109/jsen.2022.3195501

32. Wu W., Huang Y., Wu X. (2024). Srt: Improved transformer-based model for classification of 2d heartbeat images. Biomed. Signal Process. Control 88, 105017. doi: 10.1016/j.bspc.2023.105017

33. Zhang F., Li M., Song L., Wu L., Wang B. (2023). Multi-classification method of arrhythmia based on multi-scale residual neural network and multi-channel data fusion. Front. Physiol. 14, 1253907. doi: 10.3389/fphys.2023.1253907

34. Zhang H., Liu W., Shi J., Chang S., Wang H., He J., et al. (2022). Maefe: Masked autoencoders family of electrocardiogram for self-supervised pretraining and transfer learning. IEEE Trans. Instrum. Meas. 72, 1–15. doi: 10.1109/tim.2022.3228267

35. Zhao Y., Feng J., He X., Hu X., Li H., Ullah H., et al. (2026). Awbifs: an incremental fusion system for arrhythmia recognition on imbalanced ecg data with adaptive weighting. Expert Syst. Appl. 311, 131374. doi: 10.1016/j.eswa.2026.131374

36. Zhao C., Lai B., Xu Y., Wang Y., Dong H. (2025). Mak-net: A multi-scale attentive kolmogorov–arnold network with bigru for imbalanced ecg arrhythmia classification. Sensors 25, 3928. doi: 10.3390/s25133928

37. Zhu H., Cheng C., Yin H., Li X., Zuo P., Ding J., et al. (2020). Automatic multilabel electrocardiogram diagnosis of heart rhythm or conduction abnormalities with deep learning: a cohort study. Lancet Digital Health 2, e348–e357. doi: 10.1016/s2589-7500(20)30107-2

Summary

Keywords

deep learning, ECG arrhythmia classification, hierarchical pyramidal residual network, model pruning optimization, multi-scale feature learning

Citation

Huang J, Huang M, Zheng H, Xiao Y, Bolea Y, Grau A and Luo S (2026) HPRNet: a hierarchical pyramidal residual network for ECG arrhythmia classification. Front. Physiol. 17:1800941. doi: 10.3389/fphys.2026.1800941

Received

31 January 2026

Revised

13 April 2026

Accepted

14 April 2026

Published

08 May 2026

Volume

17 - 2026

Edited by

Feng Chen, Dallas County, United States

Reviewed by

Prof. Akhilesh A. Waoo, AKS University, India

Kai Jiang, The University of Texas at Dallas, United States

Xiaohui Chen, Baylor University, United States


*Correspondence: Shaoye Luo
