A lightweight fetal distress-assisted diagnosis model based on a cross-channel interactive attention mechanism

Fetal distress is a symptom of fetal intrauterine hypoxia, which is seriously harmful to both the fetus and the pregnant woman. The current primary clinical tool for the assessment of fetal distress is Cardiotocography (CTG). Due to subjective variability, physicians often interpret CTG results inconsistently, hence the need to develop an auxiliary diagnostic system for fetal distress. Although the deep learning-based fetal distress-assisted diagnosis model has a high classification accuracy, the model not only has a large number of parameters but also requires a large number of computational resources, which is difficult to deploy to practical end-use scenarios. Therefore, this paper proposes a lightweight fetal distress-assisted diagnosis network, LW-FHRNet, based on a cross-channel interactive attention mechanism. The wavelet packet decomposition technique is used to convert the one-dimensional fetal heart rate (FHR) signal into a two-dimensional wavelet packet coefficient matrix map as the network input layer to fully obtain the feature information of the FHR signal. With ShuffleNet-v2 as the core, a local cross-channel interactive attention mechanism is introduced to enhance the model’s ability to extract features and achieve effective fusion of multichannel features without dimensionality reduction. In this paper, the publicly available database CTU-UHB is used for the network performance evaluation. LW-FHRNet achieves 95.24% accuracy, which meets or exceeds the classification results of deep learning-based models. Additionally, the number of model parameters is reduced many times compared with the deep learning model, and the size of the model parameters is only 0.33 M. The results show that the lightweight model proposed in this paper can effectively aid in fetal distress diagnosis.


Introduction
Fetal distress is a syndrome of respiratory and circulatory insufficiency caused by intrauterine fetal hypoxia during labor and is closely associated with changes in fetal heart rate signals (Blickstein and Green, 2007;Spairani et al., 2022). Fetal distress may cause hypoxic-ischemic encephalopathy and eventually leading to cerebral palsy or perinatal death (Bobrow and Soothill, 1999). Early detection and diagnosis of fetal distress can help prevent damage to the vital organs of the fetus prior to delivery. Therefore, it is important to enhance OPEN ACCESS EDITED BY intrauterine fetal status monitoring during pregnancy to ensure the safety of the fetus and the pregnant woman. The most common method for monitoring fetal status in clinical practice is CTG monitoring (Grivell et al., 2015). The CTG signal consists of the FHR curve and uterine contraction (UC) curve. Through CTG monitoring, doctors can detect fetal distress in time so that they can take effective treatment measures to protect the health of the fetus. However, the diagnosis is too dependent on physician experience and interobserver disagreement when interpreted by the physician's naked eye alone (Bernardes et al., 1997;Palomaki et al., 2006). Therefore, there is an increased incidence of unnecessary cesarean section due to subjective physician error (Abdulhay et al., 2014;Marques et al., 2019).
With the development of artificial intelligence technology, scholars worldwide are committed to developing fetal healthassisted diagnosis systems based on machine learning and deep learning to help healthcare professionals analyze CTG signals objectively and correctly. Barquero-Perez et al. (2017); Spilka et al. (2014); Georgoulas et al. (2017) ;Yilmaz. (2016) used normalized compression distance, random forest (RF), support vector machine (SVM), and artificial neural network (ANN) classification algorithms, respectively, to classify CTG signals for fetal distress problems and achieved good results. Zhao et al. (2018) extracted 47 features from different domains (morphological, time domain, frequency domain and non-linear domain) and selected Decision Tree, SVM and adaptive boosting, respectively, for fetal acidosis classification.  used short-time Fourier transform (STFT) to obtain 2-D images and combined it with transfer learning and convolutional neural networks to predict fetal distress (Liu et al., 2021). proposed an attention-based CNN-BiLSTM hybrid neural network enhanced with features of discrete wavelet transformation, obtaining an average sensitivity, specificity and quality index of 75.23%, 70.82%, and 72.93%, respectively. Zhao et al. (2019) used recurrence plot to convert one-dimensional FHR to two-dimensional and fed into convolutional neural network to obtain 98.69% accuracy in fetal distress classification. Baghel et al. (2022) obtained 99.09% classification accuracy by performing direct 1-D convolutional operations on the FHR signal after Butterworth filtering. Although the abovementioned classification models based on machine learning and deep learning achieve better results, the complexity of the model and the large number of parameters take up large computational resources, which leads to the model being highly dependent on the performance of the device hardware and difficult to deploy to the terminal for generalized application.
Lightweight models and miniaturization have become a trend in many application scenarios, so an increasing number of academics are focusing on lightweight network models that can be deployed and run directly on mobile devices. The MobileNet series (Howard et al., 2017;Sandler et al., 2018;Howard et al., 2019) and ShuffleNet series (Ma et al., 2018;Zhang et al., 2018) of lightweight networks currently have good performance in the target detection and image classification field. MobileNet model is a lightweight deep neural network proposed by Google for embedded devices, using the core idea of depthwise separable convolution. ShuffleNet model is a neural network structure designed for devices with limited computational resources, mainly using pointwise group convolution and channel shuffle. Lightweight models are also beginning to make their mark in the medical signaling field. Cao et al. (2021) proposed a multichannel lightweight model with each channel integrating multiple heterogeneous convolutional layers to obtain multilevel features for classifying myocardial infarction with an accuracy rate of 96.65%. Zheng et al. (2021) trained MobileNetV1 and MobileNetV2 models by migration learning for pterygium diagnosis in the eye and compared them with the classical model and found that MobileNetV2 obtained better results with a model size of only 13.5 M. Chen et al. (2022) used the lightweight networks MobileNetV1, MobileNetV2, and Xception to classify cervical cancer cells and used knowledge distillation for accuracy improvement. Among them, Xception matched the accuracy of the large network Inception-ResNetV2, while the model size was only 40%. The lightweight network model effectively reduces the number of model parameters and opens up a method for promoting a low-cost operating model. However, the feature extraction ability and the network classification accuracy still need to be further improved.
Aiming at the complexity and considerable computation in existing deep learning-based fetal distress algorithm models, this paper introduces a lightweight network architecture to design a lightweight fetal distress-assisted diagnosis network based on FHR. Additionally, to further improve the feature extraction ability and classification effect of the network, the attention mechanism is incorporated into the lightweight network to build a lightweight network unit (ECA-Shuffle) based on the cross-channel interactive attention mechanism. The main contributions of this paper are as follows.
(1) The matrix feature map based on wavelet packet coefficients is constructed to refine the FHR signal in multiple frequency bands and used as input to the model. Different wavelet basis functions are selected to generate multiple feature maps to vote on the sample classification results.
(2) The cross-channel interactive attention module is embedded in the tail of the ShuffleNet-V2 base unit to generate an ECA-Shuffle unit to achieve effective multichannel feature fusion without dimensionality reduction. (3) A lightweight fetal distress-assisted diagnosis network based on the FHR signal, LW-FHRNet, is proposed. Conventional convolution with ECA-Shuffle units ensures effective channel feature fusion while reducing model complexity and enhances the model's ability to classify fetal distress.
The rest of the paper is presented below. Section 2 describes the overall scheme in detail. Section 3 describes the database, experimental setup and results in detail. Section 4 discusses and analyzes the performance of the proposed model. The final section contains conclusions and future work.

Materials and methods
The architecture of the lightweight fetal distress-assisted diagnosis model based on the cross-channel interactive attention mechanism designed in this paper is shown in Figure 1, including a preprocessing module, a feature map construction module, and a feature extraction and classification module. First, the missing values Frontiers in Physiology frontiersin.org 02 and spikes in the FHR signal are removed by signal preprocessing, and the signal is segmented into 20-min lengths. Second, the wavelet packet decomposition technique is used to construct wavelet coefficient matrix feature maps of FHR signals based on db1 to db5 wavelet basis functions. Finally, LW-FHRNet is constructed by using deep separable convolution, channel shuffle and other techniques and incorporating a local cross-channel interactive attention mechanism without dimensionality reduction, which effectively reduces the number of model parameters and improves the classification accuracy of the model.

Signal preprocessing
Clinically, the FHR signal is acquired mainly by an ultrasound Doppler probe placed in the abdomen of the pregnant woman. During the acquisition process, the signal is inevitably subject to a variety of noise interferences, such as the movement of the fetus and the pregnant woman, improper placement of the sensor and other external factors. The noise of the FHR is represented by spikes (FHR values greater than 200 or less than 50 bpm) and missing values (FHR values equal to 0) (Cesarelli et al., 2007). Accordingly, the purpose of preprocessing is to remove these two types of noise. In this study, the interpolation method is used to remove noise (Chudaek et al., 2009), and the specific process is as follows.
(1) If the FHR value is equal to 0 and the duration is greater than 15 s, the segment is removed directly; otherwise, it is linearly interpolated.
(2) If the FHR value is unstable, i.e., the absolute value of two adjacent points is greater than 25 bpm, and interpolation is performed between the starting sampling point and the first point of the next stabilization segment. A stable segment is defined as five consecutive FHR values where the difference is less than 10 bpm.
(3) If the FHR value is greater than 200 bpm or less than 50 bpm, it is filled in with Hermite spline interpolation.
Noise and missing value segments in the FHR signal can be effectively filtered out by the above interpolation method. In conjunction with the time requirement of clinical prenatal examination, this paper uses 20-min data segments for analysis. The preprocessed data are segmented into 20-min time segments to obtain multicomponent segment data. The waveform obtained using the above preprocessing method is shown in Figure 2, where (a) is the raw data of the FHR signal, (b) is the waveform after preprocessing using the above method, and (c) is the segment after splitting the data into multiple segments with a 20-min data length.

Construction of feature maps based on wavelet packet coefficients
As a non-stationary and non-linear time series, FHR contains complex physiological and pathological information. Wavelet packet decomposition is a discrete analysis method of non-stationary signals that can select the appropriate spectral band according to the signal characteristics and improve the time-frequency analysis resolution (Behera and Jahan, 2012). In this paper, wavelet packet decomposition is introduced to construct the wavelet packet coefficient matrix using different subspace coefficients to convert the 1D FHR signal into a 2D wavelet packet coefficient feature map. The feature map is used as the input layer data for the deep network model. Figure 3A shows the wavelet packet coefficient matrix construction process. The signal is decomposed into corresponding frequency bands through different layers, and each frequency band has a series of wavelet packet coefficients. For the nth layer decomposition, the wavelet packet transform provides 2 n different subspaces, and each subspace corresponds to a frequency band.
Wavelet packet decomposition can be implemented using a series of convolutions with high-pass filters and low-pass filters. The high-pass filter h(·) and low-pass filter g(·) can be defined as Eqs 1, 2.
where ϕ(t) is the scale function, ψ(t) is the wavelet function, 〈·, ·〉 represents the inner product, and t and k are variables. h(·) and g(·) satisfy Eq. 3.  Frontiers in Physiology frontiersin.org 04 The wavelet coefficients at different frequency bands and decomposition layers can be calculated iteratively by the following equation.
To increase the number of datasets to obtain better model effects, db1~db5 wavelet basis functions are selected for wavelet packet coefficient decomposition in this paper. Therefore, five wavelet packet coefficient matrix maps can be obtained for each data segment to enhance the dataset. Meanwhile, each wavelet packet matrix coefficient map is resized to 224*224*3 pixels as the input layer of the neural network model. The feature map construction based on wavelet packet coefficients is shown in Figure 3B. Each FHR signal segment is converted into a total of 5 feature maps based on db1~db5 wavelet bases.

LW-FHRNet network structure
To meet the application of deep neural networks on embedded and mobile terminals and maintain excellent performance, lightweight network models have emerged. In particular, the lightweight models of the MobileNet series and the ShuffleNet series are the most widely used. Depthwise separable convolution, pointwise convolution, group convolution, channel shuffle and channel separation are used to reduce the number of model parameters and speed up the model computation time.
Recently, the channel attention mechanism has been shown to have great potential in improving the performance of deep convolutional neural networks. By assigning different weights to each part of the input, more important information can be extracted to help the model make more accurate judgments without imposing greater overhead on the model's computation and storage.
Inspired by the above work, a lightweight network based on a cross-channel attention mechanism, LW-FHRNet, is proposed in this work to assist in the diagnosis of fetal distress symptoms, as shown in Figure 4. The main structure of the network contains two stages and a total of four ECA-Shuffle units. First, the feature maps based on wavelet packet coefficients are used as the input layer of the model. Subsequently, the image is conventionally convolved and the size of the output feature matrix is reduced to 1/4 of the input image using the maximum pooling operation. Then, feature extraction is performed by 4 ECA-Shuffle units to fully learn the feature unit information. Finally, regular convolution and average pooling are performed, and the output features are sent to the fully connected layer for classification.
Based on the ShuffleNet-V2 units, this study constructs two types of ECA-Shuffle units by integrating the cross-channel attention module without dimensionality reduction, as shown in Figure 5. Figure 5A (Unit A) shows the first unit of each stage. The stride of the depthwise separable convolution in both the residual branch and the identity branch of the bottleneck structure is 2, and the two output feature matrices are concatenated to 2 times their depth. The ECA strategy is used at the tail of the structure. Figure 5B (Unit B) shows the second unit of each stage. The input feature matrix is divided equally into two groups. The main branch performs a depthwise separable convolution with a stride of 1, while the other branch is left unprocessed and connected to the main branch via concat, and the feature matrix depth is kept constant. The ECA strategy is also used at the end of the structure.
The lower half of the ECA-Shuffle unit is the cross-channel interactive attention module without dimensionality reduction. The detailed structure is shown in Figure 6. Given the aggregated feature y ∈ R C without dimensionality reduction, channel attention can be learned by Eq. 6. ω σ Wy Frontiers in Physiology frontiersin.org  If the weight of y i is calculated by only considering the interaction between y i and its k neighbors and all channels share the same learning parameters, Eq. 6 can be written as Eq. 7.
where Ω k i indicates the set of k adjacent channels of y i . This strategy can be easily implemented by a fast 1D convolution with kernel size k, i.e., ω σ C1D k y where C1D denotes 1D convolution.
Considering each channel and its k nearest neighbors, computing local cross-channel interaction information instead of all channels effectively improves computational efficiency. This efficient channel attention calculation can be quickly implemented by 1D convolution. Thus, k is the key parameter and the size of the convolution kernel of the 1D convolution, which determines the range and convergence of the local cross-channel interaction.
To avoid resource-consuming cross-validation adjustment, an adaptive method is used to select the appropriate k value. According to the properties of group convolution, the high-dimensional (lowdimensional) channels are proportional to the long-distance (shortdistance) convolution for a fixed number of groups. Similarly, the coverage of the interaction (i.e., the size k of the 1D convolution kernel) is proportional to the channel dimension C. The mapping relationship between k and C is shown in Eq. 9.
Since the channel dimension is generally an exponential multiple of 2, the non-linear mapping relationship is represented by an exponential function with a base of 2. Thus, Eq. 9 can be rewritten as Eq. 10.
Consequently, the size k of the convolution kernel can be calculated automatically based on the number of channels C, which is given by Eq. 11.
where |t| odd represents the nearest odd number of t. To reduce the computational cost and training time, γ and b are empirically set to 2 and 1, respectively. The details of the lightweight network: LW-FHRNet structure designed in this work are shown in Table 1. The first operation of each stage is the ECA-Shuffle unit A, which realizes the doubling of feature dimensions, followed by the ECA-Shuffle unit B, which realizes the subsequent operations.
The process of the fetal distress classification algorithm based on a lightweight network is described in Table 2. After preprocessing and 20-min length segmentation, the dataset is randomly divided into a training set and a testing set in proportion. Each segment is subjected to wavelet packet decomposition based on db1 to db5 wavelet basis functions to obtain five feature maps. Iterative testing of model tuning is performed with the training set data to obtain the optimal model. The testing set is subjected to category prediction under the optimal model, and the final category attribution is decided by voting on the five feature maps of each data segment.

Dataset
The database in this paper uses the publicly available dataset CTU-UHB, which comes from the Czech Technical University in Prague (CTU) and the University Hospital in Brno (UHB) (Chudacek et al., 2014). A total of 552 CTG records were collected in the database. These records were carefully selected from 9,164 records collected by UHB from 2010 to 2012. The sampling rate of CTG data is 4 Hz, and each CTG record contains FHR sequences and UC sequences. The records in the database were all singleton gestations, all gestational ages greater than 36 weeks and no known congenital developmental defects. The quality of the FHR signal was greater than 50% in every 30-min The normalization and ReLU, layers that follow each convolutional layer are not shown above because they do not change the output feature shape. Conv: convolutional layer; MaxPool: max pooling layer; AvgPool: average pooling layer; FC: fully connect layer; stage: ECA-Shuffle uint A+ ECA-Shuffle uint B.
Frontiers in Physiology frontiersin.org window. Available biochemical parameters of the umbilical artery blood sample (pH) were recorded for each sample. The pH value is a marker of blood acid-base balance and can provide information on possible fetal acidosis caused by intrauterine hypoxia. A lower pH value represents a more severe degree of fetal acidosis (Vayssiere et al., 2007). showed moderate ability to detect mild acidosis at pH ≤ 7.15 and better ability to detect more severe acidosis at pH ≤ 7.05. Therefore, in this paper, pH = 7.05 was chosen as the criterion to classify the data into two categories. Data with a pH value greater than 7.05 are considered normal, and data with a pH value less than or equal to 7.05 are considered abnormal. Based on this discriminant, 44 abnormal samples and 508 normal samples are obtained (Ito et al., 2022). predicted fetal acidemia by calculating iPREFACE (10), iPREFACE (30) and iPREFACE (60) at 10, 30, and 60 min before delivery. The results showed that iPREFACE (30) was slightly better than iPREFACE (60) but significantly better than iPREFACE (10). To enhance the sample size, a 20-min segmentation is performed after preprocessing the 60-min data before delivery. After splitting the samples into 20-min data segments, 106 abnormal sample segments are obtained. To avoid the effect of overfitting or underfitting caused by category imbalance on the classification results, 106 samples from 512 normal samples are randomly selected. The second 20-min segment is selected to construct 106 normal sample segments for the experiment. Eighty percent of the dataset is randomly selected as the training set (85P and 85N), and the remaining 20% as the test set (21P and 21N). The wavelet packet decomposition from the db1 to db5 wavelet basis is performed separately for each FHR data segment, which constitutes 5 wavelet packet coefficient matrix feature maps. Therefore, there are 850 images in the training set and 210 images in the test set.
In this paper, each 20-min segment of FHR data is subjected to wavelet packet decomposition based on db1 to db5 wavelet basis functions to obtain five wavelet coefficient matrix feature maps. Category attribution is determined by voting on the 5 feature maps. The category voting process is shown in Figure 7. First, each feature map of the segment is classified. Subsequently, the frequency of each category label is calculated for the segment. Finally, the class with higher frequency is selected as the category of this FHR segment.

Environment
The network structure proposed in this paper is trained and tested on the CTU-UHB dataset. The experimental platform is a computer equipped with an Intel Xeon(R) CPU E3-1535M v6 @

FIGURE 7
An example of the category voting process. Notes: P: Positive; N: Negative.
Frontiers in Physiology frontiersin.org 08 3.10 GHz x 8, Quadro P5000 GPU and 32 G RAM. The system is Ubuntu 18.04.6LTS, the development environment is TensorFlow 2.6.2, and the language used is Python.

Metrics
To evaluate the classification performance of the model, accuracy, precision, recall and F1-Score metrics are used in this paper. Additionally, model parameters and model size are introduced to evaluate the complexity of lightweight models. Finally, sensitivity (Se) and specificity (Sp) are used to observe the discriminatory ability of the model between abnormal and normal samples.

Baselines
The commonly used lightweight networks MobileNetV3-Small, MobileNetV3-Large and ShuffleNet-V2 are introduced as the baselines of this research. MobileNetV3 introduces the channel attention module based on MobileNetV2 to enhance the adaptive capability of the model by assigning different weights to different channels. MobileNetV3 has two versions: small and large. ShuffleNet-V2 proposes the concept of channel separation to replace group convolution to further improve the inference speed.

Experiment 1: Selection of wavelet packet decomposition layers
Wavelet packet decomposition with different numbers of layers can obtain different detailed information. The sampling frequency of the raw data is 4 Hz. The ith layer is decomposed to obtain 2 i frequency bands. The 2D image is constructed according to the frequency from the highest to the lowest. The frequency range of the jth frequency band is ( 4 2 i (j − 1)~4 2 i j) Hz, j ∈ [1, 2 i ]。 To select the best wavelet coefficient matrix feature map, this paper performs wavelet packet 1-layer to 5-layer decomposition to obtain the wavelet packet coefficient matrix maps of corresponding layers to test the classification performance. The experimental results are shown in Table 3. The accuracy of the 2-layer and 3-layer decomposition is higher, and the accuracy of the 4-layer and 5layer decomposition gradually decreases. The 2-layer decomposition achieves optimal performance with 95.24% accuracy, 100% precision, 90.48% recall and a 95.00% F1-score. Therefore, the feature map based on 2-layer wavelet packet decomposition is chosen as the input of the model in this paper. That is, the signal is decomposed into four frequency bands:0-1 Hz, 1-2 Hz, 2-3 Hz and 3-4 Hz. And the wavelet packet coefficients in the corresponding frequency bands are used to jointly construct the feature maps.

Experiment 2: The effective role of local cross-channel interactive attention mechanisms
The channel attention mechanism has great potential to improve the performance of deep convolutional neural networks. In this paper, we introduce a cross-channel local interaction The bold values means the best performance. Frontiers in Physiology frontiersin.org attention strategy without dimensionality reduction to improve the performance of lightweight models. Experiments are conducted on the dataset of this paper using a lightweight network with and without an ECA module. The confusion matrix of whether the proposed lightweight model contains ECA modules is shown in Figure 8. Table 4 shows the model performance comparison with and without the ECA module. The lightweight model accuracy with the ECA module is as high as 95.24%, and the accuracy of the lightweight model without the ECA module is 92.86%. The experimental results show that the lightweight model with the ECA module improves performance in fetal distress classification.

Experiment 3: Lightweight model comparison experiment
To clarify the performance of the network, this paper performs a comparative test with different lightweight networks. The classification performance of fetal distress under different lightweight networks is measured using accuracy, precision, recall, F1-score and model size metrics. The test performance comparison of the LW-FHRNet network with other commonly used lightweight networks is shown in Table 5. MobileNetV3 improves MobileNetV2 by using a deep separable convolution +SE channel attention mechanism + residual structure connection to further reduce the computational effort. The overall structure of small and large is the same, and the difference is the number of bnecks and channels. MobileNetV3-Small achieves 85.71% accuracy, proving that the network has a strong feature learning capability. MobileNetV3-Large has better accuracy than MobileNetV3-Small, but the number of network parameters has increased significantly due to the increase in the number of bnecks and channels. The ShuffleNet-V2 network improves the ShuffleNet-V1 network architecture in terms of optimizing memory access cost (MAC), reducing network fragmentation, and decreasing element operations. Due to the small number of parameters in the ShuffleNet-V2 model, it performs poorly in terms of accuracy, with only 83.33%. Due to the low number of parameters in the ShuffleNet-V2 model, its performance is relatively poor, with an accuracy of 83.33%.
LW-FHRNet incorporates an efficient cross-channel attention mechanism without downscaling on the base unit of ShuffleNet-V2. The channel interaction strategy effectively improves the performance of channel attention and enables LW-FHRNet to have a more accurate recognition performance. The ROC curves of LW-FHRNet and other commonly used lightweight network models are shown in Figure 9A. The proposed network in this paper has the best performance with 97.96% AUC. A comparison of the accuracy and model size of LW-FHRNet with other commonly used lightweight networks for fetal distress classification is shown in Figure 9B. LW-FHRNet achieves 95.24% accuracy for fetal distress classification, which is higher than other commonly used lightweight networks. Additionally, it has the lowest computational cost, and the number of network parameters is only 0.33 M, which is much lower than other commonly used lightweight networks.

Discussion
In this paper, a lightweight network based on cross-channel interactive attention mechanism is proposed to effectively fuse channel features and reduce model complexity to help obstetricians to objectively assess fetal distress. In the experiments, the classification effects of wavelet packet decomposition with different layers as feature maps were first compared. And the optimal number of wavelet packet decomposition layers was chosen as 2-layer. Then two different network architectures (LW-FHRNet and LW-FHRNet-without-eca) were used. The results showed that the attention machine module effectively improves the classification performance of fetal distress. Finally, a comparison with other lightweight models was made to show that the lightweight network proposed in this paper outperforms other common lightweight networks.
To analyze the significance of the results, the algorithm in this paper is compared with recent related work in the diagnosis of fetal distress using the CTU-UHB database. The results are shown in Table 6, which measures the performance of this research work in terms of accuracy (Acc), sensitivity (Se) and specificity (Sp). Compared with (Zarmehri et al., 2019), the method of this paper has higher Se and Sp under the same fetal distress division criteria, which further highlights the advantages of our model. Compared with (Alsaggaf et al., 2020), they also have good classification accuracy, but they use the traditional machine learning The bold values means the best performance. The bold values means the best performance.
Frontiers in Physiology frontiersin.org classification method, which requires manual design to extract a large number of features. The feature extraction process is complex and computationally intensive. Compared with (Baghel et al., 2022), they have higher accuracy than the model in this paper, but they use regular CNN convolution for feature extraction. The parameter number and computational time still need to be improved and optimized for end-application deployment.
In conclusion, the lightweight network model based on the cross-channel interactive attention mechanism proposed in this paper achieves better classification results in fetal distress diagnosis. The ShuffleNet-V2 unit combined with the local crosschannel interactive attention mechanism is used to build a lightweight network, which ensures a low number of parameters and achieves effective network performance improvement.
However, one limitation of the study in this paper is the criteria for discriminating between normal and distressed samples. The current work generally endorses the use of umbilical artery blood pH as a criterion for classification, since pH is an objective response to the fetal oxygen cell supply (Zarmehri et al., 2019) and also to the severity of fetal acidosis (Vayssiere et al., 2007). However, as shown in Table 6, a variety of pH values were used in different research works. There is not yet a universally accepted pH value. In future research work, the study will focus on exploring the pH value of pathological samples. Meanwhile, the BDecf index can reflect the degree of fetal acidosis (Liu et al., 2021). Therefore, a more precise classification of fetal distress can be performed by combining pH and BDecf in subsequent studies.

Conclusion
In this work, a lightweight network (LW-FHRNet) based on ECA-Shuffle units is proposed for fetal distress classification of FHR signals. After preprocessing, the FHR signal is segmented into 20-  Frontiers in Physiology frontiersin.org min segments, and the wavelet packet decomposition operation based on db1 to db5 wavelet basis functions is performed on each segment. Each segment obtains five wavelet packet coefficient matrix feature maps, which are used as input to the model and vote on the classification result. The ECA-Shuffle unit performs feature extraction on the feature map to fully learn the feature information. We integrate an efficient local cross-channel interactive attention mechanism without dimensionality reduction to reduce model complexity and ensure performance improvement. In this paper, the CTU-UHB open source database is used to test the classification performance of the proposed network. A pH value of 7.05 was used as the gold standard for classification. The proposed algorithmic model achieves excellent results of 95.24%, 90.48%, and 100% for Acc, Se and Sp, respectively. Although the proposed lightweight network achieved good results in classifying fetal distress, there is still a gap to reach the clinical diagnosis level of physicians. In order to achieve better auxiliary diagnosis, we will do further exploration in future work. On the one hand, the data from clinical fetal heart monitoring contain simultaneous UC signals and FHR signals, but only FHR signals are used to assess fetal distress because of the poor quality of UC signals in publicly available datasets. In the clinic, the UC signal is also an important basis for physicians to diagnose fetal distress. Therefore, the combination of FHR signals and UC signals needs to be considered in further studies. On the other hand, we are considering more time-frequency transform features to improve the classification performance for fetal distress, including Empirical Wavelet Transform, Hilbert-Huang Transform, Singular Spectrum Analysis, etc.

Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://www.physionet.org/content/ctu-uhb-ctgdb/1.0.0/.