Comparison of fetal heart rate baseline estimation by the cardiotocograph network and clinicians: a multidatabase retrospective assessment study

Background This study aims to compare the fetal heart rate (FHR) baseline predicted by the cardiotocograph network (CTGNet) with that estimated by clinicians. Material and methods A total of 1,267 FHR recordings acquired with different electronic fetal monitors (EFM) were collected from five datasets: 84 FHR recordings acquired with F15 EFM (Edan, Shenzhen, China) from the Guangzhou Women and Children's Medical Center, 331 FHR recordings acquired with SRF618B5 EFM (Sanrui, Guangzhou, China), 234 FHR recordings acquired with F3 EFM (Lian-Med, Guangzhou, China) from the NanFang Hospital of Southern Medical University, 552 cardiotocograms (CTG) recorded using STAN S21 and S31 (Neoventa Medical, Mölndal, Sweden) and Avalon FM40 and FM50 (Philips Healthcare, Amsterdam, The Netherlands) from the University Hospital in Brno, Czech Republic, and 66 FHR recordings acquired using Avalon FM50 fetal monitor (Philips Healthcare, Amsterdam, The Netherlands) at St Vincent de Paul Hospital (Lille, France). Each FHR baseline was estimated by clinicians and by CTGNet. Agreement between CTGNet and clinicians was evaluated using the kappa statistic, the intra-class correlation coefficient, and the limits of agreement. Results The proportions of differences of <3 beats per minute (bpm), 3–5 bpm, 5–10 bpm and ≥10 bpm were 64.88%, 15.94%, 14.44% and 4.74%, respectively. The kappa statistic and the intra-class correlation coefficient were 0.873 and 0.969, respectively. The limits of agreement were −6.81 and 7.48 (mean difference: 0.36; standard deviation: 3.64). Conclusion Excellent agreement was found between CTGNet and clinicians in the baseline estimation from FHR recordings with different signal loss rates.


Introduction
Although deaths in children have declined substantially in the past 30 years, more than 5 million still die every year (1). Electronic fetal heart rate (FHR) monitoring was introduced into obstetric practice in the late 1950s to detect pathological fetal states as early as possible. However, the misinterpretation and ambiguity of FHR patterns may increase unnecessary interventions, such as operative deliveries and cesarean sections (2)(3)(4). Different guidelines over the past decades have recommended some modifications for interpreting FHR tracings, but beliefs in the etiology of basic FHR patterns (including the baseline, the variability, accelerations, decelerations, and sinusoidal patterns) have remained essentially unchanged (5,6). Among these FHR patterns, the baseline is a prerequisite for evaluating the other patterns (7). Gynecologists and obstetricians usually estimate the baseline by visual analysis, but visual interpretation has been found to be unreliable, with a high degree of inter- and intra-observer variability (7)(8)(9)(10)(11)(12). Therefore, computer-assisted analysis has been sought to mitigate the variability of visual interpretation (13)(14)(15)(16)(17).
Several studies have evaluated the performance of different computer-assisted methods. For example, in 2016, Jezewski et al. evaluated 11 different algorithms using two inconsistency coefficients based on three properties (i.e., number, location and area) of accelerations/decelerations (18). They found that the algorithm of Arduini et al. (9,19) outperforms other methods by achieving the lowest mean inconsistency coefficients on a private dataset with 41 FHR signals. This nonlinear filtering method is similar to the algorithm proposed by Mantel et al. (20). The difference is that Arduini's baseline is computed in 10 min windows with a 5 min shift, whereas Mantel's baseline is calculated for the whole FHR tracing. Considering Mantel's method, Houzé de l'Aulnoit et al. further evaluated 11 newer algorithms by comparing the computed baselines with those estimated by clinicians on a dataset with 90 FHR signals (13). This study found that Lu and Wei's algorithm (14) achieves better results than the other methods, with a newly proposed morphological analysis discriminant index (MADI) of 7.3%. Recently, a weighted median filter was proposed by Boudet et al. to compute the FHR baseline, and it showed closer agreement with the clinicians' consensus (a MADI of 4.0%) than Lu and Wei's method on this dataset with 90 FHR recordings (15). Similar to Lu and Wei's method, an algorithm for the baseline estimation based on singular spectrum analysis and empirical mode decomposition was also proposed by Lu et al. (16) and evaluated on another public dataset with 552 FHR recordings. This method was also evaluated on the dataset with 90 FHR recordings (13), achieving a MADI of 15.6%. Unlike signal processing methods, the CTGNet based on deep learning was proposed in our previous study (21) and evaluated on a larger dataset with 234 FHR recordings. This method was compared with 12 signal processing methods and obtained the lowest values of the evaluation metrics (including the root-mean-squared difference between baselines and MADI). These methodological studies illustrate the excellent performance of the CTGNet. However, its clinical application still requires a comparative study with large-scale multicenter data.
To evaluate the clinical usability of the CTGNet, we compare the FHR baseline predicted by the CTGNet with that estimated by clinicians using a large dataset with 1,267 FHR recordings acquired with fetal monitors of five device manufacturers. (24)(25)(26). In these datasets, the signal loss rate of FHR recordings from GMU_DB, JNU_DB and SMU_DB is <10% per 10 min, whereas those from LCU_DB and UHB_DB are <7% and <50%, respectively (Figure 1).
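The signal loss rate per 10 min window mentioned above can be illustrated with a minimal sketch. The encoding of lost samples (None, NaN, or non-positive values), the 4 Hz sampling rate, and the use of consecutive non-overlapping windows are all assumptions made for illustration; they are not stated in this study, and the function names are hypothetical.

```python
import math

def is_lost(sample):
    """Treat None, NaN, and non-positive values as signal loss (an assumption)."""
    return sample is None or math.isnan(sample) or sample <= 0

def loss_rate(samples):
    """Fraction of lost samples in a list (0.0 for an empty list)."""
    if len(samples) == 0:
        return 0.0
    return sum(is_lost(s) for s in samples) / len(samples)

def loss_rate_per_window(fhr, fs_hz=4.0, window_min=10.0):
    """Loss rate of each consecutive, non-overlapping window.

    fs_hz and window_min are assumed defaults; the last window may be
    shorter than the others when the recording length is not a multiple
    of the window size.
    """
    size = int(round(fs_hz * 60 * window_min))
    return [loss_rate(fhr[i:i + size]) for i in range(0, len(fhr), size)]

# Synthetic 20 min recording: 10% loss in the first window, 50% in the second.
fhr = [140.0] * 2160 + [0.0] * 240 + [float("nan")] * 1200 + [135.0] * 1200
rates = loss_rate_per_window(fhr)
```

A recording could then be screened against a threshold such as 10% by checking `max(rates)`.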
The Medical Ethics Committees of the Guangzhou Women and Children's Medical Center (273A01), Jinan University (JNUKY-2022-018) and the NanFang Hospital of Southern Medical University (NFEC-2019-024) approved this retrospective study.

Methods
Bai et al. 10.3389/fcvm.2023.1059211 Frontiers in Cardiovascular Medicine

According to the baseline definition of the FIGO consensus guideline (5): (1) the baseline is estimated as the mean level of the most horizontal and less oscillatory FHR segments of 10 min; (2) it is necessary to review previous and subsequent 10 min sections to estimate the baseline in recordings with unstable FHR signals. Clinicians (Z.Z. and X.P.) with more than seven years of experience in CTG analysis independently assessed the baselines of FHR tracings. To obtain a consistent baseline, FHR recordings were re-evaluated when the difference between the clinicians exceeded three bpm, and the baseline was determined as the average of the clinicians' estimations when their difference was less than three bpm. The difference between the baseline estimated by clinicians and that predicted by the CTGNet was then computed to evaluate their agreement.
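The consensus rule described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the handling of a difference of exactly three bpm (flagged for re-evaluation here) is an assumption because the text does not specify it, and the sign convention of the difference (CTGNet minus clinicians) is also assumed.

```python
def consensus_baseline(estimate_a, estimate_b, threshold_bpm=3.0):
    """Average two clinicians' baselines when they differ by less than the threshold.

    Returns None when the recording should be re-evaluated. A difference of
    exactly `threshold_bpm` is treated as needing re-evaluation (assumption).
    """
    if abs(estimate_a - estimate_b) < threshold_bpm:
        return (estimate_a + estimate_b) / 2.0
    return None

def baseline_difference(ctgnet_bpm, consensus_bpm):
    """Signed difference in bpm, CTGNet minus clinicians (assumed convention)."""
    return ctgnet_bpm - consensus_bpm

# Two clinicians agree within 3 bpm, so their estimates are averaged.
consensus = consensus_baseline(140.0, 142.0)
difference = baseline_difference(143.0, consensus)
```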

Statistical analysis
For each FHR recording, baseline values estimated by CTGNet and the consensus of clinicians were attributed to 5 bpm classes (such as class 0: ≤100 bpm, class 1: 100 < baseline ≤ 105, and class 2: 105 < baseline ≤ 110) in the following manner (7): (1) when the baseline difference does not exceed five bpm, CTGNet's and clinicians' baselines are assigned to the same class according to their mean (e.g., if CTGNet's baseline value is 109 and clinicians' estimation is 113, both values are assigned to the class 110-115); (2) when the baseline difference exceeds five bpm, baseline values are assigned to their respective classes. Kappa and intra-class correlation coefficient (ICC) values were calculated to evaluate agreement in the baseline estimation (excellent agreement: >0.75, good agreement: 0.4-0.75 and poor agreement: <0.4), and 95% confidence intervals (95% CI) were computed for all results.

Results
Table 1 summarizes the comparisons of the baselines predicted by the CTGNet and those estimated by clinicians. In 99% of FHR recordings from GMU_DB, JNU_DB and SMU_DB, differences do

[Figure: Flow chart of the FHR baseline estimation.]
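The class-assignment rule and the kappa statistic described in the Statistical analysis section can be sketched as below. This is a hedged illustration, not the code used in the study: the function names are hypothetical, classes above class 2 are assumed to continue in 5 bpm steps without an upper bound, and unweighted Cohen's kappa is assumed because the text does not say which kappa variant was used.

```python
import math
from collections import Counter

def bpm_class(baseline_bpm):
    """Map a baseline to a 5 bpm class: 0 for <=100, 1 for (100, 105], 2 for (105, 110], ..."""
    return max(0, math.ceil((baseline_bpm - 100.0) / 5.0))

def assign_classes(ctgnet_bpm, clinician_bpm):
    """Apply the two-case rule from the text.

    If the difference does not exceed 5 bpm, both values share the class of
    their mean; otherwise each value keeps its own class.
    """
    if abs(ctgnet_bpm - clinician_bpm) <= 5.0:
        shared = bpm_class((ctgnet_bpm + clinician_bpm) / 2.0)
        return shared, shared
    return bpm_class(ctgnet_bpm), bpm_class(clinician_bpm)

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two raters' marginal class frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical class throughout
    return (observed - expected) / (1.0 - expected)

# Example from the text: 109 and 113 share the class of their mean (110-115).
same = assign_classes(109, 113)
# A difference above 5 bpm keeps the two values in their own classes.
apart = assign_classes(128, 140)
kappa = cohen_kappa([1, 1, 2, 2], [1, 1, 2, 3])
```

In this indexing, class 3 corresponds to 110 < baseline ≤ 115, which matches the worked example in the text.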

Discussion
Reliable FHR interpretation is the basis of fetal state assessment. Poor recognition of FHR patterns can propagate errors to subsequent steps, thereby decreasing classification accuracy. Among all these FHR patterns, the baseline is a precondition for evaluating the other patterns. Visual estimation of the FHR baseline is subject to inter- and intra-observer variability. Computer-assisted baseline estimation has been proposed as a promising way to reduce this variability. To evaluate the performance of our computer-assisted method (i.e., CTGNet), a comparison of FHR baseline estimation by the CTGNet and a consensus of clinicians is presented in this study. Baselines assigned by computer-assisted methods have been compared with those estimated by clinicians in several studies (7)(8)(9)(10)(11)(12). In the studies of Arduini et al. and Ayres-de-Campos et al., limits of agreement (LoA) were no wider than −6.45 and 7.07 in ≤150 FHR tracings. In addition, the Kappa value and the ICC coefficient were also used to evaluate agreement in baseline determination between a computer-assisted method and several experts in previous studies. The Kappa values obtained varied from 0.18 to 0.97, while the ICC coefficient was within the range of 0.83-0.98 (Table 2). All these results were obtained on several small datasets with ≤150 FHR tracings acquired with EFM from ≤3 different manufacturers. In the present study, a larger dataset with 1,267 FHR tracings acquired with EFM from 5 different manufacturers was used to evaluate agreement in baseline determination between the CTGNet and clinicians. This dataset included ∼50% high-quality tracings with a signal loss rate of <10% per 10 min and ∼50% low-quality recordings with a signal loss rate of <50% per 30 min. Kappa values were >0.98 for the high-quality tracings from GMU_DB, JNU_DB and SMU_DB, while the Kappa value was 0.771 for the lowest-quality FHR recordings (n = 552) from UHB_DB. Regardless, excellent agreement was obtained on the whole dataset (n = 1,267). These results indicate the potential for clinical application of CTGNet in FHR baseline estimation.
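The limits of agreement discussed above are conventionally computed as the mean difference plus or minus 1.96 standard deviations of the paired differences. The sketch below assumes that convention and the sample standard deviation; neither detail is stated explicitly in this study, and the function name is hypothetical.

```python
import statistics

def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (lower, upper, mean_diff, sd_diff) for paired measurements.

    Differences are taken as method_a minus method_b. The sample standard
    deviation (n - 1 denominator) is an assumption.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff, mean_diff, sd_diff

# Toy baselines in bpm (illustrative values only, not data from the study).
ctgnet = [140, 142, 138, 150]
clinicians = [139, 143, 136, 151]
lower, upper, mean_diff, sd_diff = limits_of_agreement(ctgnet, clinicians)
```

Applied to the reported summary statistics (mean difference 0.36, standard deviation 3.64), this formula gives approximately −6.77 and 7.49, close to the reported limits of −6.81 and 7.48; the small gap is presumably due to rounding of the summary values.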
The high loss rate of FHR tracings severely affected the performance of CTGNet. In the present study, the Kappa value (5), low-quality tracings with a mean signal loss of 28%-55% are shown in clinical practice (27,28). For example, mean signal losses of 13% and 30% were found during the first and the second stage of labor, respectively (29). Therefore, the CTGNet can be further trained on datasets with low-quality tracings to improve its robustness. On these five datasets, we further compared the performance of our method (30) with existing signal processing methods. The method proposed by Mantel et al. achieved the best performance on the GMU_DB and JNU_DB datasets. Our method achieved the best results on the JNU_DB and LCU_DB datasets. The method proposed by Boudet et al. achieved the best results on the UHB_DB dataset. In the comprehensive evaluation of the five datasets, Boudet's method and our method ranked first and second, respectively, and their baseline differences were both less than 3 bpm (Table 3).

Conclusions
The CTGNet for the FHR baseline estimation provided excellent agreement with clinicians. However, this agreement was observed mainly in FHR recordings with low and medium signal loss rates. In the future, the CTGNet can be further improved by training it with more low-quality tracings.

Data availability statement
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Ethics statement
The Medical Ethics Committees of the Guangzhou Women and Children's Medical Center (273A01), Jinan University (JNUKY-2022-018) and the NanFang Hospital of Southern Medical University (NFEC-2019-024) approved this study. Written informed consent was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions
Conceptualization, JB, XP, YL, ZZ, and HW; writing-original draft preparation, JB and XP; writing-review and editing, JB, ZZ, and XP; visualization, JB; funding acquisition, HW, JB, and YL; data collection, ZZ, YL, XG, and JB; data annotation, XP and ZZ. All authors contributed to the article and approved the submitted version.