Extract fetal ECG from single-lead abdominal ECG by de-shape short time Fourier transform and nonlocal median

The multiple fundamental frequency detection problem and the source separation problem from a single-channel signal containing multiple oscillatory components and a nonstationary noise are both challenging tasks. To extract the fetal electrocardiogram (ECG) from a single-lead maternal abdominal ECG, we face both challenges. In this paper, we propose a novel method to extract the fetal ECG signal from the single channel maternal abdominal ECG signal, without any additional measurement. The algorithm is composed of three main ingredients. First, the maternal and fetal heart rates are estimated by the de-shape short time Fourier transform, which is a recently proposed nonlinear time-frequency analysis technique; second, the beat tracking technique is applied to accurately obtain the maternal and fetal R peaks; third, the maternal and fetal ECG waveforms are established by the nonlocal median. The algorithm is evaluated on a simulated fetal ECG signal database ({\em fecgsyn} database), and tested on two real databases with the annotation provided by experts ({\em adfecgdb} database and {\em CinC2013} database). In general, the algorithm could be applied to solve other detection and source separation problems, and reconstruct the time-varying wave-shape function of each oscillatory component.


INTRODUCTION
Electrocardiograph (ECG) is inarguably the most widely applied measurement to noninvasively study the cardiac activity, since its appearance in 1901 [3]. Its waveform provides a significant amount of clinical information. In addition, the time-varying speed of heart beating, widely understood as the heart rate variability (HRV), has proved to be a portal to our physiological dynamical status. While it has been that widely applied in different scenarios, its application to the intra-uterus fetus is still limited, mainly due to the lack of a direct contact measurement of the fetal ECG (fECG) signal. Like the adult ECG signal processing, there are two main purposes in the fECG signal processing. First, we want to non-invasively obtain the fetal heart rate, which is intimately related to the fetal distress [35]; second, we would like to analyze the fECG morphology for the sake of diagnosing cardiac problems. However, the fECG morphological analysis is less performed in clinics, except for the ST analysis (STAN) monitor, which detects and alerts the potential risk for fetal hypoxia (see, e.g. [9] and the citations therein).
There are mainly two types of fECG signals. The first kind of signal is directly recorded through an electrode attached to the fetal skin. For example, the electrode could be attached to the scalp while the cervix dilates during delivery, which is considered invasive. While the recorded signal is of high quality, it can only be recorded during a specific and short period and the instrument is not designed for the long-term monitoring purpose. Also, the infection risk is not negligible. So it is not routinely used in clinics. We call it the direct fECG signal, and we mention that the STAN monitor depends on the direct fECG signal. The second kind of signal is recorded from the mother's abdomen, where the sensor is close to the fetus so that the fECG signal is big enough compared with the maternal 1 the dependence on multiple leads render it less applicable for the screening and monitoring purpose; on the other hand, the algorithms depending on the single-lead aECG signal usually have lower accuracy, while they could be applied to a wider range of situations. To simultaneously fulfill the practical need and the accuracy, in this paper, we propose a novel algorithm to extract fetal instantaneous heart rate (fIHR) and the fECG signal from the single-lead aECG signal from a different viewpoint. The proposed algorithm combines a recently developed nonlinear time-frequency (TF) analysis called the de-shape short time Fourier transform (de-shape STFT), and the nonlocal median; the de-shape STFT extracts the maternal instantaneous heart rate (mIHR) from the single-lead aECG, which provides the maternal R peak information. The maECG is then extracted from the aECG by the nonlocal median algorithm. The difference between the aECG and estimated maECG serves as a rough fECG estimate, and the fIHR could be estimated from the rough estimate of fECG by de-shape STFT, and hence the fetal R peaks. The fECG is then extracted by the nonlocal median algorithm. The R peak information could be accurately estimated by the beat tracking algorithm based on the dynamic programming. While not explicitly used in the algorithm, we mention that our method has the ability to simultaneously obtain the fIHR and mIHR, and hence simultaneously the fECG and mECG.
The novelty and the main difference between our proposed method and the other algorithms based on the single-lead aECG signal are two folds. First, we use more information hidden in the single-lead aECG signal. Note that the traditional R peak detection algorithms mainly count on the morphological landmarks (fiducial points), like the maximal points representing the R peaks, or the maximal "energy" pattern driven by the QRS complex in the TF domain determined by, for example, the wavelet transform. Then the mIHR and fIHR are obtained by interpolating the estimated R peak locations. On the other hand, de-shape STFT allows us to directly extract the mIHR and fIHR from the singlelead aECG signal, and the fIHR from the rough fECG estimate. We could then utilize the mIHR and fIHR to guide an accurate R peak detection. Unlike the traditional approach, we simultaneously use the frequency information (the mIHR and fIHR), which reflects the time-varying and nonlinear beat-to-beat relationship, and the morphological landmark information. Second, based on the nonlinear manifold model, we apply the nonlocal median algorithm [15,45] to extract the maECG and fECG signals -for each cardiac activity candidate, we only consider those aECG segments with a similar pattern, and use the median to estimate the underlying cardiac activity. Compared with the traditional TS methods [36], where the mean, or the mean together with the first few principal components, of consecutive aECG segments containing cardiac activities is considered to be the template of the cardiac activity, the nonlocal median algorithm takes care of the following commonly encountered issues. The fact that the QRST complex morphology is time-varying [48] might be overlooked in the traditional TS procedure; the mean of consecutive aECG segments containing cardiac activities is well known to be sensitive to outliers; the TS algorithm is sensitive to the number of principal components and an empirical optimization is needed [18]. In sum, the de-shape STFT is applied to get a better mIHR and fIHR and hence maternal and fetal R peaks, and the nonlocal median is applied to get a better mECG and fECG.
The paper is organized in the following. In Section 2, we discuss a phenomenological model for the aECG and the mathematical background for the de-shape STFT and nonlocal median. In Section 3, the single-lead fECG extraction algorithm is introduced. The material and results are reported in Section 4. In Section 5, the paper is summarized by a discussion, including limitations and future works.

MATHEMATICAL BACKGROUND
The aECG signal contains at least two components of interest: the component associated with the maternal cardiac activity and the component associated with the fetal cardiac activity, which have different time-varying frequencies and different non-sinusoidal oscillations. These lead to a wide spectrum, so the usual linear signal processing techniques do not work. While it is challenging enough to separate these two components, the problem becomes more challenging considering the influence of different kinds of noise in the measurement. Furthermore, due to the physiological nature of the ECG signal, the nonsinusoidal oscillation is not just impulse-like but also time-varying (see, for example, the nonlinear relationship between QT and RR intervals [48]); that is, we will run into the time-varying wave-shape function issue. In this section, we provide a phenomenological model suitable for the aECG signal and an algorithm suitable for analyzing this kind of signal. Second, we provide a low dimensional and nonlinear geometric model to describe the maternal and fetal cardiac activities. Based on this nonlinear model, the nonlocal median algorithm is introduced to reconstruct the time-varying wave-shape function, and hence extract the mECG and fECG from the aECG.

2.1.
Adaptive non-harmonic model and de-shape short time Fourier transform. To extract the fECG information from the aECG signal, we propose to apply the adaptive non-harmonic model to model the aECG signal, and apply the de-shape STFT to study the aECG.
We call a function x in D c,C,N ε an adaptive non-harmonic (ANH) function, where the adjective non-harmonic indicates the possibly non-sinusoidal nature of the oscillation, and the adjective adaptive indicates the time-varying nature of the frequency, amplitude, and the non-sinusoidal oscillatory pattern. We call B 1 (t) cos(2πφ 1 (t)) the fundamental component, B 1 (t) the fundamental amplitude, φ 1 the phase function, and φ 1 the fundamental instantaneous frequency (IF) of the signal x. By a slight abuse of terminology, for > 1, we call B cos(2πφ (t)) the -th multiple of the fundamental component, which we simply call the -th multiple if no danger of confusion is possible, B (t) the amplitude of the -th multiple, and φ the IF of the -th multiple, although φ might not be an exact integral multiple of φ 1 . Thus, we could view an ANH function as an oscillatory component with the time-varying amplitude, frequency, and wave-shape function.
A special case deserves a discussion. When β := B (t) B 1 (t) are constants for all = 0, 1, . . . , ∞ and φ (t) = φ 1 (t) + α for some α ∈ R for all = 1, . . . , ∞, (1) is reduced to the where s is a 1-periodic function with the Fourier coefficients determined by β and α . In this case, clearly the phase function φ 1 is linearly related to the deformation of the nonsinusoidal oscillation, and we say that the wave-shape function is not time-varying. On the contrary, for an ANH function, the phase function φ 1 might be nonlinearly related to the deformation of the non-sinusoidal oscillation. If we further assume that ε = 0, then we obtain the well known harmonic function.
To motivate this model, take the relationship between the RR and QT intervals of an ECG signal as an example. The nonlinear relationship between the QT interval and the RR interval has been well studied -for example, the Fridericia's formula (QT interval is proportional to the cubic root of RR interval) or a fully nonlinear depiction [48]. Thus, the QRS complex representing the ventricular response is not linearly related to the instantaneous heart rate, and we need a time-varying wave-shape function to model this physiological fact. For more detailed discussion, we refer the reader with interest to [44].
In general, a signal might be composed of more than one oscillatory component; for example, the aECG signal is composed of the fECG and maECG. Take a small enough 0 < ε < 1 and d > 0. We consider the set D ε,d ⊂ C 1 (R)∩L ∞ (R) consisting of superposition of ANH functions; that is, for some finite K ∈ N and where the fundamental IF's of all ANH functions satisfy the following two conditions if K > 1. First, the frequency separation condition says that Second, the non-multiple condition says that for each k = 2, . . . , K, φ k,1 (t)/φ ,1 (t) is not an integer for = 1, . . . , k − 1. We say that a signal in D ε,d satisfies the ANH model.

2.1.2.
Model the maternal abdominal ECG signal by the ANH model. We now model a recorded aECG signal, x(t), by the ANH model, whichs satisfies for some c f ,C f , N f is the fECG signal, and n(t) is noise. Here, the fundamental IF of x m (respectively x f ) is the mIHR of maECG (respectively fECG). n(t) includes different kinds of noise, ranging from the baseline wandering, power line interference, maternal electromyographic signal, to uterine contraction, so in general it is not stationary. We consider either smooth varying non-stationary noise model with a smooth and slowly varying covariance function for n(t) to capture the possible heteroscedasticity and autocorrelation inside the noise [17,Equation (6)], or a more general piecewise locally stationary model [67, Definition 1] to further capture the abrupt change inside the noise structure.
In sum, both x m (t) and x f (t) are signals with time-varying IF due to the inevitable HRV with time-varying amplitude and wave-shape function representing the cardiac activity, and the recorded signal might be contaminated by different kinds of inevitable noise.
2.1.3. de-shape short-time Fourier transform. There are several challenges of analyzing the aECG signal, including the time-varying amplitude, time-varying frequency, and the time-varying non-sinusoidal oscillation. To deal with this kind of signal, under the ANH model (8), we could apply the currently proposed algorithm, the de-shape STFT. The deshape STFT is a nonlinear TF analysis technique combining the well known STFT and the cepstrum technique commonly applied in the signal processing field, and it is composed of the following four steps. First, note that x(t) is a tempered distribution, so with a chosen window function h ∈ S , where S is the Schwartz space, we have where t ∈ R indicates time and ξ ∈ R indicates frequency. Clearly, V (h) f (t, ξ ) ∈ C ∞ is smooth and slowly increasing on both time and frequency axes. Precisely, the frequency function V  f (·, ξ ) for any ξ ) and all its derivatives have at most polynomial growth at infinity. Second, evaluate the short time cepstral transform (STCT) in order to obtain the fundamental period and its multiples: where γ > 0 is sufficiently small and q ∈ R is called the quefency (its unit is second or any feasible unit in the time domain). Since V  f (t, ·)| γ is continuous and slowly increasing. Hence its Fourier transform is welldefined in the distribution sense. Third, evaluate the inverse short time cepstral transform (iSTCT), which is defined on R × R + as where ξ > 0 (its unit is Hz or any feasible unit in the frequency domain) and U (h,γ) x (t, ·) is in general a distribution. Precisely, the map ξ → 1/ξ from (0, ∞) to (0, ∞) is open and its differentiation is surjective on (0, ∞), so for a distribution C (h,γ) x (t, ·), for any t, defined on (0, ∞), C (h,γ) x (t, 1/ξ ) is well-defined [33, Theorem 6.1.2]. Fourth, we remove all the multiples by evaluating the de-shape STFT, which is defined on R × R + as where ξ > 0 is interpreted as frequency. Note that the de-shape STFT is well-defined as a distribution since V (h) The main motivation of the de-shape STFT is to decouple the IF and the non-sinusoidal wave-shape function. Due to the non-sinusoidal oscillation, at each time t, in STFT we could see not only the fundamental frequency but also its multiples. The existence of multiples, when there are more than one component, interfere with each other and mask the true information we have interest. Thus, the notion of cepstrum is applied to obtain the fundamental period information of the signal. Note that the fundamental period and its multiples, after the inversion, become the fundamental frequency and its divisions, and hence the common ingredient between iSTCT and STFT is the fundamental frequency. Thus, after a direct element-wise product, only the fundamental frequency is preserved in the de-shape STFT. This approach could be viewed as a "nonlinear filter" technique, which uses the "dual information" of the spectrum (the cepstrum), as a mask to remove the irrelevant information (the wave-shape function) and keep the relevant information (the IF).
It has been shown in [44,Theorem 3.6] that under the ANH model (6), the de-shape STFT could extract the fundamental IF of each component, where all multiple IF's are suppressed. In other words, the time-varying wave-shape information of maECG and fECG is decoupled from the IF and AM of maECG and fECG in the TF representation. The implementation of the de-shape STFT in the discrete-time domain will be discussed in Section 3.

Nonlocal median.
Nonlocal median algorithm [15,45] is a variation of the wellknown nonlocal mean algorithm in the image processing field, mainly for the purpose of image denoising [11,60]. After obtaining the fundamental IF of each component, we apply the nonlocal median algorithm to recover the time-varying wave-shape function. In this study, it allows us to extract the maECG and fECG from the aECG signal.
Without loss of generality, we assume that maECG and fECG are both with the Rs pattern in the aECG so that we could discuss the R peaks. The discussion holds for the S peaks if either maECG or fECG has the rS pattern. Take the maECG into account. Suppose the k-th cardiac activity, which could be normal or ectopic, without any other pathological arrhythmia, in the maECG starts at s (k) m ∈ R and ends at e (k) m ∈ R, and suppose the k-th R peak is located at r ) are periods where the maECG and fECG are isoelectric respectively (known as the TP interval). Thus, the ANH function x m in (8) could be written as m ]. Note that due to the timevarying nature of the non-sinusoidal oscillation, the amplitude of the fundamental component and each multiple in the ANH function is nonlinearly related to the waveform morphology. It is well known that electrophysiologically, the fECG and maECG are similar, except the heart rate [56], so the above discussion could be carried over to the fECG, x f .
m } > 0, and denote the k-th abdominal cardiac activity mixture as (14) x m is the k-th aECG segment containing the k-th maternal cardiac activities, denoted as (15) x m,m (t) = 0 otherwise, and several fetal heart beats, since normally the fetal heart rate is higher. Denote to be collection of aECG segments from x(t) and to be collection of maECG segments from the underlying maternal cardiac activities. We could thus view X m as a "noisy" collection of maECG segments X m,m , and the mission is to recover X m,m from X m . Similarly, we could define Physiologically, it is well known that while the underlying mechanism leading to the cardiac activities might be complicated [5], phenomenologically they are similar from beat to beat. There are two dominant parameters that quantify the similarity between cardiac activities -the scaling and the dilation, where the scaling reflects the respiratory activity and the dilation reflects the nonlinear relationship between the RR interval and QT interval. Also, the waveform representing the cardiac activity should be bounded and with a bounded differentiation. This fact could be summarized in the following: Assumption 2.1. The dataset X m,m is sampled from a random vector V m , where V m has the range supported on a bounded set inside C 1 ([0, l m + r m ]) with a low dimensional structure. To simplify the qualitative description "low dimensional structure", we assume a low dimensional smooth and compact manifold to quantify the range of V m .
Note that due to the fact that the maECG amplitude and frequency are both time-varying, under this model, two consecutive maternal cardiac activities might be far away in the manifold.
Similarly, we have the same assumption for the fetal cardiac activity; that is, the set X f , f is sampled from a random vector V f , with the range supported on a bounded set inside C 1 ([0, l f + r f ]) with a low dimensional structure. A critical assumption we need to apply the nonlocal median algorithm is the following This assumption essentially says that for different x (k) , while the maternal cardiac activities are similar, the fetal cardiac activities sit in random positions.
With the above setup and assumptions, we could now introduce the nonlocal median algorithm. For each x (k) m ∈ X m , find its K nearest neighbors with the L 2 norm, where K ∈ N is chosen by the user. Precisely, by ranking d k,l := x m in X m with the smallest L 2 norm. Note that we could also consider the correlation or other more sophisticated metrics, but to keep the discussion simple, we focus on the L 2 norm in this paper. Then, the k-th maternal cardiac activity is estimated by Based on Assumption 2.1, the neighbors we find have similar maternal cardiac activities while they are not neighbors in the temporal axis. On the other hand, it is well known that the median is robust to outliers. Note that under Assumption 2.2, for different x (k) , the fetal cardiac activities sit in different positions in the support of x (k) , and note that the ECG morphology associated with the ventricular activity is spiky, so the fetal cardiac activity occupies a small portion of the interval [0, l m + r m ]. As a result, the median value will faithfully reflect the maternal cardiac activity at t, and hence the recovery of the maECG.
The above procedure could be applied to extract the fECG signal from the aECG signal, if we reverse the role of the maECG and fECG, and consider X f from X f , f .

METHODOLOGY
In this section, we introduce our single-lead fECG extraction algorithm, and provide a summary of the used statistics for the evaluation.
3.1. Single-channel fetal ECG extraction algorithm. We now introduce our algorithm to extract the fECG signal from the aECG signal. The aECG signal x(t) is sampled with the sampling rate f s ∈ N during the interval from the 0-th second to the T -th second, where T > 0. Denote (19) x where N = T / f s + 1, to be the collected aECG signal. Thus we have (20) x where x m , x f , and n are the discretized maECG signal, fECG signal and noise. The algorithm is summarized in the flowchart in Figure 1. Below we detail the algorithm step by step.

3.1.1.
Step 0: Preprocessing. The first step of the proposed algorithm, as most other algorithms, is signal preprocessing. Among all the analyses, we apply the following steps. First, we remove the baseline wandering by the median filter with the window size 100 ms. If needed, the power-line interference is suppressed by a zero-phase notch filter at 50 or 60 Hz. To preserve the non-stationary nature of the signal, we do not apply any other linear filtering technique to the signal. If the signal is sampled at the frequency lower than 1000 Hz, to enhance the R peak alignment needed in the nonlocal median step, the signal is upsampled to 1000 Hz [41]. To simplify the notation, we use the same notation f s and N to denote the resulting sampling rate and the resulting number of sampling points, and denote the resulting signal as x ∈ R N . Input the signal lead aECG x 0 preprocessing get preprocessed aECG x de-shape STFT get TFR of x

Curve extraction get mIHR (and optionally fIHR)
Beat tracking get maternal R peaks nonlocal median get maECGx m and rough fECGx f ,0 := x −x m de-shape STFT get TFR ofx f ,0

Curve extraction get fIHR
Beat tracking get fetal R peaks nonlocal median get fECGx f Output fIHR, fetal R peaks, and fECG FIGURE 1. The flow chart of the proposed algorithm for extracting fECG from the signal lead aECG signal. The shown aECG signal is of 6 seconds long.

3.1.2.
Step 1: Run de-shape STFT on x to estimate the maternal instantaneous heart rate. We apply the de-shape STFT algorithm to extract the IF information of the maternal and fetal cardiac activities from the preprocessed aECG signal x. Fix the frequency resolution of the STFT by f s 2M , where M ∈ N is the number of discretization points in the frequency axis, and the quefency resolution by M f s M , where M ∈ N is the number of discretization points in the quefency axis. The numerical implementation of the de-shape STFT algorithm is summarized in Algorithm 1. Denote W x ∈ C N,M to be the de-shape STFT of x.
[OUTPUT] the de-shape STFT of x, W x , for the postprocessing.
It has been systematically reported in [44] that the fundamental frequencies of the maECG and fECG are represented as dominant curves in the TFR. This fact allows us to extract the salient fundamental IF information of each oscillatory component [44]. While the feature inside TFR determined by the de-shape STFT is suitable for several IF tracking methods, including dynamic programming (DP), dynamic Bayesian networks, adaptive filters, and others, to simplify the discussion, we apply the simple DP curve extraction algorithm [17] to track the peaks and estimate the IFs based on one assumption that the maECG is strong than the fECG. The detailed procedure is as follows: (1) Performing DP on W x to extract the most dominant component. The extracted curve, when adjusted with the frequency resolution, denoted as η m ∈ R N , represents the mIHR. Precisely, suppose at time n, the most dominant component is located at W x (n, j n ), then η m (n) = j n f s 2M .
(2) (Optional step 1: simultaneously estimate the fIHR) For every time n, multi- and N ∆ ∈ N and 0 ≤ θ m < 1 are chosen by the user, to suppress the maternal cardiac activity. This procedure makes the fECG be the predominant component in W x . Then, performing the curve extraction on W x again to extract the fIHR, denoted as η f ∈ R N . In practice, we could choose N ∆ so that N ∆ f s 2M = 0.1 Hz and θ m = 10 −4 . This optional step is not carried out in this paper. The temporal complexity of evaluating the de-shape STFT is O(NM log M). Indeed, the evaluation of STFT is local in nature, and it depends on the chosen window and the frequency resolution.

3.1.3.
Step 2: Obtain the maternal R peaks by beat tracking and dynamic programming. Note that although theoretically the IHR is related to the R peak to R peak interval (RRI) time series, in practice there is a discrepancy due to the time-varying nature of the waveshape function. To obtain the exact R peak location, we apply the beat tracking technique, which has been well studied in music signal analysis 1 .
Define the estimated maternal RRI as δ m (n) := 1/η m (n), which is the inverse of the mIHR. Our goal is to find an strictly increasing sequence We call B m the maternal beat sequence. This problem is understood as the beat tracking problem, and it is formulated as the following optimization problem [26]: and λ BT ≥ 0 is the penalty term determined by the user, which balances the influence of R peak amplitude and the estimated maternal RRI. Notice that [26] assumes δ m (n) (the inter-beat interval in the music clip; in our case, the IHR) being a constant function, which is clearly not suitable for an oscillatory signal with time-varying frequency. To address this issue, we modify the original formulation in [26] by allowing a time-varying function 1/η m (n). Note that the first term in the objective function in (21) reaches its maximal value when b i matches the R-peak location, and the second term penalizes the discrepancy between the estimated IHR and the IHR determined by the true RR interval, with the maximal value zero. The resulting beat sequenceB m = {b i } M m i=1 provides an estimate of R peaks location; that is,b i is an estimate of the i-th R peak location. In our experiments, setting λ BT between 20 and 50 turns out to yield a suitable tradeoff, and the result is not sensitive to λ BT .
While the number of possible beat sequences in x grows exponentially as N grows, the optimization problem (21) can be solved effectively by the DP. The main idea leading to the DP is that the objective function in (21) is an accumulation in time. Thus, we could divide the problem into a set of optimization subproblems, each of which optimizes the objective function up to a prescribed time step, and the solution is the combination of the optima at every time step. This is implemented by introducing two additional vectors C ∈ R N and D ∈ R N , where C(n) records the maximum of the objective function accumulated from 1 to n, where n = 1, . . . , N, and D(n) records the estimated beat position yielding this maximum at the time step of n − 1. D allows us to trace previous beat positions, and this step is called the backtracking. The whole procedure is sketched in Algorithm 2. More details of this method can be found in [26], and the source code is available in http://labrosa.ee.columbia.edu/projects/beattrack/. Algorithm 2 Beat tracking by dynamic programming.
[INPUT] the aECG signal x ∈ R N ; an estimated sequence of RR intervals δ m ; an weighted parameter λ BT ; Accumulate the objective function and store the result

3.1.4.
Step 3: Estimate the maECG morphology by the nonlocal median. With the maternal R peak location, we could extract x m from the aECG signal. Based on the physiologi- . . , M m , is long enough to cover the i-th maternal cardiac activity. Define the aECG segment x i ∈ R L m +R m +1 associated with the i-th maternal cardiac activity, where i = 1, . . . , M m and where the superscript t means taking the transpose. Note that the i-th R peak is located on where l = 1, . . . , L m + R m + 1.
Before estimating the maECG from {x , we need to take care of the possible overlapping issue. Ifx , we need to taper the overlapping regions of both segments. See Algorithm 3 for the implementation of the tapering step. To simplify the notation, we use the same notationx (i) m to denote the tapered segment. Finally, the maECG could be estimated byx m ∈ R N , where for all i = 1, . . . , M m and j = −L m , . . . , R m , and zero otherwise. In practice, we found that the result is stable when K m ranges from 20 to 60. The whole procedure is sketched in Algorithm 3.
Algorithm 3 Non-local median for maECG extraction.
[INPUT] the aECG signal x ∈ R N ; an estimated R-peak sequenceB m ; L m and R m ; the number of nearest neighbors K m ; 3.1.5. Final step: get the fIHR and obtain the fECG signal. Denotex f ,0 := x −x m to be the rough fECG estimate. The fIHR, the fetal R peaks, and fECG could be obtained by repeating Step 1, Step 2, and Step 3 on the rough fECG estimate. Precisely, the fIHR could be obtained by running the de-shape STFT onx f ,0 and another DP curve extraction on Wx f ,0 in Step 1.
[Optional step 2] Before running the DP curve extraction, we could re-weight Wx f ,0 around the band associated with the mIHR η m by a small constant θ f > 0 to reduce to the possible impact of the remaining maECG component. Precisely, set Wx f ,0 (n, j n ) to be θ f Wx f ,0 (n, j n ), where n = 1, . . . , N and j n ∈ {η m (n) − N ∆ , η m (n) − N ∆ + 1, . . . , η m (n) + N ∆ − 1, η m (n) + N ∆ }. In practice, choosing θ f = 1/10 could slightly improve the result, and we recommend to take it into account.
Denote η f ∈ R N to be the final estimated fIHR. The fetal R peak location could be further refined by running the beat tracking technique by the DP. Denote the final estimated fetal R peaks by a strictly increasing sequenceB f . The fECG signal could be reconstructed by the nonlocal median algorithm, with L f , R f ∈ N chosen based on the physiological fact and K f ∈ N nearest neighbors chosen by the user. Denote the final reconstructed fECG as x f ∈ R N . In practice, we found that the result is stable when K f ranges from 20 to 60.
[Optional step 3] This optional step takes the physiological constraint into account. If the average η f determined from the rough fECG signal is smaller than η m , the final reconstructed maECG and fECG could be exchanged to respect the physiological constraint that normally the mIHR is slower then fIHR. When both the fetus and mother are healthy, this step could help when the fECG is strong than the maECG, a case which is not commonly seen.
3.2. Evaluation statistics. The R peak detection result is evaluated by beat-to-beat comparisons between the detected beats and the annotation provided by the experts. We follow the suggestion [31] and choose a matching window of 50 ms. Denote T P, FP, T N, and FN to be true positive, false positive, true negative, and false negative. We report the sensitivity (SE) SE = T P/(T P + FN), the positive predictive value (PPV) PPV = T P/(T P + FP), and the F 1 score, which is the harmonic mean of PPV and SE, In addition, the mean absolute error (MAE) of estimated R peak locations is also report. To make the evaluation independent of the detection accuracy, we follow the suggestion in [5] to calculate the MAE only on TP annotations.
To evaluate the fECG morphology recovery, we consider the correlation between the estimated fECG signal and the true fECG signal, over the period ranging from 80 msec before the annotated R peak and 120 msec after the annotated R peak. Again, we evaluate the correlation only on TP annotations.

MATERIAL AND RESULT
For a fair comparison, the parameters for the algorithm are set to be the same for all signals throughout the paper, and the optional step 2 and 3 are implemented, unless otherwise stated. The window h is chosen to be the Hamming window of length 5 second, the up-sampling rate α = 10, the frequency resolution in STFT and de-shape STFT is set to 0.02 Hz, the quefrency resolution is set to 10 msec, γ = 0.3 for the STCT, and υ = 10 −4 % of the root mean square energy of the signal under analysis for the de-shape STFT. In the beat tracking, λ BT is set to 50. In the nonlocal median, we choose K m = K f = 40.

4.1.
Material. We evaluate the proposed algorithm on three databases of aECG signals. The first database is the simulated fECG signal database (fecgsym) [5]. The publicly available simulator generates simultaneously the maECG and fECG signals in 34 channels, which model a number of realistic non-stationary physiological phenomena that affect the morphology and dynamics of the aECG, including different kinds of noise. A total of seven physiological events are introduced in the simulator: (1) Baseline: the baseline abdominal mixture with a constant fIHR and mIHR without noise or events; (2) Case 0: baseline signal contaminated by noise; (3) Case 1: Case 0 is complicated by the fetal movement noise; (4) Case 2: the fIHR and mIHR are time-varying with noise; (5) Case 3: Case 2 is complicated by the uterine contraction noise; (6) Case 4: Case 2 is complicated by ectopic beats in both fECG and maECG; (7) Case 5: twin pregnancy contaminated by noise. The simulator also generates five different levels of additive noise, ranging from 0, 3, 6, 9, to 12 dB. Each physiological event and noise level were simulated independently five rounds to mimic the realistic situation. A generated benchmark is available in https: //physionet.org/physiobank/database/fecgsyndb/ [29], which contains ten subjects, five rounds, and five SNR's for each case. Each simulation is of 5-minute long and is discretized at the sampling rate 250Hz. For each subject, case, round, and SNR, there are one maECG, fECG, two noise realizations, and one extra uterine contraction noise in Case 3 in the database. We sum all these time series together to get the simulated signal for analysis. We test our algorithm on all Cases, all levels of noise, and all five simulations, except Case 5. We consider the twin pregnancy case as an independent project, and the result will be reported in the other work.
The second database is the PhysioNet non-invasive fECG database (adfecgdb), where the aECG signals with the annotation provided by experts is publicly available https:// www.physionet.org/physiobank/database/adfecgdb/ [29]. The annotation is determined from the direct fECG recorded from the fetal scalp lead. There are five pregnant women between 38 to 40 weeks of pregnancy in this database, each has 4 aECG channels and one direct fECG signal. The signal is of 5-minute long, and is digitalized at 1000Hz with 16bit resolution. More details could be found in https://www.physionet.org/ physiobank/database/adfecgdb/.
The third database is the 2013 PhysioNet/Computing in Cardiology Challenge (https: //physionet.org/challenge/2013/#data-sets), abbreviated as CinC2013, which is composed of three sets, learning (training) set A, open test set B, and hidden test set C, where only set A comes with the annotation. We thus used set A for an assessment of our proposed algorithm. There are 75 subjects in set A, and for each subject four aECG channels of length 1-minute long sampled at 1000 Hz are provided. More details about this database could be found in https://physionet.org/challenge/2013/#data-sets.

4.2.
Results of the simulation: fecgsym database. In this simulated database, since we a priori know that the noise could be as big as 0 dB, so we choose a longer window of length 10 s. The other parameters are fixed for all the cases.
Following the recommended report method in [5], Tables 1 and 2 give an overview of the performance of the proposed algorithm. In addition to reporting the 1min result, we also report the whole 5min result. For the 1min (respectively 5min) result, for each subject, noise level, case, and round, we find the channel and the 1-minute subset out of the 5-minute signal (respectively the channel and the whole 5-minute signal) that gives the highest F 1 . For each noise level and case, the median and interquartile (IQR) of those highest F 1 values, and the associated MAE of all subjects and rounds are shown in Tables 1. To better understand the performance of the proposed algorithm, the whole gross statistics with the tenth highest F 1 is reported in Table 2.
From these two tables, we could see that as SNR decreases, the performance decreases, which is consistent with the results of different TS techniques reported in [5, Figure 4]. It is also clear that the overall accuracy of the proposed algorithm is higher than those of TS techniques and adaptive filtering techniques reported in [5, Table 3]. We also notice that the IQR in our report is higher than those of different TS techniques reported in [5, Table 3]. This is caused by the fact that we do not use the ground truth maternal R peak information but estimate it from the aECG signal. 2 We should notice that when SNR decreases, the lost information could not be recovered by only one channel signal. Clearly, in Case 3 where the signal is contaminated by the uterine contraction noise, the results on the 5-minute long signal is much lower. This comes from the fact that the energy of the uterine contraction is much larger than both the maECG and fECG, and almost no information could be extracted when the uterine contraction happens. Note that except Case 4, the MAE outperforms the reported results in [5, Table 3]. This indicates the strength of beat tracking and nonlocal median. In Case 4, although the MAE is larger than the other cases, it is still on the same level of that reported in [5, Table 3]. The fECG morphology estimation results are shown in Table 3 and Figure 2. The high correlation between the estimated fECG and the ground truth indicates that the nonlocal median algorithm leads to an estimated fECG morphology with low distortion. Note that in this evaluation, we only evaluate the correlation on the correctly detected fECG beats (true positive fECG beats). This suggests that if we could accurately estimate the fetal R peaks (the higher F 1 score), then we could have an accurate fECG reconstruction.
To look deeper into the algorithm and its performance, we take subject 1, round 1, case 4, and channel 21 into account, and show the result without noise in Figure 2. In this example, over the 5-minute, the F 1 is 1 and the MAE is 0.78 msec. In addition to the de-shape STFT of the maECG signal, the estimated R peaks by the beat tracking, the decomposed maternal ECG, the rough fECG estimate x f ,0 and its TFR, and the final fECG estimate are shown. We see that by the de-shape STFT, the information of non-sinusoidal oscillation; that is, the time-varying wave-shape function, and the IHR information are decoupled, and only the IHR information is shown in the TF representation. Note that we could see both the mIHR and fIHR in the TFR, and as expected, fIHR has a weaker intensity than mIHR, meaning that the energy of fECG is smaller. The intensity level could be seen in the colorbar. For a comparison, we could see that in the STFT of aECG, both the mIHR and fIHR could be seen, while the fIHR is much weaker compared with that in the de-shape STFT. Furthermore, the multiples of the maternal fundamental component could mask both the mIHR and fIHR information and even interfere with each other. To show how the fECG could be reconstructed, the estimated fECG and the ground truth fECG signal are put side by side for a comparison. On the other hand, we could see that the nonlocal nature of the nonlocal median algorithm does help us to recover ECG morphology, even the ectopic beats, and the median and IQR of the correlation of all TP beats are 0.997 and 0.004.
To show the result when noise exists, we take another signal, subject 3, round 1, case 4, and channel 23 into account, and with SNR 6dB, as an example, and show the result in Figure 3. We choose this signal as an example since it has a smaller fECG amplitude. As can be clearly seen, even when the noise is 6dB, the de-shape STFT still gives a reasonable TFR with the IHR information, although there are several "speckles" in the background, which come from the noise. In this example, since fECG is smaller compared with the maECG, the fIHR could not be clearly seen in the de-shape STFT of the aECG. However, it could be seen clearly in the de-shape STFT of the rough fECG estimate. The intensity level could be seen in the colorbar. Also note that since the fECG is smaller and the noise is big, and no denoise technique is combined into the algorithm, the de-shape STFT of the rough fECG estimate has a "scattered" background. For the morphology reconstruction, due to the large noise, the nonlocal median failed to recover two ectopic beats, as are indicated in the red arrows. Note that although the nonlocal nature of the nonlocal median has the power to handle ectopic beats, it is not designed for this purpose. To have a better recovery of the ectopic beats, more features specifically designed for ectopic beats should be taken into account. In this case, the F 1 is 0.996, the MAE is 3.476 msec, and the median and IQR of correlations of all TP beats are 0.968 and 0.034. We could see that the nonlocal median provides a convincing potential in recovering the fECG morphology, specially when noise exists.

4.3.
Results of the first real database: adfecgdb database. Table 4 shows the F 1 , MAE, PPV and SE of all channels and all subjects. Notice that in the r10 record, the direct fECG measurement was lost between 187 and 191 s, and between 203 and 211 s, so these two segments were neglected in the evaluation. To avoid the boundary effect introduced by the window function, the first and last 0.5 second in every recording are not evaluated neither. In terms of F 1 , compared with the state-of-art result reported in, for example, [12], our result is overall better. The MAE, which is less reported in the literature, is as small as 10 msec, which indicates the potential to carry out the fetal HRV analysis from the singlelead aECG, although this topic is out of the scope of this paper. However, as compared with the simulated database, the MAE is larger. Note that this larger MAE is reasonable, since in this database, we use the R peaks of the direct fECG as our ground truth, but the direct fECG has a different projection direction compared with the fECG recorded from the maternal abdomen. This difference leads to the slightly larger MAE.
To further explore the proposed algorithm, in Figure 4, we show the result of STFT and de-shape STFT of the third channel of the case r07, denoted as x 11 , as well as the the result of STFT and de-shape STFT of the rough fECG estimate from x 11 , denoted as x 11, f 0 (the signal used in the flowchart in Figure 1 is the channel 1 of the case r01). Clearly, in the STFT of x 11 , we not only see the fundamental IF of maECG around 1.2 Hz, but also its multiples, due to the nature of non-sinusoidal oscillation of the ECG signal. We could also see a relatively vague component around 2 Hz in STFT, which turns out to be the fIHR. On the other hand, after the de-shape process, in the de-shape STFT of x 11 , only two dominant curves associated with the mIHR and fIHR are left.
An example of the estimated fECG waveform, x 11, f 0 , is shown in Figure 5. The R peaks are clearly well reconstructed and match those in the direct fECG signal recorded from the fetal scalp. By comparing the rough fECG estimate and the final fECG estimate, we could see the effectiveness of nonlocal median. Furthermore, some fiducial points could be observed, as is shown in Figure 6, which is the zoomed in of Figure 5. Although not all critical fiducial points could be extracted, this result indicates the potential of studying the morphology of fECG, particularly taking into account the fact that we only count on a single-lead aECG signal. We mention that in some cases the P wave and T wave could be reconstructed, but in general they are buried in the noise, so we do not consider it as an achievement of the proposed algorithm, and more work is needed to recover these morphological features.

4.4.
Results of the second real database: CinC2013 database. In this database, since the signal is of only 1-minute long, we take K m = K f = 10 to enhance the local nature of the nonlocal median algorithm. In practice, the result is slightly worse if K m = K f = 40.
Here we follow the report suggested in [8] to report the mean in addition to the median of all subjects. For each subject, we choose the channel with the highest F 1 score as the channel for this subject, and report the gross statistics of the obtained F 1 , PPV, SE, and MAE over 75 subjects. Over 75 subjects, the mean and standard deviation of the F 1 score (respectively PPV and SE) are 86.37 and 22.9% (respectively 85.77% and 23.42%, and 87.5% and 22.05%), and the mean and standard deviation of MAE are 6.12 msec and 5.56 msec. 3 Our F 1 result is compatible with the reported results in [28,43,24,8,6], while the MAE is better. On the other hand, over 75 subjects, the median and IQR of the F 1 score (respectively PPV and SE) are 98.28% and 14% (respectively 98.25% and 14.72%, and 98.59% and 22.05%), and the median and IQR of MAE are 3.81 msec and 5.74 msec.
The discrepancy between the mean and median indicates the existence of outliers in the database. We thus take a closer look into the database. We found that the fECG signal could be hardly visualized in 10 subjects in the database, including a27, a32, a43, a50, a59, a60, a63, a64, a68, and a75, even when the signal is clean, and these cases are considered difficult for our algorithm. If we remove these cases, the mean and standard deviation of the F 1 score (respectively PPV and SE) become 94.54% and 11.63% (respectively 93.87% and 13.05%, and 95.78% and 8.79%), and the mean and standard deviation of MAE become 4.35 msec and 3.04 msec. On the other hand, the median and IQR of the F 1 score (respectively PPV and SE) become 98.84% and 4.91% (respectively 98.53% and 6%, and 99.21% and 4.16%), and the mean and standard deviation of MAE become 3.35 msec and 4.33 msec.
To have a deeper look into the difficulty, we show one example, a59, which is considered difficult case for our algorithm, in Figure 7. In Figure 7, the second channel of a59 is shown for an illustration. It is clear that the signal is quite clean, and it is not easy to identify if the fECG exists, even with the help of the provided annotation. Not surprisingly, the fetal R peaks are all detected incorrectly. By a direct visual inspection, we could see that those locations that are erratically identified as fetal R peaks are nothing but the residue coming from the incomplete maECG removal. This fact could also be observed in the de-shape STFT of the rough fECG estimate -the only "dominant component" in the de-shape STFT coincides with that of the aECG, as is indicated by the red arrow. We mention that the other three channels all share the same result -the fECG is too small to be even sensed from this relatively clean aECG signal. However, if we take the VCG notion into account, then it is possible to enhance the result. Precisely, by linearly combining different channels, we have a chance to obtain a stronger fECG signal relative to the maECG signal, and hence the result could be better. This idea is shown in 8, where the difference between channel 2 and channel 3 is analyzed. Clearly, we could see that now the fetal R peaks could be almost perfectly recovered. In this specific case, the F 1 is 99.3% and the MAE is 0.993 msec. We mention that if we do the same linear combination trick between channel 1 and channel 3 (respectively channel 2 and channel 3, channel 2 and channel 3, channel 1 and channel 2, channel 1 and channel 2, channel 2 and channel 3, channel 2 and channel 3, channel 2 and channel 3, and channel 1 and channel 2), the F 1 's of a27 (respectively a32, a43, a50, a60, a63, a64, a68, and a75) become 79.3% (respectively 100%, 100%, 97.9%, 84.7%, 91%, 98.5%, 91.4%, and 86.2%). While this is a naive way to handle the problem and it works well in this preliminary analysis, however, we need at least two channels. Since we focus on the single-lead aECG analysis in this paper, this direction will be explored in the other work. We mention that this idea is also considered in [6].

DISCUSSION AND FUTURE WORK
In this section, we discuss the difficulty of the fECG extraction problem from the signal processing viewpoint, different findings of the proposed algorithm, and several possible applications and future works.

General technical difficulty.
In the analysis of multi-component oscillatory signals, there are two chicken-and-egg problems of fundamental importance. The first is the detection problem; that is, how to determine the number of components and how to find the fundamental frequency or equivalently the fundamental period of each component? The second one is the separation problem; that is, how to separate all components from a recorded signal? These two problems coexist in many kinds of data, ranging from physiological signals to polyphonic music signals, where each component in the mixture contains information different from others.
Previously, these two problems are usually discussed separately, probably because only discussing one of them is challenging enough. However, recently some research works start to consider these two problems as a single one by viewing the separation problem as an extension of the detection problem. For example, we could simultaneously estimate the IF of each component, and then extract the wave-shape function as well as each component.
The methods could be classified into two classes, iterative or joint. The iterative approach extracts the most prominent IF/component in each iteration, until no additional IF/component can be found. Although iterative models are usually computationally inexpensive, they have a main drawback: iterative models tend to accumulate errors at each iteration step if the feature representation is not robust enough. To handle this limitation, we could consider the "joint" approach. Vast majority of recent approaches in multi-pitch estimation and source separation in music now falls within the "joint" category [10], and more and more studies on source separation started to consider pitch information to improve the source separation algorithms [14,34]. Note that although it is intuitive to utilize the IF information to handle the detection and separation problem, this kind of approach has been less studied until recent years, probably because the task of finding IF is by no means easy, especially when there are multiple components. Compared with the iterative method, the joint methods lead to more accurate estimates but with more involved mathematical tool and increased computational cost.
Our proposed algorithm falls in the "iterative" category -estimate and remove the maternal component first, and get the fIHR and fECG from the left. We do have the accumulated error issue when we estimate the fECG, and we count on the median filter to alleviate this error. One natural question is to ask if it is possible to generate a single-lead fECG algorithm in the joint category, in order to alleviate this problem, and the answer is positive. Precisely, the mIHR and fIHR could be estimated simultaneously by the de-shape STFT, as is shown in Figure 4. The estimated R peaks of each component could then be applied to estimate the maECG and fECG by the nonlocal median algorithm. In brief, we could consider an algorithm falling in the joint category, which is composed of two steps: (1) jointly estimate the mIHR and the fIHR by the de-shape STFT; that is, carry out Optional step 1 in the algorithm; (2) jointly approximate the maECG and fECG signal by the nonlocal mean. However, a preliminary study showed that without a more sophisticated curve extraction algorithm, this approach does not outperform what we propose in the current paper. Its potential will be explored in the future work.

Several theoretical and algorithmic topics.
We should discuss more about each step in the proposed algorithm. First of all, notice that the proposed algorithm is specifically designed to handle a nonstationary signal, which is commonly encountered in physiology. The de-shape STFT respects the local information hidden in a nonstationary time series, and decouple the non-sinusoidal oscillatory pattern from the IF; the nonlocal mean, while it is not local, takes the nonlinear relationship between the phase and the oscillation (the nonlinear relationship between the RR interval and QT interval) into account. Thus, the proposed algorithm could be applied to analyze a long time series without modification, and the hard truncation of the long time series into pieces is not needed.
We have seen the potential of extracting more information hidden inside the single-lead aECG signal by the ANH model and the de-shape STFT. In the past few years, different ideas in the TF analysis field have been proposed, regarding the model and the algorithm, and it is possible to incorporate them into the current algorithm to achieve a better practical result. For example, the Blaschke decomposition and unwinding based on the analytic signal analysis technique [21], the synchrosqueezing transform, or the concentration of frequency and time (ConceFT) [22]. However, to simplify the discussion, we do not carry out this combination in this paper. While de-shape STFT works well, we cannot ignore a major limitation in the curve extraction step for the IHR estimation. Precisely, when the energy of the fECG signal is greater then that of the maECG, then the curve extraction algorithm could easily go astray; that is, when the fECG has an almost equal or larger energy than the maECG, we could see two dominant curves, which may or may not intersect, and the curve extraction algorithm could get confused. While we could apply the physiological constraint to distinguish which curve belongs to the maECG, for example, the fIHR is in general faster than mIHR (Optional step 3), it is not universal, and will limit the method to normal subjects. We thus need more tools to handle this issue.
Next, we mention that the ability to obtain an accurate IHR estimation is the key step to an accurate R peak location estimation by the beat tracking algorithm. The technical limitation in obtaining a good IF information might explain why the intuitive and powerful beat tracking technique is not widely applied up to now, although it was introduced in the music signal analysis field long time ago [26]. In sum, the combination of the de-shape STFT, or any other technique providing an accurate IF estimation, is essential. Also note that this combination could offer an alternative way to estimate the R peaks or other fiducial points, and even to other signal processing problems.
Furthermore, to the best of our knowledge, this work is the first one combining the nonlocal median and the nonlinear manifold model to analyze the biomedical time-series analysis, and particularly reconstructing the time-varying wave-shape function. Taking the nonlinear manifold structure to analyze a time series is certainly not a new idea [53,37,40]. However, in the past, the focus was on decoupling maECG and fECG by the locally linear projection on the sequential beats. In this work, the nonlinear geometric structure is further explored to apply the nonlocal median. In general, we could apply other manifold learning techniques to further improve the algorithm. From a data analysis viewpoint, it is well known that a manifold structure might still be too restrictive to be the best model for this dataset, while it could be low dimensional and nonlinear with nontrivial topological structures, and could serve as a good mathematical framework to understand different algorithms. Thus, finding a more flexible model to include more physiological facts is a critical issue in the future study. While the L 1 norm is associated with the nonlocal median algorithm, to enhance the sparsity feature, it is possible to consider the L p norm, where 0 < p < 1, which has been applied in the recently proposed nonlocal patch regression for image denoising [16].
We have extensively extracted available information hidden in the single-lead aECG signal. When there are more than one channel, it could be beneficial to combine results from different channels, or to integrate the proposed algorithm with existing multiplelead algorithms to further improve the accuracy. The benefit of doing so has been shown with the case a59 in the CinC2013 database in Figure 8. Another direct benefit of using multiple-lead algorithms is guiding the mIHR estimation by the curve extraction, when the fECG has a stronger energy than the maECG. In practice, we could also expect to combine other kinds of signals, like the contact photoplethysmogram signal commonly equipped in modern wearable devices, to guide the mIHR estimation. The result of this kind of combination and the analysis of multiple channel signals will be reported in the future work.

Several practical topics.
In this study, we do not carry out any optimization to choose the best parameters, but just choose them in an ad hoc fashion. For example, due to the inevitable Heisenberg uncertainty principal, in general the window should be long enough to capture enough spectrum information, but it could not be too long, otherwise the local information could be missed. In practice we found that a window that could cover about 5-10 oscillations is a good choice. On the other hand, when the signal is noisy, we could get more stable results if the window is longer. Based on the physiological constraint and the discussion, we thus choose a 10 sec long window for the very noisy simulated signal, and a 5 sec window for the less noisy real databases. We mention that the algorithm in general is stable to the choice of parameters, and the performance of the algorithm could be further improved by running a systematic parameter optimization for a specific mission.
The proposed fECG extraction algorithm is just one among many successful algorithms. Each algorithm has its own strength and weakness, and it is generally believed that there might not exist a single algorithm that could handle all different situations and different signals. Thus, it would be of great interest to consider the possibility of combining the proposed algorithm with other methods, like the combination proposed in [8]. More practically, while in the real life the noise could be highly nonstationary, we may want to combine the notion of signal quality index (SQI) to help guide us in selecting the "good" signal for analysis [7,47,27]. In general, if the noise is really too big and bypass the information limitation, then there is no hope to recover anything. However, if the noise is not that big, then some denoising techniques could certainly help. In this work, to simplify the discussion, we do not consider any denoising technique before running de-shape STFT or beat tracking. However, it could be of great help if we could incorporate a good de-noise scheme into the algorithm. All these interesting practical issues will be explored in the near future.

Several clinical topics.
In clinics, the fECG signal could provide at least two different kinds of physiological information -the HRV and the electrophysiological dynamics. For the HRV analysis, the most important ingredient is having an accurate R peak detection algorithm. One of the main strength of the propose algorithm is providing the state-of-theart MAE in the field. For example, in the real database, the averaged MAE is about 5 msec, which is roughly equivalent to the R peak information from an fECG signal sampled at about 200Hz, and its clinical value does deserve a further evaluation. However, note that when we evaluate the MAE, we focus only on the TP beats, which means that we could only get a good HRV analysis when F 1 is high enough. See Figure 9 for an illustration of the relationship between the HRV analysis and the F 1 . This figure is generated in the following way. For each case in CinC2013 (75 in total), we evaluate the root mean square error (RMSE) to quantify how accurate we could recover the fIHR for the HRV analysis. Based on the Task Force [61], we get the fIHR's from the estimated fetal R peaks and from the annotated fetal R peaks, and RMSE between these two fIHR is evaluated. Note that we take all estimated fetal R peaks to simulate the real scenario. We could see that when F 1 is slightly worse than 95%, the RMSE is as high as 0.1 beats/sec. This preliminary result emphasizes the importance of getting an accurate F 1 . An extensive study of this important topic will be reported in the future work.
For the fECG morphology reconstruction, although the result is convincing, note that we could not infer too much electrophysiological information from the single-lead fECG signal, even if the reconstruction is perfect. Indeed, due to the variation of relative cardiac axes between the mother and fetus, the projection direction of fECG is in general unknown, even if the abdominal lead system on the abdominal surface is standardized. To explore this direction, we need to design or choose the optimal single abdominal lead location under the complicated time-varying dynamics, which is adaptive to the mother, or we may take multiple leads and proceed with an adaptive way to reconstruct the VCG signal.
Uterine contraction is in general a difficult problem and big challenge in the fECG signal analysis, since it behaves like a huge noise in the aECG signal. In general, to handle such a huge noise, some a priori knowledge of the noise is needed. While the current method could provide a reasonable result in Case 3 of the simulated database, it does not have the ability to handle the uterine contraction, and that is why we always get an F 1 less then 80% in the 5 min result. While it is out of the scope of this paper, we mention that one possible approach is to incorporate the physiological behaviors underlying cardiac signals, a realistic model of an individual fetus' ECG, or the statistical behavior of the noise associated with the uterine contraction to improve the result [19,57,59].
It is not a reasonable assumption that the mother or the fetus are both healthy in the real life. Sometimes we might encounter subjects with a problematic heart. In Case 4 of the simulated database, the situation when the fetus and mother both have frequent ectopic beats, like ventricular pre-contraction (VPC), is considered. We see that the nonlocal median algorithm gives us a reasonable cardiac activity estimation. Indeed, the distance between an ectopic beat and a normal beat is in general larger than the distance between two ectopic beats and two normal betas. Therefore, the nonlocal nature of the nonlocal median algorithm could accurately estimate the cardiac activity, since VPC and normal beats are clustered into two groups. Note that if VPC is polymorphic and/or the VPC number is small, the problem becomes more challenging and a different approach is needed.
The twin pregnancy problem was relatively rarely discussed, except in [62,50]. In Case 5 of the simulated database, although we do not specifically study in this paper, we mention that the proposed algorithm has the potential to handle even multiple pregnancy problem. Indeed, the mIHR and fIHR's of different fetuses could be simultaneously revealed in the de-shape STFT. However, it is still challenging to utilize this single channel information at this stage, since there is no a priori information about the relationship between the IHR's and signal strengths of those fetuses. This limits the curve extraction step in the de-shape STFT, as the IHR's and energy of two fetus might be similar. Thus, although we could see all the fIHR's, to extract the fIHR from the TF representation based on the current curve extraction algorithm is challenging, and we need a more sophisticated curve extraction algorithm to do it. The study of the above-mentioned clinical topics will be explored in the future work. 5.5. Limitation. The discussion could not be complete without mentioning the limitation. While the strength of the proposed algorithm is confirmed on one simulation and two real databases, a larger scale clinical study is needed to further confirm the usefulness of the proposed algorithm since the size of the publicly available databases, adfecgdb and CinC2013, is small. Second, although fecgsyn database provides a universal comparison platform with many interesting examples, the model at this stage is still oversimplified, and that might explain the high accuracy of our proposed algorithm. Many other limitations have been well discussed in [5]. A better "ground truth" should be considered for the evaluation purpose. For example, a well established sheep model [25] could provide a gold standard test bed for the algorithm.

CONCLUSION
In this paper, we propose a novel fECG extraction algorithm depending mainly on a single-lead aECG based on a nonlinear TF analysis technique, de-shape STFT, and the nonlocal median. In addition to providing the theoretical model and mathematical guarantee, the algorithm is evaluated on a simulated fECG database and tested on two real databases with experts' annotation. The main novelty in our algorithm is an extensive utilization of hidden information inside the aECG signal, that is, both the frequency and energy information and the nonlinear relationship between consecutive cardiac activities. In addition to solving clinical problems, we mention that the combination of the de-shape STFT and nonlocal median has a potential to deal with more general multi-component oscillatory signals in other fields.

ACKNOWLEDGEMENT
Hau-tieng Wu's research is partially supported by Sloan Research Fellow FR-2015-65363. Part of this work was done during Hau-tieng Wu's visit to National Center for Theoretical Sciences, Taiwan, and he would like to thank NCTS for its hospitality. Hautieng Wu acknowledges Professor Ingrid Daubechies for the continuous discussion about various topics and Professor Martin Frasch for the discussion about clinical needs. TABLE 1. Results of fetal R peaks estimation -median F 1 and MAE and their IQRs for different cases and noise levels among 10 subjects and 5 rounds. The 1min result is reported in the following way. For each subject, noise level, case, and round, we take the channel and the 1-minute subset out of the 5-minute signal that leads to the highest F 1 to report the result. Then, for each noise level and each case, the median and IQR of all subjects and rounds are shown as the final result, as is suggested in [5]. For the 5min result, for each subject, noise level, case, and round, we take the channel that leads to the highest F 1 , and then for each noise level and each case, we report the median and IQR of all subjects and rounds. Results of fetal R peaks estimation for the tenth best channelmedian F 1 and MAE and their IQRs for different cases and noise levels among 10 subjects and 5 rounds. The 1min result is reported in the following way. For each subject, noise level, case, and round, we take the channel and the 1-minute subset out of the 5-minute signal that leads to the tenth highest F 1 to report the result. Then, for each noise level and each case, the median and IQR of all subjects and rounds are shown as the final result, as is suggested in [5]. For the 5min result, for each subject, noise level, case, and round, we take the channel that leads to the tenth highest F 1 , and then for each noise level and each case, we report the median and IQR of all subjects and rounds. Results of fECG waveform estimation over the 5-minute signal -median correlation and its IQR for different cases and noise levels among 10 subjects and 5 rounds. For each subject, noise level, case, and round, we take the channel that leads to the highest F 1 (and the thenth highest F 1 ), and then for each noise level and each case, we report the median and IQR of all subjects and rounds.   the clean maECG is shown in light gray (shifted up by 5 units) superimposed with the estimated maECG signal in blue on the second row; the rough fetal ECG is shown in dark gray (shifted up by 2.5 units) with the detected R peaks in round red circles on the third row; the clean fECG is shown in light gray superimposed with the estimated fECG signal in red on the bottom row. In the middle subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. In the bottom subplot, the STFT of the maECG is shown on the left and the STFT of the rough fECG is shown on the right. Note that the discrepancy between the clean maECG and the estimated maECG comes from the median filter, which in general is needed to remove the baseline wandering. the clean fECG is shown in light gray superimposed with the estimated fECG signal in red on the bottom row. The two red arrows indicate two ectopic beats that are not recovered by the nonlocal median algorithm.
In the middle subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. In the bottom subplot, the STFT of the maECG is shown on the left and the STFT of the rough fECG is shown on the right. Note that the discrepancy between the clean maECG and the estimated maECG comes from the median filter, which in general is needed to remove the baseline wandering.    In the top subplot, all relevant signals are shown together for a visual comparison. The clean maECG signal is shown in black (shifted up by 1600 units) with the detected maternal R peaks in round red circles and the annotated fetal R peaks in blue diamonds on the top row; the estimated maECG signal is shown in blue (shifted up by 600 units) on the second row; the rough fetal ECG is shown in dark gray (shifted up by 300 units) with the detected R peaks in red upper triangles and the annotated R peaks in blue diamonds on the third row; the fECG estimation is shown in red on the bottom row. It is clear that the fetal R peaks are all detected incorrectly. In the bottom subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. The red arrow indicated the "dominant curve" in the de-shape STFT, which comes from the incomplete removal of the maECG signal. In the bottom subplot, the de-shape STFT of the maECG is shown on the left and the de-shape STFT of the rough fECG is shown on the right. Clearly, the fetal R peaks is perfectly recovered.