How Nonlinear-Type Time-Frequency Analysis Can Help in Sensing Instantaneous Heart Rate and Instantaneous Respiratory Rate from Photoplethysmography in a Reliable Way

Cicone, Antonio; Wu, Hau-Tieng

doi:10.3389/fphys.2017.00701

METHODS article

Front. Physiol., 22 September 2017

Sec. Computational Physiology and Medicine

Volume 8 - 2017 | https://doi.org/10.3389/fphys.2017.00701

This article is part of the Research TopicHemodynamic Parameters and Cardiovascular ChangesView all 7 articles

How Nonlinear-Type Time-Frequency Analysis Can Help in Sensing Instantaneous Heart Rate and Instantaneous Respiratory Rate from Photoplethysmography in a Reliable Way

Antonio Cicone¹

Hau-Tieng Wu^2,3^*

¹Department of Information Engineering, Computer Science and Mathematics, Universitá degli Studi dell'Aquila, L'Aquila, Italy
²Department of Mathematics and Statistical Science, Duke University, Durham, NC, United States
³Mathematics Division, National Center for Theoretical Sciences, Taipei, Taiwan

Despite the population of the noninvasive, economic, comfortable, and easy-to-install photoplethysmography (PPG), it is still lacking a mathematically rigorous and stable algorithm which is able to simultaneously extract from a single-channel PPG signal the instantaneous heart rate (IHR) and the instantaneous respiratory rate (IRR). In this paper, a novel algorithm called deppG is provided to tackle this challenge. deppG is composed of two theoretically solid nonlinear-type time-frequency analyses techniques, the de-shape short time Fourier transform and the synchrosqueezing transform, which allows us to extract the instantaneous physiological information from the PPG signal in a reliable way. To test its performance, in addition to validating the algorithm by a simulated signal and discussing the meaning of “instantaneous,” the algorithm is applied to two publicly available batch databases, the Capnobase and the ICASSP 2015 signal processing cup. The former contains PPG signals relative to spontaneous or controlled breathing in static patients, and the latter is made up of PPG signals collected from subjects doing intense physical activities. The accuracies of the estimated IHR and IRR are compared with the ones obtained by other methods, and represent the state-of-the-art in this field of research. The results suggest the potential of deppG to extract instantaneous physiological information from a signal acquired from widely available wearable devices, even when a subject carries out intense physical activities.

1. Introduction

Since being introduced in 1937 and its popularization in 1975 (Aoyagi et al., 1975; Nilsson, 2013), photoplethysmography (PPG) has become an essential optical technique in healthcare and its mechanism has been extensively studied (Mannheimer, 2007; Nilsson, 2013). It is noninvasive, economical, comfortable and easy-to-install. In recent years, due to the advances in sensor technologies, different types of PPG signals have also become available via non-contact sensors (Davila et al., 2016; McDuff et al., 2016). Furthermore, the PPG has become a standard equipped sensor in diverse mobile devices for healthcare, an important component in the internet of things (Swan, 2012), and in non-standard applications like monitoring the hemodynamics under the hyper- or microgravity environment (Blanik et al., 2007), in music therapy (Shin et al., 2014) and many others.

The PPG contains extensive physiological dynamics information, such as peripheral oxygen saturation, autonomic nervous system status, cardiac and respiratory dynamics and hypovolemic status (Nilsson, 2013). In recent decades, based on PPG plain statistics, several indices have been proposed and extensively applied for clinical needs. Examples include but are not limited to (Nilsson, 2013) the heart rate and respiratory rate monitoring, pleth variability index for the fluid status assessment, surgical pleth index for stress evaluation and sleep apnea detection.

In more recent years, different scientific communities have tackled the problem of also learning hidden physiological information from a PPG. In particular, researchers have focused their attention on fine physiological dynamics like heart rate variability (HRV) and respiratory rate variability (RRV) that traditionally were studied directly from electrocardiograms or breathing flow signals. The reason behind this effort is that reliably and accurately extracting fine physiological information from a PPG signal would open the door to the development and popularization of a new generation of medical care based on leisure time equipment.

However, analyzing the HRV and RRV from the PPG is not an easy task, particularly when we only have one channel PPG sensor. The main difficulty in achieving this research goal is the need to extract the heart rate and the respiratory rate from the PPG in the instantaneous sense, instead of simply getting average values over a time window. The identification of the instantaneous heart rate (IHR) and the instantaneous respiratory rate (IRR) from a PPG recording is challenging for standard signal processing tools due to the following two factors. The first challenge comes from the time-variability in the heart and respiratory rate, and the second challenge is a result of the non-sinusoidal PPG oscillation. The time-variability in the heart and respiratory rate broadens the spectrum, and the non-sinusoidal oscillation inevitably mixes up the spectral information for the cardiac and respiratory activity (Lin et al., 2016). The broadened and mixed up spectrum prohibits us from applying the standard signal processing techniques. The problem is even more challenging, since the signal is often contaminated by nonstationary uncertainties, like noise and motion artifacts, particularly when PPG signals are recorded during every day activities.

In the past few years, several methods have been proposed to address this challenge. For the IHR, methods include time-frequency (TF) analyses (Gil et al., 2010; Mullan et al., 2015; Wu et al., 2016), adaptive filtering (Yousefi et al., 2014; Khan et al., 2015; Murthy et al., 2015; Schack et al., 2015; Mashhadi et al., 2016), Kalman filter (Frigo et al., 2015), sparse spectrum reconstruction (Zhang, 2015), blind source separation (Wedekind et al., 2015), a Bayesian approach (D'souza et al., 2015; Sun and Zhang, 2015), correntropy spectral density (CSD) (Garde et al., 2014), empirical mode decomposition (EMD) (Zhang et al., 2015), model fitting (Wadehn et al., 2015), deep learning (Jindal, 2016), fusion approaches (Temko, 2015; Zhu S. et al., 2015), etc. For the IRR, efforts include TF analysis (Chon et al., 2009; Orini et al., 2011; Dehkordi et al., 2015), sparse signal reconstruction (Zong and Jafari, 2015; Zhang and Ding, 2016), neural network (Johansson, 2003), modified multi-scale principal component analysis (Madhav et al., 2013), independent component analysis (Zhou et al., 2006), time-varying autoregressive regression (Lee and Chon, 2010b,a), fusion approaches (Karlen et al., 2013; Cernat et al., 2015), pulse-width variability (Lazaro et al., 2013; Cernat et al., 2014), CSD (Pelaez-Coca et al., 2013; Garde et al., 2014), EMD (Garde et al., 2013), a Bayesian approach (Pimentel et al., 2015; Zhu T. et al., 2015), etc. While the above algorithms focus on either IHR or IRR, only a few ad-hoc algorithms are considered to extract simultaneously the IHR and IRR, like (Garde et al., 2013, 2014). Except for a few algorithms, most of the above methods are based on multiple-channel signals. For example, the method based on the pulse wave transit time (Johansson et al., 2006) needs at least one PPG channel and one ECG channel. At least two PPG channels are needed in the adaptive filtering and blind source separation method.

In this work, we propose a new signal processing technique with deep mathematical roots: the deppG algorithm, which is based on a nonlinear masking technique. This method allows the extraction of IHR and IRR simultaneously from a single channel PPG signal, even in presence of high level of noise and motion artifacts. We validate it on a semi-real simulated PPG database, and discuss the meaning of “instantaneous.” Then apply deppG to two publicly available databases, CapnoBase benchmark database (http://www.capnobase.org) and ICASSP 2015 signal processing cup (https://sites.google.com/site/researchbyzhang/ieeespcup2015), and compare the results with the ones obtained by other methods. The proposed deppG algorithm is composed of two steps. In the first one, given a recorded PPG signal, we compute its spectrogram using the short time Fourier transform (STFT). In order to accurately and robustly extract the IHR and IRR information from the PPG signal, we sharpen the spectrogram both by applying a novel data adaptive nonlinear mask (Lin et al., 2016), and by taking into account the phase information of the STFT of the recorded PPG (Daubechies et al., 2011; Chen et al., 2014). We call the resulting information the de-shaped spectrogram. Thanks to this mask, the instantaneous physiological information contained in the PPG is preserved and enhanced in the de-shape spectrogram at the same time. We point out that the novel nonlinear mask is data adaptive in the sense that it is determined directly from the recorded PPG. Second, we apply a curve extraction technique to derive the IHR and IRR curves from the de-shaped spectrogram. The deppG algorithm structure is illustrated in Figure 1.

FIGURE 1

Figure 1. The flow chart of the proposed algorithm, deppG, to extract the instantaneous heart rate (IHR) and instantaneous respiratory rate (IRR) from the recorded PPG signal. A typical recorded PPG signal lasting for 20 s is shown on the left. The short time Fourier transform (STFT), and hence the spectrogram, of the input PPG signal are then evaluated. The intensity of the spectrogram at a point (t, ξ) in the time-frequency plane indicates how strong the signal oscillates at time t and frequency ξ. The dark curve around 1.6 Hz represents the IHR, while the gray curve around 3.2 Hz (and 4.8 Hz, 6.4 Hz, etc. The frequency axis above 4 Hz is not shown) comes from the non-sinusoidal oscillation of the cardiac activity. Similarly, the dark curve around 0.3 Hz represents the IRR, while the gray curve around 0.6 Hz comes from the non-sinusoidal oscillation of the respiratory activity. With the STFT and the spectrogram of the PPG signal, the nonlinear mask is then designed from the spectrogram and the phase function is determined from the STFT. The intensity of the phase function at a point (t, ξ) in the time-frequency plane indicates the angle of the complex value of the STFT at time t and frequency ξ, which ranges from 0 to 2π. By applying the nonlinear mask and the phase function of the STFT to the spectrogram, the spectrogram is improved and we obtain the de-shaped spectrogram. The darker curve around 1.6 Hz represents the IHR and the lighter curve around 0.3 Hz represents the IRR. The curves corresponding to the IHR and IRR are extracted from the de-shaped spectrogram, which are shown as the red and blue curves respectively superimposed on the de-shaped spectrogram.

The rest of the paper is organized as follows. A detailed description of the proposed deppG algorithm, including the numerical implementation for the reproducibility purpose, is given in Section 2, together with a new non-harmonic model which describes a generic PPG signal. In Section 3, deppG is validated on a simulated PPG database, a discussion of the meaning of “instantaneous” is provided, and we apply deppG to the CapnoBase benchmark database and ICASSP 2015 signal processing cup datasets, and compare results with the ones obtained by other techniques proposed so far in the literature. We conclude this paper with a discussion section. In the online Supplementary Information (SI), we report extra information regarding the results obtained applying this technique to the CapnoBase and ICASSP database.

2. Methods

2.1. Mathematical Model

We believe that to properly analyze a PPG signal, it is fundamental to first develop a suitable model able to replicate the main features of such a signal. With this aim, we start by making an important observation regarding the physiological dynamics we want to model. The cardiac dynamics in a PPG signal could be well understood by taking the Lambert-Beer law into account (Mannheimer, 2007). On the other hand, there are several sources of the respiratory dynamics in the PPG, for example, the respiratory-induced intensity variations (RIIV) and the pulse-wave transit time (Mannheimer, 2007; Nilsson, 2013). These dynamics are oscillatory in nature and inherited from the oscillatory central drivers of the cardiac and respiratory activities. Since writing down a thoughtful differential equation for the PPG might not be feasible due to such complicated physiological dynamics, we choose to model the PPG by a statistics-based phenomenological model that mainly describes the empirical observations of the PPG.

As the first idea for a model, we would be tempted to use a harmonic model. However, the PPG cannot be properly modeled by a harmonic model, like sine or cosine functions, due to at least three limitations. First, in a harmonic model, the frequency, or the period of each oscillation, is assumed to be fixed. This limits the ability of the model to capture the irregular oscillation of the PPG. Instead, the notion of time-varying frequency, or instantaneous frequency, is needed. As the name suggests, the time-varying frequency captures the fact that the PPG oscillatory periods vary over time. Physiologically, the time-varying frequency of the cardiac dynamics captures the IHR, and the time-varying frequency of the respiratory dynamics captures the IRR. Second, the amplitude is assumed to be fixed in a harmonic model. Note that the amplitude of each PPG oscillation is directly related to the blood pressure, which is not constant over time. Therefore, a time-varying amplitude is needed to better capture the PPG behavior. Third, the oscillation is assumed to be sinusoidal in the harmonic model. It is well known that the systolic and diastolic phases of the cardiac activity do not last the same amount of time (Guyton and Hall, 2000). A similar fact holds for the respiratory activity (Guyton and Hall, 2000). Therefore, we need to consider a non-sinusoidal shape to model the oscillations in the PPG. In summary, the harmonic model needs to be substituted by one that is able to capture at least three characteristic aspects of a PPG signal – time-varying frequency, a time-varying amplitude and a non-sinusoidal shape.

A well-studied substitute for the harmonic model that can work well in this context is the adaptive non-harmonic model (ANHM) (Wu, 2013; Lin et al., 2016). A signal satisfies the ANHM, if it is oscillatory with a time-varying frequency, a time-varying amplitude and non-sinusoidal oscillations. Mathematically, it is expressed as

\begin{array}{l} f (t) = \sum_{k = 1}^{K} a_{k} (t) s_{k} (ϕ_{k} (t)), & (1) \end{array}

where a_k(t)s_k(ϕ_k(t)) is the k-th oscillatory component in the recorded signal f(t), called the intrinsic-mode-type function, a_k(t) is the time-varying amplitude, the ϕ_k(t) is a monotonically increasing function whose derivative is the time-varying frequency and s_k is a 1-periodic function that captures the non-sinusoidal shape. We call the derivative of ϕ_k(t) and s_k the fundamental frequency and the wave-shape function of the k-th intrinsic-mode-type function respectively. We represent the single-channel PPG signal by the ANHM with K = 2, where the first intrinsic-mode-type function, a₁(t)s₁(ϕ₁(t)), models the respiratory dynamics and the second intrinsic-mode-type function, a₂(t)s₂(ϕ₂(t)), models the cardiac dynamics. This model is still too general. We need to incorporate in it some physiological knowledge regarding the phenomena we want to describe. The first observation regards the variation of IHR and IRR. Even though they are not constant, they do not change suddenly based on the physiological homeostasis assumption, unless there is any unexpected impact from the outside system. We thus assume that the IHR and IRR vary slowly over time. This implies that both the time-varying frequencies and the time-varying amplitudes vary slowly over time in this model. Under normal physiological circumstances, we further assume that the fundamental frequency of the cardiac dynamics in the ANHM is larger than the one of the respiratory dynamics. Under all these assumptions, the ANHM can be properly analyzed from a mathematical standpoint. We refer interested readers to (Wu, 2013; Lin et al., 2016) for more technical details on this model, like the time-varying wave-shape function, and its mathematical analysis.

2.2. Proposed Algorithm: deppG

Analyzing the ANHM, and hence the PPG, is generally a challenging problem from a signal processing viewpoint. Due to the time-varying nature of the amplitudes and fundamental frequencies, if we try using standard techniques to study PPG signals from a TF point of view, we discover that the spectrum of each component of a PPG signal is broad. The problem is further complicated by the non-sinusoidal nature of the wave-shape function which makes each spectrum become even broader. To complete the picture, we have to consider that having a broad spectrum for each oscillatory component implies they will interact with each other in the frequency domain. All of these facts together explain why analyzing a PPG signal is almost an impossible problem for standard signal processing methods. Figure 4 shows an illustration of this challenge. To be more specific, there are at least two signal processing challenges we need to resolve, if we want to simultaneously extract the IHR and IRR from a single-channel PPG signal. The first challenge is to deal with the broad spectrum determined by the time-varying nature of the IHR and IRR, and hence obtain the instantaneous information. The second challenge is to take care of the interaction between the cardiac and respiratory activities that is caused by non-sinusoidal wave-shape functions.

Below, we detail the proposed deppG algorithm that handles the above mentioned challenges. A summary of the algorithm is shown in Figure 1. For readers with interest in the theoretical analysis and related mathematical theorems, we refer them to (Lin et al., 2016).

2.2.1. Step1: Handle the First Challenge by the Short-Time Fourier Transform

We find the “local spectrum” of the PPG by computing the Fourier transform of a small piece of the PPG cut out by a chosen sliding window function. By concatenating the local spectra together according to time, we obtain the time-varying spectrum that is commonly named the spectrogram. Mathematically, the STFT of $f \in S^{'}$ with a chosen window function $h \in S$ , where $S$ is the Schwartz space and $S^{'}$ is the space of tempered distributions, is defined as

\begin{array}{l} V_{f}^{(h)} (t, ξ) = \int f (τ) h (τ - t) e^{- i 2 π ξ (τ - t)} d τ, & (2) \end{array}

where t ∈ ℝ indicates time and ξ ∈ ℝ indicates frequency. We call $| V_{f}^{(h)} (t, ξ) |^{2}$ the spectrogram of the signal f. The spectrogram thus captures the time-varying fundamental frequencies of each component contained in the PPG, including the IHR and IRR.

See Figure 2 for an example. For a given the PPG signal shown in Figure 2A, we can compute its spectrogram using the STFT. The spectrogram, as explained before, provides for each instant of time the “local spectrum” of the PPG. In particular, if we consider the instant corresponding to 25 s, the corresponding local spectrum is obtained by first multiplying the PPG signal by a Gaussian window centered at the 25-th second, whose outcome is shown in Figure 2B, and then applying the Fourier transform to such a signal. The outcome is the local spectrum corresponding to the 25-th second of the signal, which is shown in Figure 2C.

FIGURE 2

Figure 2. (A) The photoplethysmogram (PPG) signal of the dataset 0031_8 min in the Capnobase database lasting for 50 s. The signal is shifted by 1 to simulate the trend commonly encountered in real data. (B) The “windowed” PPG signal that is generated by multiplying the PPG signal by a Gaussian window centered at the 25-th second. (C) The power spectrum of the windowed PPG signal shown in (B). The fundamental frequency, 1.2Hz, and its multiples, 1.2 × 2, 1.2 × 3, …, etc, are indicated by the blue arrows. Note that the magnitude varies from peak to peak, which depends on the non-sinusoidal oscillation of the PPG signal. Also note that this power spectrum is the spectrogram at the 25-th second. (D) The 0.1 power of the power spectrum shown in (C). It is clear that the magnitudes of all peaks become more uniform after taking the fractional power. We could thus view it as a periodic function with the “period” exactly the same as the fundamental frequency of the PPG signal. (E) The spectrum of the (D) that contains the fundamental period information, which is the inverse of the fundamental frequency, of the PPG signal at the 25-th second. Specifically, that the peaks indicated by the red arrows located at 1/1.2, 2/1.2, …, etc, are associated with the fundamental period and its multiples. (F) The nonlinear mask determined by inverting the quefrency axis by sending a nonzero quefrency q to 1/q. We could clearly see that the peaks indicated by the red arrows in (E) become the peaks indicated by the green arrows (F). By a direct algebraic calculation, the peaks indicated by the green arrows correspond to the fundamental frequency, 1.2 Hz, and its divisions, 1.2/2, 1.2/3, …, etc.

From this local spectrum, we discover that the non-sinusoidal shape function, shown in Figure 2A, has a periodic spectrum. Specifically, in addition to the fundamental frequency (the peak at about 1.2 Hz), in the local spectrum shown in Figure 2C, we have multiples of the fundamental frequency indicated by the blue arrows. Note that the multiples would have different strengths, which depend on the non-sinusoidal shape function. By taking a small fractional power of the local spectrum, the strengths of the multiples become more “uniform,” as plotted in Figure 2D. Therefore, we can focus on the periodic structure.

2.2.2. Step 2: short-Time Cepstrum Transform and the Nonlinear Mask

While the STFT provides a fruitful information about the cardiac and respiratory oscillation, however, it is not an easy task to extract the IHR and IRR information from the spectrogram computed using STFT due to the uncertainty principle of the STFT and the second challenge. The interference of the spectral information from cardiac and respiratory activities due to their non-sinusoidal shape determines the coexistence in the spectrogram of both IHR, IRR, and their multiples. This phenomenon can be clearly seen in the spectrograms shown in Figures 1, 4.

To handle the second challenge, the main trick is to take the Fourier transform of the small fractional power of the local spectrum (shown in Figure 2D). By doing so, we get the information about the oscillatory period and its multiples in what is called the quefrency domain in the literature (Oppenheim and Schafer, 2004). See Figure 2E for an example. Specifically, the peaks indicated by the red arrows represent the oscillatory period and its multiples. The terminologies “quefrency” is invented by interchanging the consonants of the first part of the word “frequency” in order to emphasize that the information is not in the original time domain but is related to the frequency domain (Oppenheim and Schafer, 2004). By definition, the quefrency has the same unit as time; in our analysis example, the unit is second. In the signal processing field, the function defined on the quefrency domain is called the cepstrum (Oppenheim and Schafer, 2004). We then repeat such analysis for each instant of time to produce the local cepstrum at every second. By concatenating the local cepstrum together according to time, we get the short-time cepstrum transform (STCT) of the PPG (Lin et al., 2016). Mathematically, the STCT is evaluated by

\begin{array}{l} C_{f}^{(h, γ)} (t, q) : = \int | V_{f}^{(h)} (t, ξ) |^{γ} e^{- i 2 π q ξ} d ξ, & (3) \end{array}

where γ > 0 is sufficiently small and q ∈ ℝ is the quefrency.

Thanks to the STCT, we are ready to design the newly proposed nonlinear mask. Using the fundamental relationship between frequencies and periods, periods are the inverse of the frequencies, the nonlinear mask is obtained by inverting the quefrency axis of the STCT. Note that by inverting the period and its multiples, this nonlinear mask contains the fundamental frequency and its divisions. See Figure 2F for an illustration. Mathematically, this nonlinear mask for the spectrogram is given by

\begin{array}{l} U_{f}^{(h, γ)} (t, ξ) : = C_{f}^{(h, γ)} (t, I ξ), & (4) \end{array}

where ξ > 0 is in the unit of Hz and $U_{f}^{(h, γ)} (t, \cdot)$ is in general a distribution defined on ℝ × ℝ⁺.

2.2.3. Step 3: Handle the Second Challenge by Applying the Nonlinear Mask

As a result, the common component between the nonlinear mask and the spectrogram is only the fundamental frequency. Therefore, by multiplying the nonlinear mask with the spectrogram, we “filter out” the spectral information associated with the non-sinusoidal shape function, and preserve the fundamental frequency information. It is clear that the nonlinear mask just defined depends mainly on the PPG itself, so it is data adaptive. This adaptivity feature of the nonlinear mask design allows the proposed method to, at the same time, automatically preserve and enhance the instantaneous information contained in the PPG signal regarding the IHR and IRR. Mathematically, this nonlinear mask satisfies

\begin{array}{l} W_{f}^{(h, γ)} (t, ξ) : = V_{f}^{(h)} (t, ξ) U_{f}^{(h, γ)} (t, ξ), & (5) \end{array}

where ξ > 0 is interpreted as frequency and $W_{f}^{(h, γ)}$ is defined on ℝ × ℝ⁺.

2.2.4. Step 4: Refine the First Challenge by Sharpening the Time-Frequency Representation

It has been shown in (Lin et al., 2016, Theorem 3.6) that if we model the PPG signal by the ANHM, the IHR and IRR information shows up as two curves in the nonlinearly masked spectrogram. To enhance the sharpness of these two curves, we apply another nonlinear transform technique called the synchrosqueezing transform (Daubechies et al., 2011; Chen et al., 2014). The synchrosqueezing transform sharpens the curves by taking the phase information of the STFT into account. Specifically, each spectrogram coefficient is nonlinearly reallocated to a new location that is determined by the phase information of the STFT. It has been shown in (Daubechies et al., 2011; Chen et al., 2014) that when a signal, like the PPG, is modeled by the ANHM, the spectrogram coefficients will be reallocated and the IHR and IRR curves will be enhanced. The robustness of the synchrosqueezing transform to different kinds of noise has been studied in (Chen et al., 2014), and it has been applied to the non-contact PPG signal analysis to obtain the IHR of the state-of-art quality (Wu et al., 2016). Mathematically, the synchrosqueezing transform (Daubechies et al., 2011; Chen et al., 2014) is defined as as

\begin{array}{l} S W_{f}^{(h, γ, υ)} (t, ξ) = \int_{𝔑_{υ} (t)} | W_{f}^{(h, γ)} (t, η) | \frac{1}{α} g (\frac{| ξ - Ω_{f}^{(h, υ)} (t, η) |}{α}) d η, & (6) \end{array}

where ξ ≥ 0, $g \in S$ so that g(·/α)/α converges weakly to the Dirac measure supported at 0 as α → 0, and $Ω_{f}^{(h, υ)}$ is the reassignment rule determined by taking the phase information in the STFT into account via

\begin{array}{l} Ω_{f}^{(h, υ)} (t, ξ) : = {\begin{array}{l} - ℑ \frac{V_{f}^{(D h)} (t, ξ)}{2 π V_{f}^{(h)} (t, ξ)} & when | V_{f}^{(h)} (t, ξ) | > υ \\ - \infty & when | V_{f}^{(h)} (t, ξ) | \leq υ \end{array}, & (7) \end{array}

$D h (t)$ is the derivative of the chosen window function $h \in S$ , ℑ means the imaginary part, and υ > 0. The threshold υ helps avoid instability in the computations when $| V_{f}^{(h)} (t, ξ) |$ is small. We call the outcome of the nonlinearly masked and sharpened spectrogram the de-shaped spectrogram. The readers can see examples of such de-shaped spectrograms in Figure 1, 4.

2.2.5. Step 5: Curve Extraction to Estimate the IHR and IRR

Finally, since in the PPG de-shaped spectrogram the IHR and IRR show up as sharp curves, we could apply any curve extraction technique to estimate the IHR and IRR. For simplicity, in this work, we use an intuitive optimization-based curve extraction algorithm to extract such curves (Chen et al., 2014). This specific algorithm searches for a curve in the de-shaped spectrogram, whose intensity is maximal in the de-shaped spectrogram, and such that the identified curve is smooth. The smoothness of the curve is controlled by a penalty parameter λ > 0. The overall performance is relatively stable over a wide range of λ, but still there is the need to tune the value of λ. Furthermore, this approach is not optimal from a computational viewpoint. In this work, we accept that we do not have an extremely well-performing curve extraction method and have to tune the λ parameter. Designing a better curve extraction algorithm is by itself an open research problem, and we leave it as a future work.

The whole procedure, including finding the de-shaped spectrogram and extracting curves from the de-shaped spectrogram, form the proposed deppG algorithm.

2.3. Numerical details of the deppG algorithm

We provide numerical details of the deppG algorithm with all parameters for the reproducibility purpose.

Suppose the PPG signal, denoted as f in the continuous setup, is sampled at a frequency f_s Hz and N points are sampled; that is, we collect the PPG signal for N/f_s seconds. We denote the discretized PPG signal as f ∈ ℝ^N. To numerically evaluate the STFT defined in (2), we fix the frequency resolution by $\frac{f_{s}}{2 M}$ , where M ∈ ℕ, so that the number of discretization points in the frequency axis is M and we take a discretized window function h ∈ ℝ^2k+1, k ∈ ℕ. The discretization of the STFT of f, denoted as $V_{f} \in ℂ^{N \times 2 M}$ , is thus numerically evaluated by

\begin{array}{l} V_{f} (n, m) = \frac{1}{f_{s}} \sum_{l = - k}^{k} f (n + l) h (l + k + 1) e^{- i π n (m - M) / M}, & (8) \end{array}

where m = 1, …, 2M and we complete f with 0's so that f(l) = 0 when l < 1 and l > N. Note that the first M frequency bins correspond to the negative frequency axis while the M+1, …, 2M-th frequency bins correspond to the non-negative frequency axis. We fix the quefrency resolution to $\frac{M}{f_{s} M^{'}}$ , where M′ ∈ ℕ, so that the number of discretization points in the quefrency axis is M′. To numerically implement STCT defined in (3), we fix a sufficiently small γ > 0. The discretized STCT, denoted as $C_{f} \in ℂ^{N \times M^{'}}$ , is thus obtained as:

\begin{array}{l} C_{f} (n, m') : = \frac{f_{s}}{2 M} \sum_{m = 1}^{2 M} | V_{f} (n, m) |^{γ} e^{- i π m^{'} (m - M) / M}, \end{array}

where m′ = 1, …, M′.

The numerical implementation of the nonlinear mask defined in (4), denoted as $U_{f} \in ℂ^{N \times M}$ , is given by:

\begin{array}{l} U_{f} (n, m) : = {\begin{array}{l} \sum_{1 / m^{″} = ⌈ m - 1 / 2 ⌉}^{⌈ m + 1 / 2 ⌉} {\tilde{C}}_{f} (n, \frac{1}{m^{″}}) & when m = 1, \dots, ⌈ \frac{f_{s}}{2 M θ} ⌉ \\ 0 & otherwise \end{array} & (9) \end{array}

for all n = 1, …, N, where 0 < θ < 1 is the threshold chosen by the user for the sake of stabilizing the influence of the low quefrency component in the nonlinear mask design. ${\tilde{C}}_{f} \in ℂ^{N \times α M}$ is the upsampled version of C_f in the quefreqncy axis by α ∈ ℕ times that is defined by

\begin{array}{l} {\tilde{C}}_{f} (n, m^{″}) : = C_{f} (n, M + \frac{m^{″}}{α}), & (10) \end{array}

for all n = 1, …, N and m″ = 1, …, 2αM.

In the nonlinear mask, we only focus on the positive frequency axis, so only M bins corresponding to the non-negative frequency are evaluated.

The de-shape spectrogram of f, denoted as $W_{f} \in ℂ^{N \times M}$ , is thus evaluated by discretizing (6) as

\begin{array}{l} W_{f} (n, m) : = \sum_{k : {\begin{array}{l} | ℑ Ω_{f} (n, k) - m Δ_{ω} | \leq Δ_{ω} / 2 \\ | V_{f} (n, k) | \geq υ \end{array}} | V_{f} (n, M + k) U_{f} (n, k) |, \end{array}

where $Ω_{f} \in ℂ^{N \times M}$ is the following direct discretization of the reassignment rule (7)

\begin{array}{l} Ω_{f} (n, m) : = {\begin{array}{l} \frac{- i V_{f} (n + 1, M + m) - V_{f} (n, M + m)}{2 π f_{s} V_{f} (n, M + m)} & when n = 1, \dots, N - 1 \\ \frac{- i V_{f} (1, M + m) - V_{f} (N, M + m)}{2 π f_{s} V_{f} (N, M + m)} & when n = N \end{array} & (11) \end{array}

for m = 1, …, M.

The curve extraction algorithm is implemented by fitting a discretized curve $c^{*} \in Z_{M}^{N}$ , where Z_M: = {1, 2, …, M}, to the dominant curve in W_f, which is obtained by maximizing the following functional

\begin{array}{l} I (c) : = \sum_{n = 1}^{N} log (\frac{| W_{f} (n, c (n)) |}{\sum_{n = 1}^{N} \sum_{m = 1}^{M} | W_{f} (n, m) |}) - λ \sum_{n = 2}^{N} | c_{n} - c_{n - 1} |^{2}, & (12) \end{array}

where $c \in Z_{M}^{N}$ and λ > 0 is the penalty term controlling the regularity of the extracted curve. Clearly, the first term of the functional $I$ captures the dominant curve in the de-shaped spectrogram. The final IHR or IRR is obtained by $\frac{f_{s}}{2 M} c^{*}$ .

In order to have a good accuracy in the extraction of IHR and IRR curves from the PPG signal, we apply a “divide et impera” approach. We first extract the curve corresponding to the IHR. Then we use it to filter out the cardiac component information from the de-shape spectrogram. We do so by splitting the de-shaped spectrogram of the PPG signal into two parts. The de-shaped spectrogram of the respiratory part of the signal is then analyzed to extract the curve corresponding to the IRR. Regarding the extraction of curves corresponding to the IHR and IRR from the de-shaped spectrogram, we point out that while there exist several good algorithms, how to design an accurate and fully automatic curve extraction algorithm is in general an open problem. In the present work we do not pursue this research problem, but simply apply existing curve extraction algorithms to simplify the discussion. More can be done in this direction in order to improve the performance of the proposed algorithm.

For a fair comparison and for reproducibility purposes, the parameters for computing the de-shaped spectrogram are fixed below. For the ICASSP 2015 signal processing cup database: f_s = 125 and M = 6, 250, the window size is chosen to be k = 750 in the STFT, γ = 0.3 in the STCT, α = 5 and θ = 0 in the nonlinear mask design, υ = 10⁻⁹% of the root mean square energy of the signal under analysis, and g is a direct discretization of the Dirac measure in the de-shaped spectrogram. For the Capnobase benchmark database: f_s = 300, M = 30, 000, k = 600, γ = 0.3, α = 5, θ = 0, υ = 10⁻⁹% of the root mean square energy of the signal under analysis, and g is a direct discretization of the Dirac measure in the de-shaped spectrogram. We mention that the result is not sensitive to the chosen parameters, and no global parameter optimization is carried out in this study, with the only exception for the parameter λ used in the curve extraction algorithm, and all parameters are chosen in an ad-hoc matter. The window length is chosen under the empirical rule of thumb that the window should cover few cycles to accumulate enough information, and it should not be too long otherwise the instantaneous information is lost. In practice, choosing a window length from 4 to 10 cycles gives good results.

3. Results

In this section, we report the analysis results of the deppG algorithm when applied to the Capnobase benchmark database and the ICASSP 2015 signal processing cup database. The Capnobase benchmark database (http://www.capnobase.org) consists of recordings of spontaneous or controlled breathing in static patients. The ICASSP 2015 signal processing cup (http://www.zhilinzhang.com/spcup2015/), by contrast, contains recordings from pulse oximeter positioned on the wrists of running subjects.

3.1. Quantities for evaluation

3.1.1. Instantaneous vs. Averaged

It is necessary to clarify some commonly abused terminologies before reporting the results. It is well known that “how fast the heart beats” and “how fast one breathes” provide important physiological information. We commonly use the terminologies, heart rate (HR) and respiratory rate (RR), to refer to them. However, in general, the HR and RR are not scientifically well-defined if the measurement scale is not specified. When the measurement scale is “infinitesimal”, the HR and RR become IHR and IRR. These instantaneous quantities are what we look for for the physiological variability analysis. When the measurement scale is over seconds or minutes, the HR and RR are derived by counting how many beats or breaths take place over a provided window of length on the scale of seconds or even minutes, of by finding the peak of the power spectrum of the signal over that window. We call the resulting quantities the averaged HR (AHR) and averaged RR (ARR). Clearly, when we mention AHR and ARR, the measurement scale is needed. The commonly encountered quantities regarding “how frequently the heart beats and how frequently one breathes” in the PPG literature are AHR and ARR. Note that the IHR and IRR are continuous functions, and the AHR and ARR could be viewed as the low-pass filtered IHR and IRR that comes from a window smoothing technique.

In practice, however, due to the sampling issue, it is in general not possible to measure any physiological system on the infinitesimal scale. The situation is worse when we deal with the HR, for example. Specifically, the IHR is a quantity inaccessible to researchers, and we could only get its surrogate information from the ECG landmarks, like the R peaks. As is recommended by the Task Force (Task Force, 1996), we could “recover” the underling IHR by the interpolation, if the ECG sampling rate is high enough, and the R peak locations are determined accurately. Based on this recommendation, a practical expectation is having an HR estimate on the scale as small as possible, like 10 msec or 1 msec. We will take the IHR obtained from the R-peak to R peak interval (RRI) time series¹ as the gold standard.

As we will show below, the proposed deppG algorithm allows us to directly compute the IHR and the IRR on the scale of 10 msec, and in the sequel, when we mention “instantaneous”, it is this scale that we refer to. With the IHR and IRR, we can easily derive the averaged quantities AHR and ARR by taking the averaging process.

3.1.2. Continuous v.s. Discrete Quantities

While the variations of IHR and IRR are commonly applied to quantify the physiological variability, the other signal that is also commonly analyzed to evaluate the physiological variability is the “beat-to-beat interval” time series, for example, the R-peak to R peak interval (RRI) time series. Take the IHR and RRI as an example. Note that the IHR is a continuous signal, while the RRI is a time series. The IHR and RRI are intimately related and could be converted from one to the other. Precisely, based on the suggestion from Task Force (Task Force, 1996), we could derive the IHR from the RRI time series via interpolation. On the other hand, we could recover the RRI time series from the IHR via searching the pre-assigned landmark of the ECG signal. In general, we obtain one quantity, and recover the other one via the above-mentioned conversion.

For the ECG signal, the conversion between the IHR and the RRI time series seems transparent, and it is easier to find the RRI time series first by the standard R peak detection algorithm. We might extrapolate this fact and expect that finding the peaks is easier than finding the instantaneous frequency in other oscillatory signals. However, we claim that it is not the case. Instead, for most oscillatory signals, finding the instantaneous frequency could be easier and more stable than finding the peaks.

Notice that the main mechanism of the R peak detection algorithm is based on the easy-to-define and easy-to-identify landmark in the ECG signal, like the R peak. However, for most oscillatory signals, such landmark is not easy to find, even to define. Take the respiratory oscillation in the PPG signal into account. It is clear that while there is an oscillation, it is not easy to define a reliable landmark in the PPG signal for the respiratory component so that we could determine the “beat-to-beat interval” time series for the respiration. Another scenario in which the peak detection is not favorable is when noise exists and the landmark is not “stable.” The R peak could be view as a stable landmark in the sense that it is robust to noise. Often, the “peak” of most oscillatory signals, even if it is possible to provide a universally accepted definition for “peak,” is not stable when noise exists. For example, the respiratory signal recorded from the capnogram.

In summary, depending on the type of oscillatory signal, we might favor to directly estimate the continuous quantity. In this paper, we focus on the continuous quantity and report the comparison.

3.2. Simulated Database

To validate the proposed deppG algorithm in extracting the instantaneous physiological dynamics from the PPG signal, we consider the following semi-real simulated PPG database. First, we take a 5-min PPG signal sampled at 300 Hz, recorded from Philip patient monitor MP60, from 5 normal subjects under general anesthesia. The 5-min period is chosen to contain the endotracheal intubation and other manipulation so that the PPG signal we will simulate contains enough dynamical changes. Truncate the signal into pieces according to the oscillations, where the oscillations are visually confirmed. Then, align all cycles so that their peaks are aligned, and trim each cycles so that they have the same length. Denote the trimmed cycles $P_{i}^{(0)} \in ℝ^{p}$ , where i = 1, …, N₀ and p,N₀ ∈ ℕ. We remove the outliers in the following way. Evaluate a median cycle out of all cycles P^(m) ∈ ℝ^p so that $P^{(m)} (j) = median {P_{i}^{(0)} (j)}_{i = 1}^{N_{0}}$ , where j = 1, …, p. Then collect all cycles that have the L² distances from P^(m) within the 90% quantile of ${| | P^{(m)} - P_{i}^{(0)} | |}_{i = 1}^{N_{0}}$ . Suppose there are K < N₀ cycles left. To avoid the boundary effect, we subtract each cycle by a linear function so that the resulting cycle starts and ends at 0. Finally, collect those post-processed cycles as $C : = {P_{i}}_{i = 1}^{K}$ , and view each P_i as a periodic function. We call $C$ a standardized PPG cycle set. Second, for each subject in the Capnobase benchmark database, evaluate the IHR derived from the RRI of the ECG signal by the cubic spline interpolation according to the Task Force guidance (Task Force, 1996). Since the purpose of this simulated database is to validate the proposed algorithm in getting the instantaneous physiological information, to avoid the influence coming from the bad ECG signal quality, we focus on the IHR signal over a continuous 3 minutes period where the ECG signal is not contaminated by big noise. Note that the possibly existing arrhythmia is not taken into account when evaluating the signal quality. Denote the selected 3-min IHR as IHR_ECG(t), and the phase of the IHR as ϕ_ECG(t); that is, $\frac{d}{d t} ϕ_{ECG} (t) = IH R_{ECG} (t)$ . Then, sample ϕ_ECG at 300 Hz and denote the discretized phase function as ϕ ∈ ℝⁿ. Third, simulate a semi-real PPG signal from a standardized PPG cycle set $C$ and a phase function ϕ by the following stitching step. Denote Q to be the modulus of ϕ after being divided by 1, which is clearly a sawtooth function. Denote ${ι_{j}}_{i = 1}^{I} \subset ℕ$ so that ι_j is the j-th jump of Q from 1 to 0. The hemodynamic component of the simulated semi-real PPG signal, denoted as h ∈ ℝⁿ, is constructed by

\begin{array}{l} h (i) = {\begin{array}{l} P_{\tilde{k}} (round (ϕ (i))) & when i = ι_{k} + 1 \dots, ι_{k + 1} \\ 0 & otherwise, \end{array} & (13) \end{array}

where round(x) rounds the real number x to the nearest integer, i = 1, …, n, and $\tilde{k}$ is the modulus of k after being divided by K. The respiratory component of the simulated semi-real PPG signal, denoted by r ∈ ℝⁿ, is simply taken from the capnometry signal after scaling it so that its mean is a value randomly and uniformly chosen from the interval between one quarter and half of the mean of h. The clean semi-real PPG signal is constructed by f₀: = h+r. To model the inevitable noise, we consider an autoregressive and moving average (1, 1) time series ξ ∈ ℝⁿ, which is determined by the autoregression polynomial a(z) = 0.95z−1 and the moving averaging polynomial b(z) = 0.5z+1, with the innovation process taken as independently and identically distributed Student t₄ random variables. We model the noise in order to capture the possibly time-dependent structure in the noise and the spike-like artifacts commonly seen in the real data. The final simulated semi-real PPG signal is $f = f_{0} + σ ξ \in ℝ^{n}$ , where σ > 0 is the strength of the noise, which is chosen to satisfy the desired signal-to-noise ratio (SNR), where the SNR is defined as $20 {log}_{10} \frac{std (f_{0})}{std (σ ξ)}$ , and the unit is dB.

For each simulated PPG signal, we denote the estimated IHR from deppG as IHR_deppG(t), and evaluate the RMS of estimating IHR_ECG(t) by IHR_deppG(t) to evaluate the algorithm's performance.

Among those 42 subjects in the Capnobase, the signal qualities of 0030_8min, 0115_8min, 0147_8min, 0309_8min and 0329_8min are not ideal, so the IHR's are selected from a 3-min interval with a reasonable ECG quality. For the other subjects, we take the first 3 mins IHR for the simulation. With 4 subjects under general anesthesia, we obtain 4 standardized PPG cycle sets. As a result, we could realize 168 semi-real simulated PPG signals for each SNR. See Figure 3 for an illustration of one typical realization of the simulation with the 20dB SNR and the analysis result. It is clear that the spectrogram is broadened, which is caused by the uncertainty principal shared by all linear-type time-frequency analysis tools. It is not easy to capture the “instantaneous” information about the hemodynamics from this broadened spectrum (the red arrow). Furthermore, the multiples of the fundamental frequency of the hemodynamic component are clearly seem in the spectrogram, which makes it difficult to interpret (the green arrows). Even worse, the respiratory component in the PPG signal contributes to the low frequency region (the blue arrow), while its multiples are relatively weak and cannot be clearly seen in the figure. Such limitations are alleviated directly after applying this de-shape nonlinear masking technique. In the de-shape spectrum, only a clear, dominant, and concentrated line exists, which represents the instantaneous hemodynamic information. In this case, the RMS is 13.51 ms.

FIGURE 3

Figure 3. Top panel: a subinterval of the simulated photoplethysmogram (PPG) signal with the instantaneous heart rate (IHR) determined from 0009_8min in the Capnobase database and 20dB signal-to-noise ratio. a.u. means “arbitrary unit.” Middle panel: the spectrogram determined by the short time Fourier transform (STFT) is shown on the left hand side, the de-shaped spectrogram is shown in the middle, and the zoomed in de-shaped spectrogram is shown on the right hand side. The red arrow indicates the band associated with the fundamental frequency of the hemodynamic component, the green arrows indicate those bands associated with the multiples of the hemodynamic component, and the blue arrow indicates the band associated with the fundamental frequency of the respiratory component. Bottom panel: a further zoomed in de-shape spectrogram is shown on the left hand side and the middle, with the ground truth IHR superimposed in the middle, and the error between the estimated IHR and the ground truth IHR is shown on the right hand side. The orange arrows indicate the possible oversmoothing effect of the deppG algorithm. See the main context for the interpretation of this figure.

When the SNR is 20 dB, the RMS is 16.02 ± 7.79 msec, where the results are represented as “mean ± standard deviation” over 168 realizations. Overall, we could obtain the heart rate information in the instantaneous sense on the scale of around 20 msec. In practice, the noise sometimes could be large. Consider the case when the SNR is 10 dB. In this case, the RMS is 20.94 ± 8.07 msec. It is clear that when the noise is larger, the algorithm's performance is worse, but the performance degradation is stable to the noise.

3.3. Capnobase Benchmark Database – PPG Signal with Respiration

The Capnobase benchmark database includes 42-8 min segments from 29 pediatric and 13 adults cases containing reliable recordings of spontaneous or controlled breathing. For each subject, the ECG, capnometry, and PPG signals were recorded at 300, 25, and 100 Hz respectively. All signals were recorded with S/5 Collect software (Datex–Ohmeda, Finland). The PPG and capnometry were automatically up-sampled to have a 300 Hz sampling rate. The database also contains a reference ARR, as well as information regarding the beginning of each expiration, both derived from the capnogram waveform and identified by experts. The reference AHR as well as R-peak locations derived by experts from the ECG waveform are also provided.

The deppG method provides estimates for the IHR and IRR, which are instantaneous in nature. However, it is important to remember that, to the best of our knowledge, the methods proposed so far in the literature for the Capnobase benchmark datasets provide AHR and ARR information. They do not focus on computing instantaneous rates, but average rates over a time window. In most cases, the window is set to be around 60 s; that is, they report the HR on the scale of 60 s. Thus, for a fair comparison, we also provide AHR and ARR curves obtained from the estimated IRR and IHR by smoothing over a 60 s window shifted of 30 s each time. We refer to this variation of the method as deppG-60s.

We denote $x_{1}^{ref}, \dots, x_{n}^{ref}$ , where n is the number of observations, to be the reference information, i.e., IHR or IRR (respectively AHR or ARR) which are either provided directly by the experts and included in the database, or derived from the R peaks and the beginning timestamps of expiration respectively. We denote $x_{1}^{alg}, \dots, x_{n}^{alg}$ to be the estimated IHR or IRR determined by deppG (respectively the estimated AHR or ARR determined by deppG-60 s). Following what has been done in the literature, we assess the performance of the proposed algorithm using the RMS and the mean absolute error (MAE), which are defined as

\begin{array}{l} MAE = \frac{1}{n} \sum_{i = 1}^{n} | x_{i}^{ref} - x_{i}^{alg} | & (14) \end{array}

The challenge of simultaneously estimating IHR and IRR from a single PPG signal is made more difficult by the possible presence of artifacts. The Capnobase benchmark database include in each dataset information regarding potential intervals in the PPG, ECG and Capnometry waveforms that contain artifacts. The methods reported in the literature use this information to skip the measurement errors over windows, if not even entire datasets, that are considered unreliable; see, for example, (Dehkordi et al., 2015). In our tests, we evaluate the performance of the proposed deppG method using all of the 42 datasets, without excluding any of their intervals, even those that are supposed to contain artifacts. We do so to test the performance of the deppG algorithm in real scenarios. It is clear that for the intervals labeled as containing artifacts, the ground truth provided in the database is simply given as an interpolation of the nearby reliable values; see the right panel in Figure SI.2. This introduces a bias in the error values we compute. In particular, the performance of the proposed method can be consistently improved, if we remove intervals containing artifacts. However, in a real life application, such information is clearly not available in general. We discuss this in more detail in Section C in the online SI.

The RMS and MAE of the deppG and deppG-60s are then evaluated and reported in Table 1. These tables provide the summary statistics of the RMS and MAE for the respiratory and heart rate obtained from the proposed algorithm. The performance of other methods proposed so far in the literature, and their chosen windows for averaging, are also included for a comparison. From these tables, it is clear that the deppG provides a satisfactory IHR and IRR estimation, and the AHR and ARR provided by deppG-60s perform better than the other methods proposed in the literature. Specifically, the RMS of the IHR by deppG is 0.93 beats per minute, which is equivalent to 15.5 ms. This result could be interpreted as obtaining the HR in the instantaneous sense on the scale of around 15 msec. A similar interpretation holds for the IRR result.

TABLE 1

Table 1. Summary of root mean square error (RMS) and mean absolute error (MAE) of the respiratory rate (RR) and heart rate (HR) estimation for the Capnobase benchmark database.

An example illustrating how the deppG analyzes the PPG signal of the dataset 0009_8 min in the Capnobase benchmark database, is shown in the algorithm flowchart in Figure 1. By a visual inspection, it is clear that there are two oscillations inside the PPG signal – the faster (respectively slower) oscillations are associated with the heartbeat (respectively respiration). It is clear that the spectrogram of the PPG signal is complicated by the interfering multiples of these two oscillations, which are due to their non-sinusoidal shape as explained in the previous section. However, this interference is eliminated in the de-shaped spectrogram.

3.4. ICASSP 2015 Signal Processing Cup – PPG Signal with Motion

The second database we consider is the training database of the ICASSP 2015 signal processing cup, which contains recordings of 12 male subjects with yellow skin and ages ranging from 18 to 35. Two-channel PPG signals, three-axis acceleration signals and a one-channel ECG signal were simultaneously recorded from the subjects. For each subject, the PPG signals were recorded from the dorsal of the wrist by two pulse oximeters with green LEDs (wavelength: 515nm), and the distance between two PPG sensors (from center to center) was 1.5 cm (Zhang, 2015, and private communication).

Each dataset is a 5-min PPG signal sampled at 125Hz and recorded when a subject runs on a treadmill with changing speeds, scheduled as follows: for subject one, the schedule is rest (30 s) → 8 km/h (1 min) → 15 km/h (1 min) → 8 km/h (1 min) → 15 km/h (1 min) → rest (30 s); for subject 2 to subject 12, the schedule is rest (30 s) → 6 km/h (1 min) → 12 km/h (1 min) → 6 km/h (1 min) → 12 km/h (1 min) → rest (30 s). The subjects were asked to purposely use the hand with the wristband to pull clothes, wipe sweat on the forehead, and push buttons while they run on the treadmill, in addition to freely swing. The acceleration signal was also recorded from the wrist using a three-axis accelerometer. Both the pulse oximeter and the accelerometer were embedded in a wristband, which was comfortably worn. The ECG signal was recorded from the chest using wet ECG sensors. Based on the ECGs, the reference AHR over an 8-second time window evaluated by experts is provided in the database.

To evaluate the performance of the proposed deppG algorithm, and to have a fair comparison with other reported results, for this database we follow the available literature and consider the average absolute error (AAE), which is the same as MAE defined in (14), and the average absolute error percentage (AAEP):

\begin{array}{l} AAEP = \frac{1}{n} \sum_{i = 1}^{n} \frac{| x_{i}^{ref} - x_{i}^{alg} |}{x_{i}^{ref}} . & (15) \end{array}

The summary of the HR estimation by different methods proposed in the literature and our method is shown in Table 2. Note that the result by deppG is consistently better than the existing reported results. Since we do not have the ground truth information about the RR, we do not evaluate it for this database.

TABLE 2

Table 2. The average absolute error (AAE, the same as the mean absolute error) and average absolute error percentage (AAEP) of the heart rate estimation for the ICASSP 2015 signal processing cup database.

An important factor to be taken into account is that PPG signals are prone to motion artifacts, even in presence of small motion (Zhang et al., 2014). The PPG signals under analysis have been sampled during intensive physical exercise. Therefore, we expect to have strong motion artifacts. To see the contribution of the motion artifacts that do not exist in the Capnobase benchmark database, we show in Figure 4 the PPG signal of subject 9 of the training dataset. In the same figure, we show the spectrogram of that PPG signal as well as the de-shaped spectrogram. From the recorded signal, which is shown in the top row of Figure 4, it is not easy to see how the motion and heartbeat vary. The IHR displayed in the de-shaped spectrogram in Figure 4 clearly follow the running pattern. It starts around 90 beats per minute and it goes up to roughly 150 beats per minute after 2.5 min, then it goes down for a minute and then again up for another one.

FIGURE 4

Figure 4. Top row: the photoplethysmography signal of subject 9 in the training dataset of ICASSP 2015 signal processing cup. Second row: the spectrogram is shown on the left panel, the de-shaped spectrogram is shown in the middle panel, and the de-shaped spectrogram with the acceleration offset is shown on the right panel. The dominant curve indicated by the blue arrow in the de-shaped spectrogram with the acceleration offset is the instantaneous heart rate of the subject. On the other hand, the lighter curve indicated by the red arrow in the de-shaped spectrogram is directly related to the body swaying pattern. The heartbeat component displayed in the de-shaped spectrogram follow clearly the running pattern.

Note that the IHR lies between two other components supposedly affected by motion artifacts, which can be observed in the left and central panels in the bottom row of Figure 4. The higher frequency component associated with motion has instantaneous frequency about twice the instantaneous frequency of the lower one. We conjecture that the higher one is contributed by the movement of body while the lower is contributed by the movement of arms and legs. The body finishes a period by just one step, while the leg finishes a period by two steps (one leg needs to finish a forward and backward movement). It is quite natural to catch two components here as they are indeed (at least) two different oscillatory signals, where one has an instantaneous frequency almost twice that of the other one.

In order to remove the contribution of the motion artifacts from a PPG signal, we use the acceleration signal that is recorded using a three-axis accelerometer embedded in a wristband. The spectrogram of the acceleration signal is evaluated and used to mask out the motion artifacts in the de-shaped spectrogram. An example of the final de-shaped spectrogram is shown in Figure 4.

Another observation regards how the reference AHR is provided in the database. Such a reference curve is computed by counting the number of heart beats over the time window 8 s in length, thus we could view the reference AHR as the IHR curve smoothed over a window 8 s in length. However, since the deppG algorithm calculates the IHR directly, there are inevitable “high frequency components” in the extracted curves compared with the reference AHR. Thus, in order to have a fair comparison, we filter out the high frequency components contained in such curves.

This can be done in many different ways. We use a modern signal filtering algorithm called the Iterative Filtering that allows us to automatically detect and remove high frequency components from a non-stationary and non-linear signal (Cicone et al., 2016a,b; Cicone and Zhou, 2017). We call this variation the deppG-IF algorithm. More details about the analysis results are available in Section C.2 in the SI.

In order to measure performance of the proposed deppG method using instantaneous values, we compute a new reference IHR from the R-peaks of the ECG signals provided in the database. In particular, for every instant of time in the middle of two consecutive R-peaks in the ECG signal, we compute the reciprocal of the time period between these two R-peaks. Based on these values, and following the standard protocol (Task Force, 1996), we derive the reference IHR curve as a cubic spline interpolation. We compare these curves with the ones produced by the deppG algorithm. The performance are shown in Table 2. Note that the AAE (or RMS) of the IHR by deppG is 2.97 beats per minute, which is equivalent to 49.5 msec, which means that we obtained the IHR on the scale of around 50 msec. While at the first glance this result is worse than that of the Capnobase, note that the subjects in this database are exercising, so the larger error is expected.

In the online SI, we also report in Table SI.3 the performance obtained using data downsampled to 25 Hz from the original 125 Hz, as well as truncated as proposed in (Zhang, 2015).

3.5. Computational Complexity

The computational complexity of the proposed algorithm is linearly related to the length of the PPG signal. For the Capnobase benchmark database, using MATLAB R2011a installed on a 64-bit Windows 7 Professional computer equipped with a core i3-3227U CPU, 1.9 GHZ, and 8 GB RAM, the average computational time for the whole algorithm to output the IHR and IRR is around 13 s which includes all steps in the algorithm. The only method in the literature for which we know the computational time is (Zhu T. et al., 2015) where the BCLA method is proposed. In that work, the authors state that “the average time for fusing 900 estimates from six algorithms using BCLA was about 0.64 s.” At first glance, our algorithm is slow compared with the BCLA algorithm. However, note that the method in (Zhu T. et al., 2015) computes only respiratory rates. Furthermore, it is not clear if the reported time includes the computation of the 900 estimates which are necessary in the BCLA method. We note also that the signals that are provided at 300 Hz are re-sampled, before applying the BCLA method, at 4 Hz using linear interpolation.

4. Discussion

Given a PPG signal, recorded from a subject performing some activities, even intensive ones, the challenge is to extract at once different physiological dynamics, ideally instantaneous ones. In this paper, we propose a new mathematically rigorous algorithm, named deppG, which is detailed in Section 2, to achieve this goal. This method is able to extract the IHR and IRR simultaneously from a single PPG signal, In order to clarify the meaning of IHR/IRR, we provide a discussion about the notion of “instantaneous frequency” from the measurement scale viewpoint. and it proves to have better performance than any other method previously proposed and to properly extract IHR and IRR curves from a single channel PPG signal, even in the presence of strong motion artifacts.

Compared with the available methods in the literature, the main novelty of the proposed algorithm is threefold. First, it allows us to simultaneously extract the IHR and IRR from the single-lead PPG signal, even when the cardiac and respiratory dynamical oscillations are both time-varying in frequencies and amplitudes, and are non-sinusoidally shaped. The key feature of the deppG algorithm is a nonlinear mask design, which technically converts non-sinusoidal oscillations into sinusoidal ones. This enables us to simultaneously extract IHR and IRR curves. Second, the proposed method is robust to nonstationary noise, and, even more importantly, to motion artifacts. Although not pursued in this work, based on the numerical results, we conjecture that the proposed method is also able to extract moving rhythms like, for example, the gait. Third, the algorithm is local in nature, which allows us to better extract the finer dynamical structure. The non-adaptive truncation-and-stitch policy commonly employed in the field is no longer needed.

There are at least two sources leading to the error in the semi-real PPG simulation database that are worth mentioning. The limitation of de-shape type method plays a fundamental role in the deppG algorithm. Based on the existing theory shown in (Lin et al., 2016), when the variation of the IHR is “dramatic,” the accuracy of the IHR estimation will be downgraded. However, it is well known that the IHR could have sudden changes due to the fractal Brownian motion behavior of the RRI time series. A consequence of this limitation is a possibly oversmoothed estimated IHR, which leads to the first source of the error (the orange arrows in Figure 3). Furthermore, the standardized PPG cycle set is constructed in a way that we could only preserve the continuity property of the PPG signal in the simulated PPG signal, but no guarantee about the smoothness. Particularly, the phase of the simulated PPG signal might not be smooth, which leads to the second source of error. Note that although the second source of error could be improved by designing a better simulation model, we cannot improve the first source of error since it is rooted in the current theoretical limitation. As a result, we only obtain an instantaneous information up to the scale of around 20 msec, which could be roughly related to the ECG signal sampled at 50 Hz. While developing a new theoretical work and HRV analysis are out of the scope of this paper, we leave a further theoretical and algorithmic study to the future.

The accuracy of deppG in the benchmark databases also deserves a discussion. In this study, we take the IHR determined from the RRI as the gold standard, and take the IHR estimated from the PPG by deppG as an estimate. While intuitively the estimate seem reasonable since the oscillations in the PPG signal are caused by the cardiac activities, however, the estimation cannot be that accurate due to the pulse transit time (PTT), which is a fundamental physiological relationship between the ECG and PPG signals. In short, there is a few 100 ms latency between the ventricular response and the PPG pulse observed right after that ventricular response in the ECG waveform. This latency comes from the fact that the pumped blood from the heart takes time to travel through the arteries and capillaries in order to arrive at the PPG measurement location. Even worse, it has been widely known that the PTT depends on the hemodynamics, and it varies from time to time and provides another set of information (Kim et al., 2013). As a result, due to the PTT, the IHR estimated from the PPG might deviate from the gold standard IHR determined from the RRI. In the real scenario, since there is no way to obtain the PTT when we only have a single channel PPG, an error is inevitable. How to obtain a more accurate IHR estimate is an important topic that will be further explored in the future work.

The noise-robustness property of the proposed deppG further broadens its applicability. Recently, due to the advancements of light source generation, more and more effort has been put into the non-contact PPG analysis. As the terminology suggests, unlike the PPG signal, the non-contact PPG is collected from a subject without any direct contact; for example, the PhysioCam (Davila et al., 2016), or other methods that collect the video from a subject (Davila et al., 2016; McDuff et al., 2016). For those techniques, the collected video is post-processed by the researcher to generate the PPG signal. In general, these kinds of signals are noisier, and the information content is not fully understood, except the cardiac dynamics. Clearly, for this kind of signal, the landmark based approach to estimate the “beat-to-beat interval” is even more difficult. The proposed method has the potential to be applied to this kind of signal. Specifically, the synchrosqueezing transform has been applied to study the non-contact PPG signal from the PhysioCam technology (Wu et al., 2016) and a preliminary analysis result by the de-shape synchrosqueezing transform has been shown in (Lin et al., 2016, Section 4.3.3).

The discussion cannot be complete without mentioning the directions of possible further improvement of the proposed algorithm. First of all, while deppG performs well overall, in general the performance of curve extraction step can be further improved. How to design an efficient automatic curve extraction algorithm is an open problem, and several efforts have been invested so far in this direction, for example (Meignen et al., 2012). In this work we adopt a basic method, based on a minimization approach, to get the result. Having a more efficient and automatic algorithm, would allow to obtain even better results.

Second, in the deppG algorithm we sharpen the nonlinearly masked spectrogram by taking into account the phase information computed by the STFT. However, there are several other possibilities to extract more detailed information from the phase function like, for example, the second-order synchrosqueezing transform (Oberlin et al., 2015). While we do not exhaustively explore the possibility in this work, it is promising to combine it into the deppG algorithm.

Third, while the databases we analyze are publicly available and close to several real scenarios, the database size is not large enough from a statistical point of view. Furthermore, not all possible physiological information, physical activity details and environmental conditions are included. A well-designed prospective study with different setups is needed to further evaluate the proposed algorithm.

Fourth, we do not extract all of the available information contained in the PPG signal. In addition to the RIIV, respiratory dynamics are hidden in different forms inside a PPG signal, like the pulse-wave transit time, the PPG pulse width and the respiratory sinus arrhythmia. This means that we could extract several different respiratory signals from a single-lead PPG signal. Furthermore, if we are allowed to also use an ECG channel, then the pulse-wave transit time and the ECG derived respiratory signal could be extracted and used to further improve the results.

Finally, we want to mention that the proposed algorithm has unexplored potential. In this work, we study the PPG signal in order to estimate IHR and IRR curves, so that the deppG algorithm stops once the curves have been extracted from the de-shaped spectrogram. However, for some other applications, we could continue analyzing the PPG signal. It would be possible to perform the wave-shape reconstruction, count the oscillatory components, decompose each oscillatory component, etc. Furthermore, we could even apply the proposed method to other signals. For example, in the fetal ECG signal analysis, decomposing the maternal ECG signal from the fetal ECG signal is a critical step (Su and Wu, 2017).

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

AC is a Marie Curie fellow of the Istituto Nazionale di Alta Matematica (INdAM) and his research is supported by the “INdAM Fellowships in Mathematics and/or Applications cofunded by Marie Curie Actions,” PCOFUND-GA-2009-245492 INdAM-COFUND Marie Sklodowska Curie Integration Grants. Hau-tieng Wu's research is partially supported by Sloan Research Fellow FR-2015-65363. The authors want to thank Professor Li Su for sharing the initial implementation of the de-shape SST algorithm, and Dr. Yu-Ting Lin for sharing the PPG signal recorded from subjects under general anesthesia for the simulation.

Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fphys.2017.00701/full#supplementary-material

Footnotes

1. ^If the ECG signal was not recorded at 1,000 Hz, resample it to be of 1,000 Hz. This upsample step could give at least millisecond precision in locating the time of the R-wave peak (Laguna and Sörnmo, 2000). Then, apply the standard R-peak detection algorithm to extract the R-peaks, and hence the R-peak to R peak interval (RRI) time series determined by two consecutive heart beats.

References

Aoyagi, T., Kishi, M., Yamaguchi, K., Nakajima, S., Hirai, H., Takase, H., and Kuse, A. (1975). New pulsed-type earpiece oximeter (author's transl) [in japanese]. Kokyu To Junkan, 23, 709–713.

Blanik, N., Hülsbusch, M., Herzog, M., and Blazek, C. R. (2007). Assessment of human hemodynamics under hyper- and microgravity : results of two Aachen University Parabolic flight experiments. Acta Polytechnica, 47, 29–32.

Google Scholar

Cernat, R. A., Ciorecan, S. I., Ungureanu, C., Arends, J., Strungaru, R., and Mihaela Ungureanu, G. (2015). “Recording system and data fusion algorithm for enhancing the estimation of the respiratory rate from photoplethysmogram,” in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (Milan), 5977–5980. doi: 10.1109/EMBC.2015.7319753