PPGSynth: An Innovative Toolbox for Synthesizing Regular and Irregular Photoplethysmography Waveforms

Photoplethysmography (PPG) is increasingly used in digital health, especially in smartwatches. The PPG signal contains valuable information about heart activity, and there is considerable research interest in methods for analyzing it in the context of cardiovascular disease. Unfortunately, to our knowledge, no arrhythmic PPG dataset is publicly available. This paper attempts to provide a toolbox that can generate synthesized arrhythmic PPG signals. The model of a single PPG pulse in this toolbox combines two Gaussian functions. The toolbox supports synthesizing PPG waveforms with regular heartbeats and with three types of irregular heartbeats: compensation, interpolation, and reset. Users can generate large amounts of PPG data with a chosen type of irregularity, sampling frequency, and time length. A range of noise types (Gaussian noise and multi-frequency noise) can be added to the synthesized PPG. All of these settings can be modified from the interface. Large synthetic datasets that simulate PPG collected from real humans could be used to test the robustness of algorithms that target arrhythmic PPG signals. Our PPG synthesis tool is publicly available.


INTRODUCTION
The photoplethysmogram (PPG) signal contains rich information about the cardiovascular system (1). In the past decade, studies have used PPG to calculate heart rate, oxygen saturation, blood pressure, cardiac output, cardiac index, peripheral vascular resistance, and other indicators of cardiovascular function, and many algorithms have been developed to calculate these indices (2).
Four PPG databases, at time of writing, are publicly available: Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) (3), the University of Queensland Vital Signs Dataset (4), Vortal Dataset (5), and PPG-BP (6). The sampling frequency and time length of PPG signals are different in different databases; however, most algorithms designed for these databases are signal-independent. Additionally, it is still a challenge to evaluate the performance of these algorithms under different PPG types and different signal-to-noise ratios (SNR). PPGSynth was developed to generate PPG signals across a wider range of sampling frequencies and time lengths. It can also generate three types of irregular PPG signals. Parameters and graphical output are managed through a graphical user interface (GUI). The toolbox does not require highly experienced users, but users are recommended to have basic knowledge of PPG signals and cardiac irregularities.

HEARTBEAT CLASSIFICATION
The amplitude, duration, and waveform shape of PPG pulses tend to vary between persons, and they even differ from moment to moment in the same person. Premature heartbeats are typical irregular PPG beats. There are two different types of premature heartbeats: premature atrial contractions and premature ventricular contractions. This study focuses only on irregular PPG signals that have premature atrial contractions. A premature atrial contraction changes the PPG waveform for two consecutive beats. In this study, these two beats are defined as the premature group, and the first and second beats of the premature group are referred to as the first beat and second beat, respectively. The beats without the influence of premature contractions are defined as reference beats. The first beat duration is always less than the reference beat duration. Based on the difference between the durations of the first beat and second beat, Roskamm and Csapo (7) classified heartbeats into four types: compensation, reset, interpolation, and re-entry. Based on their analysis, these four types are defined as follows:
• Compensation: the second beat is prolonged, and the sum of the first beat duration and second beat duration is equal to the duration of two reference beats.
• Reset: the second beat is prolonged, but the sum of the first beat duration and second beat duration is less than the duration of two reference beats.
• Interpolation: the sum of the first beat duration and second beat duration is equal to one reference beat duration.
• Re-entry: the sum of the first beat duration and second beat duration is less than one reference beat duration.
We could not find a template, within the four databases mentioned above, that satisfies the definition of re-entry. Therefore, re-entry is not included in the current analysis.
An earlier heartbeat classification based on ECG signals (8) inspired the classification of heartbeats in PPG signals. Following that classification, Figure 1A shows regular heartbeats, where the first beat, second beat, and third beat have equal durations (e.g., d_0 = d_1 = d_2 = 1,000 ms). Figure 1B shows compensation: the second beat (e.g., d_1 = 850 ms) is followed by a prolonged beat (e.g., d_2 = 1,150 ms), so that the two beats together last 2,000 ms. In reset (Figure 1C), the second beat (e.g., d_1 = 650 ms) is followed by a prolonged beat (e.g., d_2 = 1,150 ms), while in interpolation (Figure 1D), the second beat (e.g., d_1 = 400 ms) is followed by an irregular beat (e.g., d_2 = 600 ms).
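The four definitions can be expressed as a small decision rule. The following is a minimal sketch, not part of PPGSynth: the function name and the relative tolerance `tol` (used to decide when two durations count as "equal") are assumptions introduced for illustration, and d_1 and d_2 from Figure 1 are taken as the two beats of the premature group.

```python
def classify_premature_group(d_ref, d_first, d_second, tol=0.02):
    """Classify a premature group from its two beat durations.

    d_ref, d_first, d_second share one unit (e.g., ms).
    tol is a hypothetical relative tolerance for "equal" durations.
    """
    total = d_first + d_second
    eps = tol * d_ref
    prolonged = d_second > d_ref + eps  # second beat longer than a reference beat

    if prolonged and abs(total - 2 * d_ref) <= eps:
        return "compensation"   # sum equals two reference beats
    if prolonged and total < 2 * d_ref - eps:
        return "reset"          # sum is less than two reference beats
    if abs(total - d_ref) <= eps:
        return "interpolation"  # sum equals one reference beat
    if total < d_ref - eps:
        return "re-entry"       # sum is less than one reference beat
    return "unclassified"


# Example durations quoted for Figure 1 (reference beat of 1,000 ms)
print(classify_premature_group(1000, 850, 1150))  # compensation
print(classify_premature_group(1000, 650, 1150))  # reset
print(classify_premature_group(1000, 400, 600))   # interpolation
```

Regular beats (all durations equal to the reference) fall through to "unclassified", since none of the four definitions applies to them.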

METHODOLOGY
The PPGSynth consists of three main parts: the model of a single PPG pulse, the pulse duration generator, and the noise generator.

Model of Single PPG Pulse
The single-pulse model is based on a recently published model (9) that simulates fingertip PPG waveforms. Note that the adopted model (9) is an early work on healthy subjects, whereas this paper concerns arrhythmic PPG beats that simulate recordings from cardiovascular patients, which, to our knowledge, is a new concept. The construction of a PPG waveform is regarded as a motion trajectory in the three-dimensional space established by the coordinate system (x, y, z). As shown in Figure 2, the periodicity of PPG is represented by a circular motion.
The motion trajectory in the (x, y) plane is the unit circle. One cycle of movement on the circle corresponds to one peak-to-peak interval, or heartbeat. The trajectory in the z direction is the PPG signal. The systolic wave and diastolic wave are simulated by Gaussian functions. The equations for (x, y, z) are defined as follows:

$$x = \cos(\omega (t - t_0) - \pi), \qquad y = \sin(\omega (t - t_0) - \pi), \qquad z = \sum_{i=1}^{2} a_i \exp\left(-\frac{(\theta - \theta_i)^2}{2 b_i^2}\right), \tag{1}$$

where t is time, ω is the angular velocity (which controls the pulse duration), t_0 is the end time of the previous beat, π aligns the initial point of the model with the position of the onset in a PPG waveform, and a_i, θ_i, and b_i are the amplitude of the peak, the position of the center of the peak, and the standard deviation of the i-th Gaussian function, respectively. The angular velocity ω is calculated by

$$\omega = \frac{2\pi}{T}, \tag{2}$$

where T is the PPG pulse duration. θ is the four-quadrant inverse tangent of (x, y), which is introduced as the independent variable for motion in the z direction and is defined as

$$\theta = \operatorname{atan2}(y, x). \tag{3}$$

As (x, y) changes, θ stays in the range (−π, π).
The corresponding changes to x, y, and z over a single period are shown in Figure 2; these are repeated in the following pulses. In this figure, the pulse duration was 1 s, and the sampling frequency was 125 Hz. Obtaining a synthetic PPG pulse waveform that is as close as possible to the real PPG pulse is an optimization problem over the model parameters. The objective function was expressed as follows:

$$\min_{a_i, \theta_i, b_i} \left( \sum_{n=1}^{l} \left(z_p(n) - s(n)\right)^2 + \left(1 - \mathrm{corr}(z_p(n), s(n))\right) \right), \tag{4}$$

where z_p(n) is the synthetic PPG, l is the length of the real PPG s(n), and corr is Pearson's linear correlation coefficient.
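As a rough illustration of the single-pulse model described above, the sketch below traces one cycle of the unit circle and evaluates a sum of two Gaussian functions of θ. It assumes a phase of ω(t − t_0) − π with ω = 2π/T, so that the pulse starts at θ = −π. The function name and all default parameter values are hypothetical placeholders; they are not the fitted values from Table 1.

```python
import math


def synth_pulse(T=1.0, fs=125, a=(1.0, 0.4), theta_c=(-1.5, 0.5), b=(0.6, 0.8)):
    """Return (theta, z) for one synthetic pulse of duration T seconds.

    a, theta_c, b: amplitudes, centers, and standard deviations of the
    two Gaussian functions (illustrative values only).
    """
    omega = 2.0 * math.pi / T           # angular velocity for one cycle per pulse
    n_samples = int(round(T * fs))
    theta, z = [], []
    for n in range(n_samples):
        phase = omega * (n / fs) - math.pi   # n / fs plays the role of t - t_0
        x, y = math.cos(phase), math.sin(phase)
        th = math.atan2(y, x)                # four-quadrant inverse tangent
        theta.append(th)
        z.append(sum(ai * math.exp(-((th - ci) ** 2) / (2.0 * bi ** 2))
                     for ai, ci, bi in zip(a, theta_c, b)))
    return theta, z


theta, z = synth_pulse()  # 1-s pulse sampled at 125 Hz, as in Figure 2
```

With these placeholder values, the larger Gaussian plays the role of the systolic wave and the smaller one the diastolic wave.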
In this study, the interior-point (10) method was used to solve the optimization problem.
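The objective can be sketched in a few lines. This is an illustrative version only: it assumes the squared error is summed over the l samples without normalization, and the helper names are hypothetical. A constrained solver, such as the interior-point method mentioned above, would then minimize this value over the Gaussian parameters; that step is omitted here.

```python
import math


def pearson(u, v):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def objective(z_p, s):
    """Squared error plus (1 - correlation) between synthetic z_p and real s."""
    if len(z_p) != len(s):
        raise ValueError("z_p and s must have the same length")
    squared_error = sum((p - q) ** 2 for p, q in zip(z_p, s))
    return squared_error + (1.0 - pearson(z_p, s))
```

The correlation term penalizes differences in shape even when the amplitudes are close, which is why the two terms are combined.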

Variability of Parameters
In real-world PPG, the waveform often varies between pulses, sometimes dramatically so. To make the synthesized PPG closer to a real PPG, we used a Gaussian distribution to generate random parameters for our model. In this paper, the mean value and standard deviation of each parameter's Gaussian distribution are derived from real PPG signals by modeling individual PPG pulses (from the start of a pulse to the start of the next pulse). For a PPG trace with regular beats, we modeled all pulses in a 5-min PPG from the MIMIC database (3). For a PPG trace with irregular beats, however, the waveforms of the first beat and second beat in the premature group are distinct from the reference beat. We modeled three compensation segments (including the first beat and second beat) from one record of the Queensland database (4) to get the distribution of parameters for the compensation type. For reset, four reset segments from one record of the MIMIC database were used to get the distribution. For interpolation, three interpolation segments in one record from the Queensland database were used to calculate the parameters. The mean and standard deviation of these parameters are shown in Table 1. Note that since we do not have a high-quality re-entry PPG in our database, this toolbox does not support generating re-entry PPGs. To preserve the irregular category, the duration ratio of the irregular beat to the regular beat in the synthetic PPG uses a fixed value instead of a random number drawn from a Gaussian distribution. These fixed values are the means of the ratio of the pulse duration of the irregular beat to that of the regular beat in Table 1.
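Drawing per-pulse parameters from Gaussian distributions can be sketched as follows. The parameter names and the (mean, standard deviation) pairs are hypothetical placeholders rather than the Table 1 values, and the function is an illustration, not the toolbox's implementation.

```python
import random


def sample_pulse_params(stats, rng):
    """Draw one value per parameter from its Gaussian distribution.

    stats maps a parameter name to a (mean, standard deviation) pair.
    """
    return {name: rng.gauss(mean, sd) for name, (mean, sd) in stats.items()}


# Placeholder statistics for the two Gaussian functions of one beat type
stats = {
    "a1": (1.0, 0.05), "theta1": (-1.5, 0.10), "b1": (0.6, 0.03),
    "a2": (0.4, 0.05), "theta2": (0.5, 0.10), "b2": (0.8, 0.03),
}
rng = random.Random(0)  # seeding makes the draws repeatable
params = sample_pulse_params(stats, rng)
```

A fixed quantity, such as the duration ratio of an irregular beat, can be expressed in the same structure by giving it a standard deviation of zero.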

Pulse Duration
In this study, the PPG pulse duration is defined as the valley-to-valley interval. To generate a sequence of PPG pulses, a series of PPG pulse durations is needed. In this toolbox, reference pulse durations are generated based on the basic heart rate and the signal time length, and then randomly chosen reference pulse durations are replaced by two consecutive irregular beats. The ratios of the first beat duration and second beat duration to the reference beat duration are calculated from each type of PPG template, and the results are shown in Table 1.
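The duration generator can be sketched as below. This is a simplified illustration under several assumptions: reference beats all share one duration, premature groups are kept at least one reference beat apart, and the example ratios (0.85, 1.15) come from the compensation example of Figure 1 rather than from Table 1. The function name and the error raised when too many premature groups are requested are hypothetical.

```python
import random


def pulse_durations(heart_rate_bpm, length_s, n_irregular, ratios, rng):
    """Return a list of beat durations in seconds.

    ratios: (first beat / reference, second beat / reference).
    """
    d_ref = 60.0 / heart_rate_bpm
    n_beats = int(length_s // d_ref)
    durations = [d_ref] * n_beats

    # Candidate start indices spaced three apart, so that two premature
    # groups never overlap and a reference beat always separates them.
    starts = list(range(1, n_beats - 1, 3))
    if n_irregular > len(starts):
        raise ValueError("too many irregular groups for this signal length")

    for i in rng.sample(starts, n_irregular):
        durations[i] = ratios[0] * d_ref       # first (premature) beat
        durations[i + 1] = ratios[1] * d_ref   # second beat
    return durations


rng = random.Random(0)
durations = pulse_durations(60, 20, 3, (0.85, 1.15), rng)
```

For compensation the two ratios sum to 2, so the total signal length is unchanged; for reset and interpolation the sequence becomes shorter than `length_s`, which this sketch does not correct.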

Adding Noise
Two types of noise are available in this toolbox: white Gaussian noise and multi-frequency noise. Multi-frequency noise is a set of noise components that have different amplitudes and frequencies. Each component is generated as follows:

$$n(t) = A \sin(2\pi f t), \tag{5}$$

where A is the amplitude of the peak of the noise and f is the frequency of the noise. If necessary, users can add one or more noise components with different amplitudes and frequencies to the clean synthetic PPG signals.
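Both noise types can be sketched in a few lines. The sketch assumes that each multi-frequency component is a sinusoid with peak amplitude A and frequency f, and that the SNR of the white Gaussian noise is the ratio of mean signal power to noise power expressed in dB. Both are assumptions, and the function names are hypothetical.

```python
import math
import random


def multi_frequency_noise(n_samples, fs, amplitudes, frequencies):
    """Sum of sinusoidal components, one per (amplitude, frequency) pair."""
    if len(amplitudes) != len(frequencies):
        raise ValueError("amplitudes and frequencies must have the same length")
    return [sum(A * math.sin(2.0 * math.pi * f * n / fs)
                for A, f in zip(amplitudes, frequencies))
            for n in range(n_samples)]


def white_gaussian_noise(signal, snr_db, rng):
    """Zero-mean Gaussian noise scaled to the requested SNR (in dB)."""
    signal_power = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [rng.gauss(0.0, sigma) for _ in signal]


# Usage: add both noise types to a clean 5-s test signal sampled at 125 Hz
fs = 125
clean = [math.sin(2.0 * math.pi * n / fs) for n in range(5 * fs)]
drift = multi_frequency_noise(len(clean), fs, [0.1, 0.05], [0.3, 50.0])
white = white_gaussian_noise(clean, 20.0, random.Random(0))
noisy = [c + d + w for c, d, w in zip(clean, drift, white)]
```

The length check mirrors the requirement that the number of amplitude values and frequency values be the same.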

CUBIC INTERPOLATION
The variability of parameters makes the endpoint value of one beat differ from the onset value of the next beat. In this paper, cubic interpolation was used to smooth the synthetic PPG. Cubic spline interpolation involves a spline where each piece is a third-degree polynomial specified by its values and first derivatives at the endpoints of the corresponding domain interval. The interpolation involved a total of 0.2 s around the onset. The first 0.05 s and the last 0.05 s of samples in this window were used to fit the interpolation function, and the values of the middle 0.1 s of samples were replaced by the corresponding values generated by the interpolation function.

THE GRAPHICAL USER INTERFACE
Figure 3 shows the main dialogue of the GUI. The first step is to select the type of synthetic PPG using the drop-down button in the upper left corner. Available options are regular or three types of irregular PPG. The sampling frequency and signal length can then be modified in the "Basic Info" panel. Whenever a value is changed, the GUI attempts to generate the synthetic PPG and shows it at the bottom of the dialogue. By pressing the "Edit" button, users can modify the parameters of pulses in the pop-up dialogue. For a regular PPG, this toolbox uses the same parameters for different pulses. For irregular PPGs, however, parameters are different in the first beat of the premature group, the second beat of the premature group, and the reference beat. Users can also modify the ratios of first beat duration and second beat duration to reference beat duration. The default values are shown in Table 1. Once the pulse parameters have been edited, press the "OK" button to save them and return to the main dialogue.
After setting the basic info, users should set the parameters used to generate the pulse durations. For a regular PPG, users can modify the mean heart rate and the standard deviation of the RR intervals in the "Pulse Duration Info" panel. For irregular PPG types, this panel changes to an "Irregular Duration Info" panel, where users can modify the basic heart rate and the irregular times of the synthetic PPG. The basic heart rate and mean heart rate are in the range of 50 to 180 beats per minute. A warning dialogue pops up when the "Irregular Times" value is too large or too small relative to the signal length. In this case, users should either decrease the irregular times or increase the length of the signal.
If necessary, users can add noise to the synthetic PPG. Two types of noise are available in the "Noise Info" panel: white Gaussian noise and multi-frequency noise. For white Gaussian noise, users can modify the signal-to-noise ratio (SNR). A 5-s regular PPG with white Gaussian noise is shown in Figure 4A. Examples of multi-frequency noise are shown in Figures 4B-D. "Amplitude" is the amplitude of the peak of the noise signals, and "Frequency" describes the noise frequencies. The number of values in "Amplitude" and "Frequency" should be the same.
After synthesizing signals, users can press the "Save" button to save the synthetic PPG in comma-separated values (.csv), Microsoft Excel (.xlsx), and MAT-file (.mat) formats.

LIMITATIONS OF STUDY AND FUTURE WORK
Generating a regular PPG signal with given parameters is reproducible. If noise is added, the PPG signal cannot be reproduced, because the noise is generated randomly. Generating irregular PPG signals, on the other hand, is non-reproducible because the duration of each beat is set randomly. Adding noise to a generated irregular PPG signal makes it even less reproducible. The next step is to generate re-entry irregular heartbeats in PPG signals, and potentially to add other types of abnormalities to the toolbox. The main focus of the current study was not on detecting events in PPG signals with irregular heartbeats; rather, the focus was on generating irregularity in PPG signals. Another aspect of future development is to generate PPG signals with specific hemodynamic parameters (e.g., blood pressure levels), simulating the PPG templates and their associated hemodynamic parameters. This toolbox is released as version 1 (PPGSynth v1.0, August 11, 2020). The more templates we include, the better the toolbox will be able to generate PPG waveforms covering different irregularities (simulated cardiovascular patient groups) and noise types. One of the next steps is to generate normotensive and hypertensive PPG signals.

SUMMARY
PPGSynth, a new publicly available toolbox, is described as a means to generate synthetic PPG waveforms. Users can easily generate a waveform across a range of sampling frequencies and can also set the length of regular and irregular PPGs. The toolbox can also generate specific PPG shapes by modifying the pulse parameter settings. These characteristics make the new toolbox useful for less experienced users who would like to generate synthetic PPGs for their research and training in physiological measurements.