Prospective clinical validation of the Empatica EmbracePlus wristband as a reflective pulse oximeter

Introduction Respiratory diseases such as chronic obstructive pulmonary disease, obstructive sleep apnea syndrome, and COVID-19 may cause a decrease in arterial oxygen saturation (SaO2). The continuous monitoring of oxygen levels may be beneficial for the early detection of hypoxemia and timely intervention. Wearable non-invasive pulse oximetry devices measuring peripheral oxygen saturation (SpO2) have been garnering increasing popularity. However, there is still a strong need for extended and robust clinical validation of such devices, especially to address topical concerns about disparities in performances across racial groups. This prospective clinical validation aimed to assess the accuracy of the reflective pulse oximeter function of the EmbracePlus wristband during a controlled hypoxia study in accordance with the ISO 80601-2-61:2017 standard and the Food & Drug Administration (FDA) guidance. Methods Healthy adult participants were recruited in a controlled desaturation protocol to reproduce mild, moderate, and severe hypoxic conditions with SaO2 ranging from 100% to 70% (ClinicalTrials.gov registration #NCT04964609). The SpO2 level was estimated with an EmbracePlus device placed on the participant's wrist and the reference SaO2 was obtained from blood samples analyzed with a multiwavelength co-oximeter. Results The controlled hypoxia study yielded 373 conclusive measurements on 15 subjects, including 30% of participants with dark skin pigmentation (V–VI on the Fitzpatrick scale). The accuracy root mean square (Arms) error was found to be 2.4%, within the 3.5% limit recommended by the FDA. A strong positive correlation between the wristband SpO2 and the reference SaO2 was observed (r = 0.96, P < 0.001), and a good concordance was found with Bland–Altman analysis (bias, 0.05%; standard deviation, 1.66; lower limit, −4.7%; and upper limit, 4.8%). 
Moreover, acceptable accuracy was observed when stratifying data points by skin pigmentation (Arms 2.2% in Fitzpatrick V–VI, 2.5% in Fitzpatrick I-IV), and sex (Arms 1.9% in females, and 2.9% in males). Discussion This study demonstrates that the EmbracePlus wristband could be used to assess SpO2 with clinically acceptable accuracy under no-motion and high perfusion conditions for individuals of different ethnicities across the claimed range. This study paves the way for further accuracy evaluations on unhealthy subjects and during prolonged use in ambulatory settings.

Introduction
Respiratory diseases like chronic obstructive pulmonary disease (COPD), asthma, obstructive sleep apnea syndrome (OSA), and COVID-19 account for a significant burden of disease (1, 2). COPD, associated with persistent and progressive respiratory symptoms (3), has a global prevalence of 3.9% and, in 2019, was the third leading cause of death worldwide (1, 3). Asthma is caused by inflammation and narrowing of the small airways in the lungs (4) and affects approximately one in five individuals (5). In the U.S., 26% of individuals between 30 and 70 years of age suffer from OSA, which manifests as complete or partial upper airway obstruction (6). Over the past three years, clinical COVID-19 associated with the SARS-CoV-2 coronavirus has also resulted in substantial morbidity and mortality, causing over 6 million deaths worldwide to date (7). The common pathophysiology underlying these respiratory diseases is impaired gas exchange, which, depending on the severity and duration of impairment, can result in hypoxemia (1-8) and associated clinical signs and symptoms ranging from headaches and dyspnea to cellular hypoxia, organ failure, and death in extreme cases (9, 10).
Since arterial oxygen saturation (SaO2) below the physiological 95%-100% range can be suggestive of respiratory pathology but is not always associated with apparent symptoms such as dyspnea, monitoring of blood oxygenation in individuals at risk of new or worsening hypoxemia is an important tool to enable early identification of clinical decompensation and to inform timely decision-making about interventions including hospitalization, ICU admission, and supplemental oxygen therapies (e.g., mechanical ventilation) (10). The clinical standard and most accurate method for assessing SaO2 is co-oximetry, which requires invasive measurements to detect SaO2 values from blood samples. Co-oximetry allows only intermittent, point-in-time measurements of SaO2, which are not compatible with continuous monitoring in ambulatory settings and are thus not appropriate for the analysis of longitudinal trends in SaO2 (9, 11, 12).
Pulse oximetry is a non-invasive modality that estimates SaO2 by measuring the peripheral oxygen saturation (SpO2) using photoplethysmography (PPG) technology. PPG-based fingertip pulse oximeters have become the de facto standard for assessing SpO2 in hospitalized patients due to their non-invasiveness, flexibility (13, 14), and ability to meet clinical accuracy thresholds established by the ISO 80601-2-61:2017 standard and the US Food & Drug Administration (FDA) guidance on pulse oximetry validation (15, 16), which were established in 2017 and in 2013, respectively. Recently, PPG-based sensors have also been increasingly adopted in mobile digital health technologies for remote monitoring applications, as part of the modern paradigm shift of clinical care and clinical research toward home-centered healthcare and decentralized clinical trials (11, 13, 14, 17-19). Among various physiological parameters, SpO2 measurements in ambulatory settings are now provided by several wireless commercial devices, with different degrees of validation (20) and different form factors, including devices worn on the upper arm (19, 21), on the chest (19, 22), on the ear (23), or on the wrist (11, 18, 19, 24-31).
Among wearables, wrist-worn devices have several advantages thanks to their portability, comfort, easy acceptance, and non-stigmatization, resulting in high compliance (32). However, obtaining SpO2 estimates from the wrist is challenging given the lower arterial blood perfusion and the lower signal-to-noise ratio resulting from multiple tissue scattering in the dorsal wrist compared to other locations (e.g., finger, ear) (14, 33, 34). The number of wrist-worn devices that have received clearance from the FDA for PPG-based technologies remains quite low (27-31). These wrist-worn devices have been cleared based on clinical evidence abiding by the FDA/ISO standardized protocols, which require a minimum of 200 data points from 10 or more healthy volunteers who vary in age, sex, and skin pigmentation, including 2 darkly pigmented subjects or 15% of the study pool, whichever is greater (15, 16). To date, however, details of the validation studies supporting clearance of these devices have not been widely published (26, 35).
Recently, the publication of post-market data related to the real-world use of cleared pulse oximeters has raised questions about the performance of these devices, particularly in individuals with darker skin tones (36), leading to increased scrutiny by the FDA and the scientific and clinical community (37, 38). This issue was documented based on data from finger pulse oximeters using transmissive technology, which has a superior signal-to-noise ratio compared to reflective sensors. Thus, a higher impact of skin color could be expected on devices with reflective PPG technology, since scattering due to melanin would further increase the scattering seen at baseline in wrist-worn devices (39).
Given the paucity of published validation studies for clinical data from wrist-worn wearables, and recently raised concerns that existing pulse oximeters may not perform accurately in racial minorities (36-39), it is critical to rigorously test new wrist-worn pulse oximeters, especially to verify their accuracy and reliability across different skin pigmentations. This work contributes to filling this gap by reporting the prospective validation of the SpO2 measurements computed by a wrist-worn medical device, i.e., the EmbracePlus wristband, on a pool of subjects enriched for individuals with darker skin pigmentation (30% of study participants). This device and its associated monitoring platform recently received clearance from the FDA to allow healthcare professionals to monitor SpO2 in no-motion and high perfusion conditions in ambulatory individuals aged 18 years and older in home healthcare settings (40). This work details the methods and the results of the comparison between EmbracePlus SpO2 measurements and gold standard measures of SaO2, performed during a controlled hypoxia study following the ISO 80601-2-61:2017 standard (15) and the FDA guidance (16). The validation presented herein complies with recently published suggestions to increase the statistical robustness of results, transparency, and understanding of limitations of pulse oximetry. Moreover, this study reports individual subjects accuracy levels, subgroup analysis by different skin pigmentation on a dataset that doubles the representation of darkly pigmented individuals

PPG principles
The SpO2 algorithm is based on PPG technology and harnesses two principles: (i) the different absorption spectra of oxygenated (HbO2) and deoxygenated (HbH) hemoglobin, and (ii) the presence of a pulsatile arterial blood flow (42, 43). The former is exploited to compute the concentrations of HbO2 and HbH by illuminating the skin with two different wavelengths, typically red (∼660 nm) and infrared (∼940 nm). Indeed, HbO2 absorbs more infrared light while red light passes through more easily, whereas HbH absorbs more red light and allows more infrared light to pass through (42-44). The relative amount of red and infrared light that is reflected towards a photodetector (PD) after being partially absorbed by the arterial blood can be used to estimate the proportion of hemoglobin bound to oxygen, and therefore the SpO2 level, based on the Lambert-Beer law of absorbance (45). The second principle leverages the respective inherent contractility of arteries and veins. During each cardiac cycle, the arterial blood volume increases during systole and decreases during diastole, leading to fluctuations in the absorbed red and infrared light, which form the pulsatile (AC) component. By contrast, light that is reflected and reaches the PD from blood volume in veins and capillaries or from non-vascular tissues presents a relatively stable, non-pulsatile behavior, which forms the steady (DC) component (33, 44). The perfusion ratio (R) (i.e., the ratio between the AC/DC ratios of the red and infrared PPG) is used to derive the SpO2 estimate through an empirical calibration function describing the relationship between R and SpO2, which is obtained in experimental conditions through a stable and controlled hypoxic condition spanning the claimed measurement range (e.g., 70%-100% SpO2) (Figure 1) (44, 46).
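The ratio-of-ratios principle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the helper names, the use of peak-to-peak amplitude as the AC estimate, the synthetic signals, and in particular the linear calibration SpO2 = 110 - 25R, which is a commonly quoted textbook approximation and not the EmbracePlus calibration function.

```python
import math

def ac_dc(window):
    """Return (AC, DC) for one PPG window: peak-to-peak amplitude and mean level."""
    dc = sum(window) / len(window)
    ac = max(window) - min(window)
    return ac, dc

def ratio_of_ratios(red, infrared):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(infrared)
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def spo2_from_r(r, a=110.0, b=25.0):
    """Hypothetical linear calibration (illustrative only), clipped to 0-100%."""
    return max(0.0, min(100.0, a - b * r))

def synthetic_ppg(dc, ac, n=200, beats=5):
    """Synthetic PPG: constant DC level plus a sinusoidal pulsatile component."""
    return [dc + 0.5 * ac * math.sin(2 * math.pi * beats * i / n) for i in range(n)]

# Example: the red pulsatile amplitude is half the infrared one, so R is about 0.5.
red = synthetic_ppg(dc=1.0, ac=0.01)
infrared = synthetic_ppg(dc=1.0, ac=0.02)
r = ratio_of_ratios(red, infrared)
print(round(r, 3), round(spo2_from_r(r), 1))
```

In a real device the calibration curve is fitted empirically on controlled desaturation data, as the text notes, so the coefficients above should be read only as placeholders.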

EmbracePlus SpO 2 algorithm
The inputs of the SpO2 algorithm are data obtained by illuminating the skin with green, red, and infrared-light PPG sensors and data recorded by the three-axis accelerometry (ACM) sensors embedded in the EmbracePlus device (Figure 2). The algorithm analyzes the red and infrared PPG signals to extract the amplitude of the AC pulsatile component and of the DC baseline component (44), using the green PPG to support the detection of AC. Using the Lambert-Beer model (47), these metrics can be used to estimate the SpO2 value through a calibration model previously obtained during the training phase, which harnessed data from a controlled hypoxia calibration study and from real-life data. The data used to develop the algorithm were completely independent from the datasets used for the prospective clinical validation presented in this work, i.e., the development and validation datasets did not contain data from the same subjects.
Each value of SpO2 is estimated on a 10 s rolling window offset by 1 s. A missing SpO2 value indicates that the algorithm does not have enough confidence to compute an output. A low level of confidence is principally driven by the user not wearing or improperly wearing the device, or by the presence of low-quality raw sensor data (e.g., during motion conditions or low perfusion). Indeed, reflective PPG positioned on the dorsal portion of the wrist has been associated with a lower signal-to-noise ratio compared to transmissive PPG sensors (26, 48, 49). This effect might be due both to the sensor placement, since only a small portion of the light is reflected and reaches the PD, and to the sensor design accounting for multiple scattering through the skin layers and movement artifact contamination associated with the probe contact pressure (49). In addition, blood perfusion is lower on the back of the wrist than on the finger (26). Thus, since SpO2 measured at the wrist can be prone to lower signal quality and is sensitive to movement contamination, the EmbracePlus utilizes an automated data rejection mechanism (i.e., a quality index) based on PPG and ACM to determine whether the signal quality is sufficient to estimate trustworthy SpO2, discarding inconclusive SpO2 measurements in conditions of low perfusion, movement, and low signal quality (34, 50).
The algorithm involves signal processing steps in time-domain (e.g., linear filtering, cross-correlation) and frequency-domain (e.g., No "future-time" data point is used to estimate SpO2 in a given 10 s window.
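As a rough illustration of the windowing scheme described in this section (10 s windows advanced in 1 s steps, using only past samples, with rejected windows reported as missing), the following sketch may help. The function names, the `estimator` callback, and the simple motion threshold standing in for the quality index are all hypothetical; the actual EmbracePlus quality index is not specified in this text.

```python
def rolling_windows(n_samples, fs, window_s=10, step_s=1):
    """Yield (start, end) sample indices of causal windows ending at each step."""
    win = int(window_s * fs)
    step = int(step_s * fs)
    for end in range(win, n_samples + 1, step):
        yield end - win, end

def estimate_series(signal, motion, fs, estimator, motion_limit):
    """Return one estimate per window, or None when the window is rejected.

    The rejection rule (maximum motion magnitude above a limit) is a
    placeholder for a real quality index based on PPG and accelerometry.
    """
    outputs = []
    for start, end in rolling_windows(len(signal), fs):
        if max(motion[start:end]) > motion_limit:
            outputs.append(None)  # inconclusive: no value is reported
        else:
            outputs.append(estimator(signal[start:end]))
    return outputs

# Example with a 1 Hz toy signal and a motion burst at sample 12.
signal = list(range(15))
motion = [0.0] * 15
motion[12] = 5.0
mean = lambda w: sum(w) / len(w)
print(estimate_series(signal, motion, fs=1, estimator=mean, motion_limit=1.0))
```

Because each window ends at the current sample, no future data point contributes to an estimate, which matches the causal behavior stated above.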

Recruitment
The protocol received IRB approval (Laurel Heights Committee, approval number 10-00437; ClinicalTrials.gov registration #NCT04964609) to test the accuracy of the EmbracePlus SpO2 measurements during mild, moderate, and severe hypoxia. A single-center, interventional clinical study was conducted on 16 healthy participants in a laboratory at the University of California San Francisco between June 2021 and January 2023. Healthy male and female subjects between the ages of 18 and 55 years were recruited, excluding current smokers; women who were pregnant, lactating, or trying to get pregnant; and participants with obesity (body mass index, BMI > 30 kg/m²) or who had an injury, deformity, tattoos, or other physical abnormality at the sensor sites. Exclusion criteria also included serious systemic illnesses, use of continuous positive airway pressure, unacceptable collateral circulation, or any other condition which, in the investigators' opinion, would make a participant unsuitable for the study.
Participants were primarily selected to represent a heterogeneous population in terms of skin tone, which was assessed by the Clinical Coordinator at Hypoxia Lab based on the Fitzpatrick scale for skin pigmentation assessment, recruiting 30% of participants with skin tone classified as Fitzpatrick V or VI, i.e., doubling FDA requirements for inclusion of individuals with dark skin (16). Moreover, the recruitment process aimed at including a participant pool with varying ages, BMI, and sex balance. The sample size was selected according to ISO 80601-2-61:2017, which recommends including at least 200 data points from at least 10 subjects (15).

Study design
The test was conducted in accordance with ISO standards (15) and FDA guidance (16) for SpO2 testing, which require evaluation against a SaO2 reference measurement ranging from 70% to 100% during a controlled desaturation protocol.
Each participant was placed in a comfortable semi-recumbent position for approximately 45 min and asked to remain still while breathing a gas mixture through a mouthpiece under the supervision of a medical monitor. Participants' hands and arms were maintained at ambient room temperature during data collection. Data were collected from an EmbracePlus wristband and two FDA-cleared fingertip pulse oximeters (Masimo Rad-5 by Masimo Corp and Nellcor N-595 by Nellcor Puritan Bennett Inc.), which were used to monitor the hand perfusion and synchronize the reference SaO2 data with the EmbracePlus data. All the devices were positioned according to their respective instructions for use. An arterial line (a 22-gauge catheter) was placed in the radial artery of each participant's non-dominant arm to draw blood samples, on which the SaO2 was analyzed by a laboratory multiwavelength co-oximeter (ABL-90 blood gas analyzer, Radiometer Medical ApS) and used as ground truth for the SpO2 algorithm evaluation. Then, each participant underwent two desaturation runs consisting of a stepwise decrease of the oxygen concentration in the inspired gas mixture, as illustrated in Figure 3.
At the beginning of each ramp, a baseline blood sample was collected at room air. Approximately 10 s later, the inspired oxygen was progressively reduced to reach the next SpO2 plateau level, identified by a stable level of oxygen saturation between the reference finger pulse oximeters. At each target SpO2 plateau, two to four blood samples were collected approximately 30 s apart, and within 10 s from the conclusion of each plateau, inspired oxygen was progressively changed again to reach the next SpO2 target level. The target SpO2 at each run was chosen to allow an even sampling within the 70%-100% SaO2 range. Every participant underwent a maximum of 6 plateaus in each ramp before being exposed to a high oxygen saturation level (100% O2) by breathing oxygen-enriched air for 2 min. The collected blood samples (approximately 20-26 samples per participant) were immediately analyzed with the co-oximeter to measure the reference SaO2.

Data handling
Prior to the analysis, periods corresponding to EmbracePlus recording failure (i.e., missing raw data) and out-of-range values from the reference SaO2 measurement device (i.e., SaO2 < 67%) were identified and removed. EmbracePlus and finger pulse oximeter data were aligned using the pulse rate series estimated from the EmbracePlus PPG and the pulse rate series logged by the finger pulse oximeters. The procedure was blind, i.e., it was performed without knowledge of the SaO2 values. Since the timestamps of the reference SaO2 were already synchronized with the finger pulse oximeter data, this procedure allowed automatic alignment of the SpO2 algorithm outputs with the reference SaO2.
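The text does not state how the two pulse rate series were matched. One plausible approach, sketched here purely as an assumption, is to search for the time shift that maximizes the Pearson correlation between the two series sampled at the same rate. The function names and the lag search range are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def best_lag(ref, other, max_lag):
    """Return the lag (in samples) such that other[i + lag] best matches ref[i]."""
    best_value, best_r = None, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:len(ref) - lag], other[lag:]
        else:
            a, b = ref[-lag:], other[:len(other) + lag]
        n = min(len(a), len(b))
        if n < 3:
            continue  # too few overlapping samples to correlate
        r = pearson(a[:n], b[:n])
        if r > best_r:
            best_value, best_r = lag, r
    return best_value

# Example: the second series is the first one delayed by 3 samples.
wrist_pr = [60 + (i * 7) % 13 for i in range(40)]
finger_pr = [60] * 3 + wrist_pr[:-3]
print(best_lag(wrist_pr, finger_pr, max_lag=5))
```

Only pulse rate is used for the match, so an alignment of this kind can be carried out without looking at any SaO2 value, consistent with the blind procedure described above.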
EmbracePlus continuous SpO2 data were analyzed to select the values associated with each SaO2 reference reading. The median SpO2 value inside a 10 s window within the plateau associated with each blood sample was computed and used as the paired value for performance evaluation. The window of SpO2 values was adjusted to fit a segment with good-quality EmbracePlus data within the selected plateau, to discard the effect of involuntary movements occurring during the procedure (e.g., when the sample was taken).
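The pairing step can be sketched as follows, assuming 1 s SpO2 outputs with `None` marking rejected values. The function name and arguments are hypothetical, and the selection of the good-quality segment within the plateau (the `window_start` value) is taken as given rather than implemented.

```python
from statistics import median

def paired_spo2(times, spo2, window_start, window_s=10):
    """Median of the valid SpO2 outputs with window_start <= t < window_start + window_s.

    Returns None when the window contains no valid output.
    """
    values = [
        value
        for t, value in zip(times, spo2)
        if window_start <= t < window_start + window_s and value is not None
    ]
    return median(values) if values else None

# Example: a plateau around 85% with one rejected output inside the window.
times = list(range(30))
spo2 = [90] * 10 + [85, 86, None, 84, 85, 85, 86, 85, 84, 85] + [80] * 10
print(paired_spo2(times, spo2, window_start=10))
```

Using the median rather than the mean makes the paired value less sensitive to isolated deviations inside the window.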

Statistical analysis
The primary endpoint of both studies was the accuracy root mean square (Arms), which is a combination of the systematic and random components of error, computed as the root mean square of the differences between the algorithm output (SpO2i) and the reference (SaO2i) (Equation 1), where N is the total number of data points. Data from all the subjects were pooled together for Arms computation to verify the primary effectiveness endpoint. Moreover, individual Arms values were computed on each subject's data. Only the data pairs with evaluable values for both the EmbracePlus and the reference device were used.
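Equation 1 itself does not appear in this text. Based on the verbal definition above (root mean square of the paired differences over N data points), it presumably takes the standard form:

```latex
A_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{SpO}_{2i} - \mathrm{SaO}_{2i}\right)^{2}}
\tag{1}
```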
According to FDA guidelines (16), a passing result required an Arms ≤ 3.5% across the full tested range under no-motion conditions, computed by pooling the data points collected from all subjects. This threshold specifies that approximately two-thirds of the device measurements fall within ±3.5% of the reference measurement.
As additional performance measures, the mean bias (i.e., the average of the difference between SpO2 and SaO2) and the mean absolute error (MAE) (i.e., the average of the absolute difference between SpO2 and SaO2) were computed. A Bland-Altman analysis was performed on all the data points by plotting SaO2 versus error (SpO2 − SaO2) with a linear regression fit and upper and lower 95% limits of agreement (LoAs) corrected for repeated measurements, indicating the error boundary within which approximately 95% of data points fall (51). Additionally, a correlation analysis with linear regression fitting was performed on the pooled data to evaluate the correlation between the EmbracePlus SpO2 and the reference SaO2 values. Performance metrics were also evaluated on sex and skin-tone subgroups separately, namely on female and male subjects and on individuals with dark (Fitzpatrick classes V and VI) and light skin pigmentation (Fitzpatrick classes I to IV). Furthermore, to investigate possible differences in the SpO2 estimation error between successive desaturation ramps, a mixed-effects model was fitted with the subject ID as a random effect and the desaturation ramp ID as a fixed effect. An ANOVA was then performed to test the hypothesis that the coefficient representing the fixed-effect term is 0 (F-test with significance level at 0.05).
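The pooled error metrics defined in this section can be computed as in the following minimal sketch. The function name and the toy data are hypothetical. Note one deliberate simplification: the limits of agreement here are the plain bias ± 1.96 SD form, whereas the study used LoAs corrected for repeated measurements, which this sketch does not implement.

```python
import math
from statistics import mean, stdev

def accuracy_metrics(spo2, sao2):
    """Return Arms, mean bias, MAE, and uncorrected 95% limits of agreement."""
    errors = [p - a for p, a in zip(spo2, sao2)]
    bias = mean(errors)
    arms = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = mean([abs(e) for e in errors])
    sd = stdev(errors)  # sample standard deviation of the paired differences
    return {
        "arms": arms,
        "bias": bias,
        "mae": mae,
        "loa_lower": bias - 1.96 * sd,  # simplified: no repeated-measures correction
        "loa_upper": bias + 1.96 * sd,
    }

# Toy example with four paired readings (not study data).
metrics = accuracy_metrics(spo2=[90, 92, 88, 95], sao2=[91, 90, 88, 93])
print(metrics["arms"] <= 3.5, metrics)
```

The final comparison against 3.5% mirrors the pass criterion stated above for the pooled Arms.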

Results
The 16 participants in this study included 8 men and 8 women aged 18-43 years with various skin tones. Table 1 and Table 2 report the demographic summary and listing of the participants, respectively. No undesirable effects or adverse events were reported during the study.
Data from all subjects were included in the analysis, for a total of 398 samples of paired reference SaO2 and EmbracePlus SpO2 measurements. The recorded data did not include any missing EmbracePlus data or data affected by sensor issues. Out of the 398 blood samples, 1 sample from subject #6 could not be used for performance computation due to a missing timestamp for the

FIGURE 3
Graphical representation of each desaturation run, consisting of a stepwise decrease of blood oxygen through stable plateaus during which a minimum of two blood samples (red circles) were taken to measure reference SaO2.

The reference SaO2 showed a median value of 86.9% on the analyzed data, ranging from 68.1% to 100%. The paired measurements by the SpO2 algorithm showed a median value of 87% and ranged from 67% to 100%. The pooled Arms was 2.4%, the bias 0.05%, the MAE 1.82%, and the upper and lower 95% LoAs were 4.8% and −4.7%, respectively (Table 3). Three levels of oxygen saturation were analyzed (i.e., SaO2 < 80%, 80% ≤ SaO2 < 90%, and SaO2 ≥ 90%), on which pooled Arms of 3.2%, 1.9%, and 2.2% and pooled bias values of 1.6%, 0.3%, and −1.2% were observed, respectively. In each SaO2 decile, pooled MAE was equal to 2.59%, 1.43%, and 1.68%, and the upper and lower 95% LoAs ranged from −4.9% to 7.3% (Table 3). Additionally, Table 4 reports the individual Arms, bias, and MAE on the full SaO2 range together with the number of samples for each subject, which ranged from 20 to 26. Thirteen subjects (87%) demonstrated an Arms lower than 3.5%.
In Figure 5, the Bland-Altman plot for all the analyzed subjects illustrates the difference between the EmbracePlus SpO2 and the reference SaO2 with respect to the reference SaO2 values. A total of 26 outliers outside the pooled LoAs (i.e., −4.7%; 4.8%) were identified, representing ∼7% of total data points, and are listed in Table 5. Figure 6 reports the regression plot on the pooled data, illustrating a strong positive correlation (Pearson's correlation coefficient of 0.96) between the SpO2 algorithm and the reference SaO2. In addition, Figure 7 shows the distribution of the difference between the EmbracePlus SpO2 and the reference SaO2 measurements with a normal density function fit.
Table 6 reports the SpO2 algorithm performance considering skin pigmentation and gender subgroups, as listed in the Statistical Analysis section. Each subgroup analyzed meets the FDA-recommended clinical threshold (Arms equal to or lower than 3.5%). Moreover, comparable LoAs and error metrics were observed between skin and gender subgroups.
Finally, a non-significant effect of the desaturation ramp on the estimation errors was determined (p > 0.05).

Principal results
Wrist-worn wearable devices are increasingly used to monitor SpO2 and contribute to promoting home-centered healthcare and decentralized clinical trials (11, 18, 26, 32, 35, 52), with the potential to significantly improve clinical outcomes by supporting prompt diagnosis and early detection of clinical deterioration in patients with respiratory diseases such as COPD, OSA, and COVID-19 (9-12). However, commercially available wearable devices using PPG to measure SpO2 are often sold as consumer products rather than medical devices, thereby reducing requirements for evidence of clinical performance and independent review by relevant regulatory authorities (11, 19, 52, 53). Even for devices cleared as medical devices, concerns have recently emerged about racial bias impacting the performance of pulse oximeters in individuals with darker skin pigmentation (37, 38), highlighting the need for additional data evaluating the reliability of wearable PPG technology.
In this work, we examined the validity of the EmbracePlus wristband reflective pulse oximeter in monitoring SpO2 during a controlled hypoxia study. The clinical study was conducted in accordance with the ISO 80601-2-61:2017 standard (15) and FDA guidance (16). Data from sixteen healthy adults were analyzed during a standardized desaturation protocol, and the performance of the EmbracePlus SpO2 measurements was compared with the gold standard SaO2. In this study, we report performance on 373 paired samples from 15 participants, exceeding FDA guidelines on validation of pulse oximeters, which

FIGURE 4
An example of SpO2 measurements during the two desaturation ramps: in black, the 1 s EmbracePlus SpO2 outputs; in red, the SaO2 samples; in grey, the clinical acceptance boundaries of ±3.5% centered on the EmbracePlus SpO2 outputs; #bs, number of the blood sample corresponding to each SaO2 measure.

Recently, a positive bias and a larger error for SpO2 measurements at low blood oxygen saturations in darkly pigmented subjects have been reported by manufacturers of several different pulse oximeters (36, 41), raising concern for potential racial bias (37, 38). Given these concerns, this study enrolled five (i.e., 5/16, ∼31%) subjects with skin pigmentation classified as Fitzpatrick scale V or VI, doubling the FDA requirement of having at least 2 subjects or 15% of the participants (whichever is greater) with dark skin tone (16). The results highlight a global error of 0.1% and 0.04% when considering dark and light skin tones, respectively, with a slightly larger positive bias for darkly pigmented participants but comparable LoAs. Moreover, each subgroup analyzed (i.e., Fitzpatrick V-VI, Fitzpatrick I-IV, males, females) was found to have an Arms below 3.5%, indicating acceptable accuracy. An analysis of outliers found a limited number of data points falling outside the pooled limits of agreement (i.e., −4.7%; 4.8%), representing <7% of the total. In a risk analysis, the clinical impact of these outliers was determined to be marginal, as no data points fell within a window of occult hypoxemia, defined as a SaO2 of <88% with a measured SpO2 of 92%-96% (36, 54). For the majority of outliers (17/26, ∼65%), the wearable device produced SpO2 overestimates of SaO2 for SaO2 values ≤84.1%. In each instance, the degree of SpO2 overestimation would not change the clinical interpretation of the subject as being critically hypoxic. Nearly all remaining outliers (7, i.e., ∼27%) involved underestimating SaO2 values over 95%. These underestimated values remained within the highest SpO2 decile, and the bias towards underestimation

FIGURE 6
Regression plot for comparing the EmbracePlus SpO2 values against the reference SaO2 values (N = 373 samples from n = 15 subjects). Single subject reference SaO2 data are plotted against the SpO2, with a superimposed bisect line (dashed black line) and a linear regression line (solid red line). In the legend, the slope and the intercept, Pearson's correlation coefficient (rho), its P value, and the R² value of the linear fitting are reported.

FIGURE 7
The distribution of the difference between the EmbracePlus SpO2 and the reference SaO2 (grey) with a normal density function fit (dashed black line).

The presented results contributed to receiving FDA clearance for the EmbracePlus and its monitoring platform to be used by trained healthcare professionals or researchers to remotely monitor physiological parameters in ambulatory individuals 18 years of age and older in home-healthcare environments (40). The Empatica EmbracePlus wristband is one of the few wearable devices that have received 510(k) marketing authorization from the US FDA for SpO2 monitoring (21, 22, 27, 28). An analysis of performance data for FDA-cleared wrist-worn SpO2 monitoring devices showed comparable levels of overall accuracy, but significant limitations in publicly available data. The Oxitone 1000M was validated on data recorded during a desaturation protocol similar to that used to test the EmbracePlus algorithm, and the authors reported an Arms of 1.9% (27, 35). Given the fewer total data points included in the analysis (240 samples from 10 subjects) and the paucity of published data on individuals with darker skin pigmentation (18, 27, 35), the impact of racial bias on performance for this device remains unclear. Similarly, a recent study on 14 subjects wearing the ScanWatch during a desaturation protocol similar to that of this study showed an Arms of 2.97% (right wrist) and 3% (left wrist) on a comparable population but did not include subgroup analyses evaluating the impact of sex or skin tone on performance (26, 28). Finally, Biobeat Technologies Ltd reports an Arms of 2% for their wearable devices; however, to the best of our knowledge, no complete information about the population, number of samples, or performance by subgroups is available (22).

Limitations and future work
The validation presented in this work advances recent efforts from research and public health communities to increase transparency and understanding of the limitations of pulse oximetry. Recommendations that have been put forth to date include publishing subgroup analyses, providing justification for outliers, and increasing the number of data samples analyzed beyond the FDA-suggested sample size (41). Nevertheless, the results of this study must be interpreted in the context of several limitations.
During the hypoxia study, data were collected in a controlled environment with a standardized desaturation protocol to keep SpO2 levels as stable as possible. While the controlled study design fulfills the ISO and FDA guidelines and allows for better assessment of the impact of certain confounders (e.g., demographic variables) on performance through careful experimental control of other covariates, the results do not allow conclusions about how well the EmbracePlus wristband monitors SpO2 in real-world conditions, where SpO2 changes dynamically over time.
Additionally, the study pool was limited to healthy subjects aged 18-43 years, due to the higher risks induced by hypoxia in older and/or unhealthy patients. An increasing number of investigations are evaluating the performance of wrist-worn devices in uncontrolled studies and/or on patients with cardiovascular and lung diseases, yet these are mostly performed with consumer smartwatches thanks to their wider availability and lower costs (11, 19, 32, 52). However, as there are no rigorous evaluation requirements for SpO2 computation in these consumer products, and they do not go through independent regulatory review, it is unclear to what degree clinical study results using these products can be relied upon to understand the impact of race, comorbidities, or real-world conditions on medical device performance (25).
Following promising preliminary data collected during separate, ambulatory, uncontrolled desaturations in darkly pigmented adults (Figure 8), future work will focus on extensively testing the EmbracePlus wristband in real-life conditions (e.g., sleep), in diverse age ranges, and on specific pathological conditions (e.g., lung diseases like COPD and obstructive sleep apnea). Future investigations may assess the feasibility of using the SpO2 data not only to detect hypoxemia but also to assess cardiopulmonary function [e.g., track SpO2 recovery after exercise (55)] or for stress monitoring (56).

Conclusions
To conclude, the study results demonstrate that the SpO2 measurements performed by the non-invasive EmbracePlus wristband show high clinical accuracy between 70%-100% SpO2 in the intended use conditions of no-motion and high-perfusion, across individuals with a range of skin pigmentation. This study contributes to the current state of scientific knowledge on the impact of racial bias on SpO2 measurement and paves the way for further validation during prolonged use in uncontrolled settings and on patients at risk of hypoxemia.

Results are derived from the data pooled across dark skin pigmentation (Fitzpatrick V-VI), light skin pigmentation (Fitzpatrick I-IV), female, and male subjects on the [67, 100]% reference SaO2 range. The percentages of the number of samples are computed with respect to the full dataset, i.e., N = 373.

FIGURE 1
Schematic diagram of the basic principle of a pulse oximeter showing the components of the blood flow in the tissues in relation to the optical metrics of interest for SpO2 computation (AC and DC), with the diagram of a typical regression function that maps AC and DC composite metrics into an SpO2 reading.

FIGURE 2
Back view of the EmbracePlus wristband with the reflectance PPG sensor embedded in the device.

FIGURE 5
Bland-Altman plot for multiple observations with all subjects pooled (N = 373 samples from n = 15 subjects) showing the reference SaO2 vs. the difference between the EmbracePlus SpO2 and the reference SaO2. Data from individual subjects are color coded. Bland-Altman linear regression fit (red bold line), upper and lower limits of agreement (thin red lines), and mean bias (black line) are shown. In the bottom right corner, the number of subjects (n), data points (N), and linear fit equation are reported.

TABLE 1
Summary of demographic characteristics of the study participants (n = 16).

TABLE 2
Demographic listing of the study participants.

TABLE 3
Distribution of the conclusive measurements collected by the EmbracePlus wristband and performance of the SpO2 algorithm in terms of Arms, bias, and MAE for different SaO2 ranges.

TABLE 4
Performance of the SpO2 algorithm in terms of Arms, bias, and MAE for each individual study participant.
N refers to the number of data points.

TABLE 5
List of SpO2 samples recognized as outliers.
Gerboni et al. 10.3389/fdgth.2023.1258915

reduces the risk of occult hypoxemia. Finally, in two cases (i.e., 8%), an outlier was recorded at the upper extreme of the lowest decile (SaO2 ≈ 80%) involving underestimation of SaO2 within the lowest decile. The degree of SpO2 underestimation in these instances would not change the clinical interpretation of critical hypoxia but may bias clinicians toward earlier triage and evaluation. As the device is intended for retrospective data review without alarms or reliance on output for real-time clinical decision-making, the listed outliers do not raise new questions of safety or effectiveness.
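The occult hypoxemia window used in the outlier discussion (SaO2 of <88% with a measured SpO2 of 92%-96%) can be expressed as a simple check on paired readings. The following is a minimal sketch, not the authors' analysis code: the function name, its parameters, and the example readings are hypothetical and chosen only for illustration.

```python
def is_occult_hypoxemia(sao2, spo2, sao2_threshold=88.0, spo2_window=(92.0, 96.0)):
    """Return True when the reference SaO2 is below the threshold while the
    device SpO2 falls inside the window (both values in percent).

    Defaults follow the definition quoted in the text: SaO2 < 88% with
    SpO2 of 92%-96%. The inclusive window bounds are an assumption.
    """
    low, high = spo2_window
    return sao2 < sao2_threshold and low <= spo2 <= high


# Hypothetical (SaO2, SpO2) pairs in percent; these are NOT study data.
pairs = [(84.0, 88.5), (97.0, 93.0), (86.0, 93.5)]
flags = [is_occult_hypoxemia(sa, sp) for sa, sp in pairs]
print(flags)  # [False, False, True]
```

In this sketch, an overestimate at low saturation (first pair) is not flagged because the SpO2 reading itself still indicates hypoxia, which mirrors the reasoning applied to the overestimating outliers above.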

TABLE 6
Accuracy metrics of the SpO2 algorithm in terms of Arms, bias, MAE, upper 95% LoA, and lower 95% LoA as described in the statistical analysis section.
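For readers who want to see how metrics of this kind are typically computed from paired SpO2 and SaO2 values, the sketch below uses the standard definitions of Arms, bias, and MAE. It is an illustration under stated assumptions, not the study's code: the function name and sample values are hypothetical, and the limits of agreement are computed in the simple form (bias ± 1.96 × SD of the differences). A Bland-Altman analysis for multiple observations per subject, as referenced in Figure 5, uses a different variance estimate and generally yields different limits than this simple form.

```python
import math
import statistics


def accuracy_metrics(spo2, sao2):
    """Compute Arms, bias, MAE, and simple 95% limits of agreement.

    spo2: device readings (percent); sao2: paired reference values (percent).
    The simple LoA ignores repeated measurements within a subject
    (an assumption of this sketch).
    """
    diffs = [s - r for s, r in zip(spo2, sao2)]
    n = len(diffs)
    bias = sum(diffs) / n                            # mean difference
    arms = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    mae = sum(abs(d) for d in diffs) / n             # mean absolute error
    sd = statistics.stdev(diffs)                     # sample SD of differences
    return {
        "arms": arms,
        "bias": bias,
        "mae": mae,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


# Hypothetical paired readings in percent; these are NOT study data.
spo2 = [98.0, 95.0, 91.0, 84.0, 78.0]
sao2 = [97.0, 96.0, 89.0, 85.0, 76.0]
metrics = accuracy_metrics(spo2, sao2)
print({k: round(v, 2) for k, v in metrics.items()})
```

Because Arms combines systematic and random error, it is never smaller than the absolute value of the bias, which is why a device with near-zero bias can still show an Arms of a few percentage points.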