Smartwatch Electrocardiograms for Automated and Manual Diagnosis of Atrial Fibrillation: A Comparative Analysis of Three Models

Aims: The diagnostic accuracy of proprietary smartwatch algorithms and the interpretability of smartwatch ECG tracings may differ between available models. We compared the potential of three commercially available smartwatches for diagnosing atrial fibrillation (AF). Methods: We performed a prospective, non-randomized, adjudicator-blinded clinical study of 100 patients in AF and 100 patients in sinus rhythm; patients with atrial flutter were excluded. All patients underwent four ECG recordings: a conventional 12-lead ECG, followed by Apple Watch Series 5®, Samsung Galaxy Watch Active 3®, and Withings Move ECG® recordings in random order. All smartwatch ECGs were analyzed by their respective automated proprietary software and by clinical experts, who also graded the quality of the tracings. Results: Automated AF diagnosis was more accurate with the Apple and Samsung devices than with the Withings device, which was attributable to a higher proportion of inconclusive ECGs with the latter (sensitivity/specificity: 87%/86% and 88%/81% vs. 78%/80%, respectively; p < 0.05). Expert interpretation was more accurate for Withings and Apple than for Samsung (sensitivity/specificity: 96%/86% and 94%/84% vs. 86%/76%; p < 0.05), driven by the high proportion of uninterpretable tracings with the latter (2% and 4% vs. 15%; p < 0.05). Conclusion: Diagnosing AF is possible using various smartwatch models. However, the diagnostic accuracy of their automated interpretations varies between models, as does the quality of the ECG tracings recorded for manual interpretation.


INTRODUCTION
Atrial fibrillation (AF) is the most common sustained arrhythmia in clinical practice but often remains undiagnosed. The ability to record an ECG tracing equivalent to lead I at any time, and as often as desired, is a relatively new feature of select smartwatches that creates opportunities to diagnose cardiac abnormalities such as AF (1)(2)(3)(4). Recent guidelines recognize the potential value of smartwatch-based ECGs for diagnosing AF (5). Apple Inc. (Cupertino, CA, USA) released the first smartwatch to receive FDA clearance for automated detection of AF, but smartwatches from competitors such as Samsung (Seoul, South Korea) and Withings (Issy-les-Moulineaux, France) can similarly record ECG tracings and warn wearers when AF is detected (6). The process of recording an ECG, analyzing it to generate an automated diagnosis of AF, and providing options to transmit the results to the wearer's physician(s) is similar across manufacturers. However, their diagnostic algorithms are proprietary and not available for independent analysis. The diagnostic accuracy of these algorithms, and the ability of healthcare professionals to correctly interpret smartwatch-based ECGs, may differ between commercially available smartwatches. Given the widespread and growing use of this technology, de facto mass screening for AF with various smartwatch-based technologies may soon occur, and its results will require clinical decisions by healthcare professionals. Critical evaluation of the relative diagnostic strengths and weaknesses of commercially available smartwatch technologies is therefore essential. The primary objective of our study was to compare the diagnostic performance of smartwatch ECGs from three companies (Apple, Samsung, and Withings), specifically their ability to accurately differentiate sinus rhythm (SR) from AF, either with their automated algorithms or through expert review of the recorded smartwatch ECG tracings.

METHODS
This was a prospective, non-randomized, blinded clinical study of 100 consecutive patients in sinus rhythm who had undergone an AF ablation procedure in the previous 6 months and 100 consecutive patients in persistent or permanent AF who were referred for catheter ablation. All patients were ≥18 years of age and provided informed consent. Patients with atrial flutter, permanent pacemakers, or implantable cardioverter-defibrillators were excluded. All patients underwent a 12-lead ECG, which served as the reference standard for the diagnosis of AF or sinus rhythm. Immediately after the 12-lead ECG, and following standardized instructions, 30-s ECG tracings were recorded in random order using an Apple Watch Series 5® (Apple Inc., Cupertino, CA, USA), a Samsung Galaxy Watch Active 3® (Samsung, Seoul, South Korea), and a Withings Move ECG® (Withings, Issy-les-Moulineaux, France). The automated AF-detection algorithms of these smartwatches yield one of several possible results, including "sinus rhythm," "atrial fibrillation," "low heart rate," "high heart rate," "poor recording," or "inconclusive recording." All smartwatch ECG recordings were saved as PDF documents for offline analysis, anonymized, and randomized, and the automated diagnosis was removed from each before distribution to two blinded electrophysiologists. Each electrophysiologist independently interpreted every tracing and assigned one of three possible diagnoses: AF, SR, or unclassified (unable to differentiate between AF and SR). In addition, the quality of each smartwatch ECG tracing was classified as good, poor but interpretable (e.g., artifacts present, but differentiating between AF and SR was deemed possible), or uninterpretable. In case of disagreement between the two experts, a third cardiac electrophysiologist reviewed the tracing and made the final diagnosis.

Statistical Analysis
For each of the three smartwatch models, sensitivity, specificity, positive predictive value, and negative predictive value were calculated for automated and physician-interpreted smartwatch ECGs. Because classifications were not binary (ECGs could remain unclassified, i.e., inconclusive automated diagnoses or tracings deemed uninterpretable by the reading physicians), two analyses were undertaken. In the first analysis, unclassified ECGs were counted as false positives (when the patient was in SR) or false negatives (when the patient was in AF), yielding "worst-case-scenario" estimates (7). In the second analysis, unclassified ECGs were excluded. Kappa (κ) coefficients for interobserver agreement were calculated for each of the three models. Analysis of variance (ANOVA) tests were used to compare percentages between the three groups. All analyses were performed using SPSS version 22.0 (IBM, Armonk, NY, USA), with a two-tailed alpha level of 0.05 defining statistical significance.
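The two ways of handling unclassified ECGs can be made concrete with a short sketch. The Python below is an illustrative reconstruction, not the study's analysis code (which was run in SPSS); the function names `diagnostic_metrics` and `cohens_kappa` and the string labels are hypothetical. It also assumes that the kappa statistic is Cohen's kappa for two raters, which the text does not specify.

```python
from collections import Counter

# Hypothetical labels used only for this illustration.
AF, SR, UNCLASSIFIED = "AF", "SR", "unclassified"


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(reference, predicted, handle_unclassified="worst_case"):
    """Sensitivity, specificity, PPV and NPV for AF detection (hypothetical helper).

    reference: 12-lead ECG diagnoses, each "AF" or "SR".
    predicted: smartwatch diagnoses, each "AF", "SR" or "unclassified".
    handle_unclassified:
      "worst_case" -> an unclassified ECG counts as a false negative in an AF
                      patient and as a false positive in an SR patient;
      "exclude"    -> unclassified ECGs are dropped before counting.
    """
    if handle_unclassified not in ("worst_case", "exclude"):
        raise ValueError("handle_unclassified must be 'worst_case' or 'exclude'")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if pred == UNCLASSIFIED:
            if handle_unclassified == "exclude":
                continue
            # Worst case: score the unclassified ECG as the wrong diagnosis.
            pred = SR if ref == AF else AF
        if ref == AF:
            if pred == AF:
                tp += 1
            else:
                fn += 1
        elif pred == SR:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same tracings (assumed statistic)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # convention: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, with ten AF patients scored as 7 AF, 1 SR and 2 unclassified, and ten SR patients scored as 9 SR and 1 unclassified, the worst-case sensitivity is 0.70, whereas excluding unclassified ECGs gives 0.875. Counting unclassified ECGs as errors can only add to the denominators, so all four worst-case estimates are less than or equal to those obtained by exclusion.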

RESULTS
In total, 200 patients were enrolled (100 in SR, 100 in AF). Their mean age was 62 ± 7 years, and 56% were male. Standard 12-lead ECGs and smartwatch ECGs from all three models were recorded in every patient, yielding 200 12-lead ECGs and 600 single-lead smartwatch ECGs for analysis. Representative examples of smartwatch ECGs from each model in a patient in AF are shown in Figure 1.

Comparison Across Smartwatch Models
Results are presented separately for SR and AF because the proportion of inconclusive diagnoses may differ between rhythms. All automated smartwatch algorithms had high sensitivity and specificity for the diagnosis of AF, even when unclassified tracings were counted as false results (Figure 2). However, under this worst-case approach, the Withings smartwatch had lower sensitivity and specificity than the Apple (p = 0.02) and Samsung (p = 0.03) models, possibly because of the higher proportion of unclassified ECGs with this smartwatch (19% vs. 10% and 10%, respectively; p < 0.05).

DISCUSSION
Direct access to wearable devices equipped with portable ECG technology is now widespread. This feature may prove useful for detecting symptomatic and asymptomatic AF, thus creating opportunities to intervene. Previous studies have typically investigated a single model, mostly focusing on optical sensors and connected ECG wristbands (8)(9)(10)(11). However, the relative diagnostic value of the available smartwatch models is poorly known. Our results show that the accuracy of automated algorithms for the diagnosis of AF varies between smartwatch models, as does the quality of the ECG tracings recorded for offline interpretation by healthcare professionals.

Automated Diagnoses: Sinus Rhythm vs. Atrial Fibrillation
Algorithm-based automated AF diagnoses may have undesired consequences. A less-than-perfect screening test used in a population with a low pre-test probability of cardiac arrhythmias translates into a modest post-test probability of disease. False positives can be associated with anxiety, unnecessary medical testing, and even potentially inappropriate treatments. Conversely, false negatives (diagnoses of SR or inconclusive rhythm when the patient is in AF) can falsely reassure the patient and delay diagnosis and treatment. Our results show that the sensitivities and specificities of all three algorithms are high. The Withings algorithm had a slightly but significantly lower sensitivity, which may be due to the higher proportion of ECGs reported as inconclusive with this smartwatch. Inconclusive rhythm classifications may occur in several circumstances: if the heart rate is too fast or too slow (thresholds vary by model), if the patient is in an arrhythmia other than AF, if the tracing is of low quality and uninterpretable by the algorithm, or if the criteria for classifying the rhythm as SR or AF are not met. The proportion of inconclusive tracings is expected to diminish as improvements in filtering, changes in algorithms, and widening of interpretable heart rate windows are implemented. For instance, the heart rate threshold above which AF is not diagnosed has recently been increased from 120 to 150 bpm in Apple smartwatches.

FIGURE 5 | Smartwatch ECGs in the same patient with confirmed AF. Although the Samsung ECG was classified as difficult to interpret due to artifact, its automated algorithm correctly diagnosed AF. In contrast, the Withings ECG is of high quality, but its automated algorithm failed to diagnose AF.

Frontiers in Cardiovascular Medicine | www.frontiersin.org
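The link between pre-test and post-test probability follows from Bayes' theorem: the probability of AF after a positive result is Se·p / (Se·p + (1 − Sp)(1 − p)), where Se is sensitivity, Sp specificity and p the prevalence. The sketch below plugs in the automated Apple sensitivity and specificity reported in the abstract (87% and 86%). The function name `post_test_probabilities` and the prevalence values are hypothetical, chosen only for illustration, and the calculation assumes that sensitivity and specificity transfer unchanged to a screening population, which this enriched 50/50 study design cannot establish.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Bayes' theorem: probability of AF after a positive or negative result.

    Hypothetical helper; assumes sensitivity and specificity do not change
    with prevalence (no spectrum effect).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(AF | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no AF | negative result)
    return ppv, npv


# Automated Apple sensitivity/specificity from this study; prevalences are hypothetical.
for prevalence in (0.50, 0.10, 0.02):
    ppv, npv = post_test_probabilities(0.87, 0.86, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, a positive automated result corresponds to a post-test probability of AF of about 86% at the 50% prevalence of this study, but only about 11% at a prevalence of 2%, while the negative predictive value rises toward 100% as prevalence falls.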
The impact of inconclusive recordings may also be reduced with more patient practice, repeated recordings over time, and alternative smartwatch positions (12)(13)(14). Artificial intelligence approaches may also improve the accuracy of automated smartwatch diagnoses (15,16). Alternative over-the-counter technologies for self-diagnosis of AF, such as ECG devices (e.g., AliveCor® 6L) and photoplethysmography-based smartphone apps (e.g., FibriCheck®), have also shown excellent accuracy (17,18). Smartwatches are expected to be used more often than these alternatives, as they are mostly acquired for non-medical purposes rather than on the advice of a healthcare professional.

Quality of the Tracings and Interpretation by Electrophysiologists
The user manuals of the different smartwatches caution that the automated diagnosis (SR vs. AF) is provided for information purposes only and is not intended to replace analysis of the tracing by a qualified health professional. Even though the accuracy of automated AF diagnoses is high, it remains imperative that a healthcare professional confirm the diagnosis before any therapeutic decision is made. The role of direct-to-consumer ECG tools in future guidelines will be defined by their feasibility and accuracy in validation studies. Our study highlights that ECG tracing quality can differ between models, with a direct impact on their diagnostic value. In our study, tracing quality was lower with the Samsung device, which made ECG interpretation more difficult (the example shown in Figure 5 was classified as difficult to interpret). In fact, for this model, the automated diagnosis of AF outperformed offline ECG interpretation by experts. This may be due to differences between the criteria used by smartwatch algorithms and by physicians to diagnose AF. For existing devices, automated AF diagnoses are schematically based on the exclusion of heart rates that are too fast or too slow (with thresholds that differ across models), on the irregularity of QRS complexes (i.e., of RR intervals), and on the absence of repetitive patterns associated with extrasystoles. A perfectly stable rhythm will therefore usually be classified as sinus rhythm, and an irregular rhythm as AF, without any dedicated analysis of atrial activity. In contrast, although electrophysiologists also consider these features, they regard direct analysis of atrial activity as an essential component of the diagnosis of AF, a criterion that generally requires an ECG tracing free of excessive artifact or baseline wander for at least a few seconds. Without this confirmation, physicians may be reluctant to diagnose AF even when it is suspected.
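The schematic decision logic described above can be illustrated with a toy rule-based classifier operating on RR intervals. This is not any manufacturer's algorithm, since those are proprietary; the function names, the heart rate window (50 to 150 bpm, where only the upper limit echoes the Apple threshold mentioned earlier), the RMSSD-based irregularity index and every threshold are hypothetical choices. Like the schematic description, it never examines atrial activity, which is exactly the gap between automated and expert diagnosis discussed here.

```python
from math import sqrt
from statistics import mean


def _alternates_between_two_values(rr, tolerance=0.05):
    # Bigeminy-like pattern: odd and even beats each keep a near-constant interval.
    def stable(xs):
        return max(xs) - min(xs) <= tolerance * mean(xs)
    return len(rr) >= 4 and stable(rr[0::2]) and stable(rr[1::2])


def classify_rhythm(rr_intervals_s, min_hr=50.0, max_hr=150.0,
                    irregularity_threshold=0.10):
    """Toy rhythm classifier from RR intervals in seconds (hypothetical rules).

    Mirrors only the schematic steps: a heart rate window, an irregularity
    check, and exclusion of repetitive (extrasystole-like) patterns. It never
    looks at atrial activity (P waves or fibrillatory waves).
    """
    rr = list(rr_intervals_s)
    if len(rr) < 3:
        return "inconclusive"
    heart_rate = 60.0 / mean(rr)
    if not min_hr <= heart_rate <= max_hr:
        return "inconclusive"  # rate outside the (assumed) analysable window
    # Irregularity index: RMSSD of successive intervals, normalised by mean RR.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    irregularity = sqrt(mean(d * d for d in diffs)) / mean(rr)
    if irregularity < irregularity_threshold:
        return "sinus rhythm"
    if _alternates_between_two_values(rr):
        return "inconclusive"  # regular irregularity, e.g. bigeminy
    return "atrial fibrillation"
```

In this sketch a perfectly regular rhythm at 75 bpm is labelled sinus rhythm and an irregular one is labelled AF regardless of whether P waves are present, which is one way an algorithm and a physician reading the same tracing can reach different conclusions.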

Study Limitations
This was a single-center study of 200 patients, half of whom had AF and half of whom had undergone AF ablation. The accuracy of these devices in a larger population, with or without cardiovascular risk factors or previous cardiac interventions, remains to be established. Participants were instructed on how to use each smartwatch before every recording, and each recording was directly observed. The performance of the algorithms and the quality of the recorded tracings may be lower in an ambulatory setting without such instruction. However, none of the patients in our study had previously used these smartwatches. Although examiners were blinded to the concomitant automated diagnosis and to the manual diagnoses of the other models' ECGs in the same patient, they were not blinded to the smartwatch model, because each model's tracing has distinct features that make the manufacturer identifiable. More detailed information about filters and algorithms would help explain the differences in performance between smartwatch models, but unfortunately this information is not made publicly available by the manufacturers.

CONCLUSION
Diagnosing AF is possible using various ECG-capable smartwatch models. Our study demonstrates differences between models in the diagnostic accuracy of their automated algorithms and in the quality of the recorded ECG tracings, the latter influencing the ability of healthcare professionals to diagnose AF manually.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Bordeaux University Hospital. The patients/participants provided their written informed consent to participate in this study.