Prediction of Paroxysmal Atrial Fibrillation From Complexity Analysis of the Sinus Rhythm ECG: A Retrospective Case/Control Pilot Study

Paroxysmal atrial fibrillation (PAF), a common form of the most frequent cardiac arrhythmia, conveys a stroke risk comparable to that of persistent AF. It poses a significant diagnostic challenge given its intermittency, its potential brevity and the absence of symptoms in most patients. This pilot study introduces a novel biomarker for early PAF detection, based upon analysis of sinus rhythm ECG waveform complexity. Sinus rhythm ECG recordings were made from 52 patients with (n = 28) or without (n = 24) a subsequent diagnosis of PAF. Subjects used a handheld ECG monitor to record 28-second periods, twice daily for at least 3 weeks. Two independent ECG complexity indices were calculated using a Lempel-Ziv algorithm: R-wave interval variability (beat detection, BD) and complexity of the entire ECG waveform (threshold crossing, TC). TC, but not BD, complexity scores were significantly greater in PAF patients, but TC complexity alone did not satisfactorily identify individual PAF cases. However, a composite complexity score (h-score) was devised, based on within-patient BD and TC variability scores. The h-score correctly identified PAF patients with 85% sensitivity and 83% specificity. This powerful but simple approach to identifying PAF sufferers from brief periods of sinus-rhythm ECG recorded with hand-held monitors should enable easy and low-cost screening for PAF, with the potential to reduce stroke occurrence.


INTRODUCTION
Atrial fibrillation (AF) is the most frequently encountered sustained cardiac arrhythmia, affecting about 2% of the population. Its prevalence increases with age, rising to 10% of those aged over 80 years. AF is associated with accelerated cognitive decline and an increased risk of dementia (Singh-Manoux et al., 2017). It is also associated with a fivefold increased risk of ischaemic stroke, and such strokes carry greater severity, mortality and disability than strokes arising from other causes (Dulli et al., 2003). Moreover, patients suffering a recurrent stroke are almost twice as likely to have identifiable AF as those presenting with a primary stroke (30 vs 17%; Han et al., 2018), although these rates are likely underestimates (Jorfida et al., 2016). Consequently, many patients with undetected AF who are discharged after a primary stroke are not prescribed anticoagulants, yet general prophylactic use of anticoagulants in the absence of an AF diagnosis is not beneficial (Hart et al., 2018). Currently, AF is detected by continuous or periodic electrocardiographic monitoring over extended periods (Kirchhof et al., 2016), using invasive or non-invasive methods (Seet et al., 2011), which can be costly and require patient cooperation.
Paroxysmal AF (PAF) is a self-terminating condition with episodes lasting minutes to days and accounts for 25-60% of diagnosed AF cases (Seet et al., 2011). Some studies indicate that stroke incidence is similar in patients with PAF or sustained AF (Banerjee et al., 2013), although others differ (Takabayashi et al., 2015; Ganesan et al., 2016). Nonetheless, PAF is more difficult to detect: when episodes do occur, up to 90% of those affected have no symptoms (Page et al., 1994), which also risks a greater incidence of associated stroke and thromboembolism (Hart et al., 2007). There is therefore an unmet need to improve PAF detection using a non-invasive, low-cost method that could be used by a greater number of people.
Atrial fibrillation is associated with electrical and structural myocardial remodeling and autonomic dysregulation of the heart (Andrade et al., 2014; Nattel and Harada, 2014), which should be reflected in increased electrocardiogram (ECG) signal variability. However, changes to ECG characteristics, such as P-wave morphology or heart rate variation, are generally poorly associated with AF incidence and consequent stroke, especially for prediction of PAF (Schaefer et al., 2014; Maheshwari et al., 2019). An exception is P-wave axis variation, which is a reasonable predictor (Maheshwari et al., 2019) and supports the concept that small variations of the sinus-rhythm ECG waveform might be useful to predict PAF. A recent study used machine learning on sinus rhythm ECG traces to derive an AF-signature algorithm with specificity and sensitivity of around 0.8 (Attia et al., 2019), providing further evidence that sinus rhythm ECGs may contain subclinical signs of AF. However, such an approach is computationally complex and does not reveal which specific ECG changes correlate with AF. The present work develops a method based on analysis of sinus rhythm ECG trace complexity and its day-to-day variability. It offers a simpler tool to screen for PAF and, as a novel metric, should also provide additional information that could be combined with other approaches.
Non-linear analytical methods are sensitive tools to estimate the irregularity of biomedical signals and have been applied to electroencephalogram recordings to identify the onset of epileptic seizures or the risk of Alzheimer's disease (Hornero et al., 2009; Aarabi and He, 2017). The Lempel-Ziv complexity measure (Lempel and Ziv, 1976; Kaspar and Schuster, 1987) estimates the entropy density of symbolic strings by analyzing the rate at which new patterns are generated. It has been used to analyze a variety of biological signals, including neuronal spiking (Amigo et al., 2004), the electroencephalogram (Abásolo et al., 2015) and human motion (Peng et al., 2014), and has also been proposed as a feasible tool to assess ECG signal quality (Zhang et al., 2016). The inherently chaotic nature of the ECG signal, both in healthy hearts (Goldberger, 1991; Glass, 2009; Shaffer and Ginsberg, 2017) and during atrial fibrillation (Qu, 2011; Aronis et al., 2018), suggested that such an estimator might be used for diagnostic purposes. In this pilot study we used this approach to combine two independent parameters of continuous sinus-rhythm ECG waveforms: the day-to-day variabilities of overall signal complexity and of the R-R interval. We demonstrate that PAF prediction is possible with high sensitivity and specificity from recordings made with a simple hand-held device.

MATERIALS AND METHODS
Study Design
Participants were recruited from a larger prospective case-control study, conducted over 2 years, with a 12-week study period and at least 12 weeks of follow-up. It compared the diagnostic yield of PAF, in a population with symptoms of possible AF, using either a continuous automated cardiac event recorder (the R Test 4 Evolution, Novacor; 1-week test period) or a handheld, battery-driven ECG recorder (Omron HCG-801; Omron Healthcare, United Kingdom). The study was approved by the National Research Ethics Service Committee (12/LO1357) and the Royal Surrey County Hospital Research & Development committee. Participants were recruited over 21 months by primary care physicians in the Waverley Health District. Participants gave informed consent and were given a study number to anonymise data. Methods, data collection and storage were performed according to relevant guidelines and regulations in the Research Governance Framework for Health and Social Care (NHS Health Research Authority, 2018) and conformed to updated (March 2018) United Kingdom Policy Frameworks for Health and Social Care Research. All primary data were stored on encrypted and password-protected computers.

Participant Eligibility Criteria and ECG Collection
Inclusion criteria were: presenting with palpitations or an irregular pulse; age ≥ 40 years; no history of AF; no electrolyte abnormalities; no pacemaker device; no prescribed class Ic or III anti-arrhythmic drugs; no other arrhythmias. Controls had no evidence of PAF during the study period. Cases had PAF diagnosed with either device during the main study; recordings for this sub-study were made before any antiarrhythmic drug was started. PAF was defined as AF lasting 30 s to 7 days with spontaneous termination. Fifty-seven patients (30 cases; 27 controls) were recruited. The cardiologist (PH) reported on ECG data throughout the study and categorized participants as controls or cases. Table 1 lists demographics, clinical data and current medications. Participants recorded 28-s ECG periods (strips) with the Omron recorder twice daily, at roughly 12-h intervals, in a rested state whilst sitting, initially over a period of 5 weeks, although some provided more. Initial evaluation of data from eight control and seven case participants showed that at least 30 strips per participant were required and that more provided little additional benefit (see Results). Subsequently, participants were asked to provide recordings over 3 weeks (42 strips); the signal-to-noise ratio was 15-20 dB. Of the 57 original participants, 52 (28 cases; 24 controls) provided ≥33 strips for analysis. Five were excluded: four provided <30 strips, and the recordings of one participant showed baseline drift and extraneous electrical noise. The cardiologist also confirmed that traces were representative of sinus rhythm, with no evidence of AF or ventricular abnormalities (dysrhythmias, ectopics or abnormal waveforms).

Conversion of ECG Recordings to Binary Strings and Analysis
The Omron device is a bipolar, single-channel recorder sampling at 125 Hz with a signal bandwidth of 0.05-40 Hz. Analyses used custom-built programs developed in C++ with the Qt framework. The first, written using documentation provided by Omron under a non-disclosure agreement, converted recordings from the proprietary file format to comma-separated-values (csv) text files that retained only the anonymised information essential for further data processing. The second converted the floating-point ECG recordings in the csv files into binary strings and calculated Lempel-Ziv complexity scores (CS) using two algorithms (Figures 1A,B).
The TC (threshold crossing) method used an algorithm that replaced all values above a threshold by "1" and set the rest to "0." The median value of each strip was used as the threshold because of its insensitivity to outliers. The BD (beat detection) method used a QRS-complex detection algorithm that assigned a value of "1" to each R peak. The first derivative (dV/dt) of the ECG voltage was generated and smoothed by convolution with a digital Savitzky-Golay filter (Savitzky and Golay, 1964) with a window size of 5, which increased precision without distorting the signal (Nishida et al., 2017; Sadeghi and Behnia, 2018). A sliding window corresponding to 6 s of strip duration was moved along the signal and the maximum value of dV/dt (dV/dt max) was found within the window. Then the first sample within the window that satisfied two criteria was taken as the R-peak time and assigned a value of "1," with all other points assigned "0." The criteria were: (i) dV/dt > 0.7 · dV/dt max in the window; and (ii) dV/dt is greater than both the preceding and succeeding values. The window was then advanced and the process repeated. This simple technique was acceptable in recordings lacking artifacts and rhythm irregularities, and the algorithm is reliable at heart rates below 100 min−1 (Alexeenko et al., 2019). In this study, heart rates for all participants were <100 min−1 [controls: 74 ± 2 (SEM; range 60-91) min−1, n = 24; cases: 73 ± 1 (SEM; range 59-88) min−1, n = 28].
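The two binarization steps can be sketched in Python as below. This is a minimal illustration, not the authors' C++/Qt program. It assumes a strip is a list of voltage samples at 125 Hz, follows the Figure 1A convention that TC assigns "1" to values equal to or above the median, approximates dV/dt by a first difference, and smooths it with the standard 5-point quadratic Savitzky-Golay weights (the polynomial order is not stated in the text). The `refractory_s` parameter, which sets how far the window advances after each detected beat, is a hypothetical choice, as are the function names.

```python
import statistics
from typing import List, Sequence

# 5-point quadratic/cubic Savitzky-Golay smoothing weights (they sum to 35).
SG5 = (-3, 12, 17, 12, -3)


def threshold_crossing(x: Sequence[float]) -> List[int]:
    """TC: 1 where a sample is >= the strip median, otherwise 0."""
    t = statistics.median(x)
    return [1 if v >= t else 0 for v in x]


def smoothed_derivative(x: Sequence[float]) -> List[float]:
    """First-difference dV/dt smoothed with a 5-point Savitzky-Golay filter."""
    d = [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]
    out = list(d)  # the two samples at each edge are left unsmoothed
    for i in range(2, len(d) - 2):
        out[i] = sum(w * d[i + j - 2] for j, w in enumerate(SG5)) / 35.0
    return out


def beat_detection(x: Sequence[float], fs: float = 125.0, window_s: float = 6.0,
                   frac: float = 0.7, refractory_s: float = 0.25) -> List[int]:
    """BD: mark the first sample in each window whose smoothed dV/dt is a local
    peak exceeding frac * (max dV/dt in that window)."""
    dv = smoothed_derivative(x)
    n = len(dv)
    bits = [0] * n
    win = max(1, int(window_s * fs))
    skip = max(1, int(refractory_s * fs))  # hypothetical advance after a beat
    start = 1
    while start < n - 1:
        stop = min(start + win, n - 1)
        dv_max = max(dv[start:stop])
        hit = None
        if dv_max > 0:
            for i in range(start, stop):
                if dv[i] > frac * dv_max and dv[i] > dv[i - 1] and dv[i] > dv[i + 1]:
                    hit = i
                    break
        if hit is None:
            start = stop  # nothing qualifying in this window; move on
        else:
            bits[hit] = 1
            start = hit + skip
    return bits
```

Because each window restarts just after the previous detection, the 6-s window mainly sets the local normalization (dV/dt max) rather than the spacing between beats.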

Lempel-Ziv Complexity and the Final Outcome Measure, the h-Score
Lempel-Ziv (LZ'76) complexity is a non-linear signal analysis method that estimates sequence complexity (CS; Lempel and Ziv, 1976) by identifying the number of different sub-sequences and their rate of recurrence (Radhakrishnan and Gangadhar, 1998). The ECG time series, x(i), was converted to a discrete binary sequence, P = s(1), s(2), …, s(n), by comparing each x(i) with a threshold $T_d$:

$$s(i) = \begin{cases} 1, & x(i) \geq T_d \\ 0, & x(i) < T_d \end{cases}$$

LZ'76 complexity was estimated by scanning P from left to right and incrementing a complexity counter, c(n), every time a new sub-sequence was encountered (Lempel and Ziv, 1976; Kaspar and Schuster, 1987). To make c(n) independent of the sequence length, n, the number of unique sub-sequences was normalized by b(n) (Hand, 1981; Figure 1B):

$$CS(n) = \frac{c(n)}{b(n)}, \qquad b(n) = \frac{n}{\log_2 n}$$

Thus, CS TC and CS BD scores were generated for each strip. Because CS variability discriminated better between the two cohorts than CS itself, variability scores (varCS TC and varCS BD) were then calculated for each patient:

$$\mathrm{varCS}_{TC} = \left(i\mathrm{CS}_{TC} - \mathrm{meanCS}_{TC}\right)^2$$

where iCS TC is an individual strip CS and meanCS TC is the participant's mean CS TC; varCS BD was calculated in the same way from CS BD.
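The LZ'76 counting and normalization described above can be sketched as follows. This is a minimal Python rendering of the widely used Kaspar and Schuster (1987) scanning procedure, offered as an illustration rather than a copy of the estimator in the Supplement; the function names are hypothetical.

```python
import math
from typing import Sequence


def lz76_count(s: Sequence[int]) -> int:
    """Number of distinct phrases c(n) found by a left-to-right LZ'76 scan
    (Kaspar-Schuster procedure)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the current match
            if l + k > n:               # reached the end while copying
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1                      # try the next earlier starting point
            if i == l:                  # no earlier copy: a new phrase ends here
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz76_cs(s: Sequence[int]) -> float:
    """Normalized complexity CS(n) = c(n) / b(n), with b(n) = n / log2(n)."""
    n = len(s)
    if n < 2:
        raise ValueError("need at least two symbols")
    return lz76_count(s) / (n / math.log2(n))
```

For the classic example string 0001101001000101 the scan finds six phrases (0 · 001 · 10 · 100 · 1000 · 101), so c(16) = 6 and CS = 6 / (16/4) = 1.5. Applied to one strip's TC or BD bit string, `lz76_cs` would give that strip's CS TC or CS BD.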
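The final discriminant measure, the h-score, was calculated to reflect the independent variability of CS TC and CS BD scores for each participant during sinus rhythm. With a constant, k,

$$h = \sqrt{\left(\overline{\mathrm{varCS}}_{TC}\right)^2 + \left(k\,\overline{\mathrm{varCS}}_{BD}\right)^2}$$

where $\overline{\mathrm{varCS}}_{TC}$ and $\overline{\mathrm{varCS}}_{BD}$ are a participant's mean variability scores.

Frontiers in Physiology | www.frontiersin.org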

FIGURE 1 | ECG analysis techniques. (A) The two methods used to convert a digitised ECG recording to a binary string. Threshold Crossing (TC) substitutes "1" for all values equal to or above a median threshold and sets all other values to "0." Beat Detection (BD) sets all values of a binary string to zero except at time-points where the R-wave peak is detected; the R-waves are marked on the trace. (B) The binary strings were split into a set of unique substrings (LZ'76 complexity analysis); the final complexity score was normalized to the length of the recording. (C) Flowchart of ECG processing to obtain a final discriminating h-score.
A flow chart of the analysis is shown in Figure 1C; see Results for the calculation of k.
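Following the flowchart in Figure 1C, per-strip scores can be turned into per-participant summaries. The sketch below assumes, as a hypothetical reading of the text, that each strip's varCS is its squared deviation from the participant's mean CS and that the participant-level value is the mean of these squared deviations (a population variance); the function and key names are illustrative only.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def strip_var_scores(cs: Sequence[float]) -> List[float]:
    """Per-strip varCS: squared deviation of each strip CS from the participant mean."""
    m = fmean(cs)
    return [(v - m) ** 2 for v in cs]


def participant_summary(cs_tc: Sequence[float], cs_bd: Sequence[float]) -> Dict[str, float]:
    """Mean CS and mean varCS for one participant's TC and BD strip scores."""
    return {
        "meanCS_TC": fmean(cs_tc),
        "meanCS_BD": fmean(cs_bd),
        "varCS_TC": fmean(strip_var_scores(cs_tc)),
        "varCS_BD": fmean(strip_var_scores(cs_bd)),
    }
```

Keeping the per-strip values (rather than only their mean) is what allows a within-participant correlation between varCS TC and varCS BD to be tested.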
The enclosed Supplement contains the source code for the LZ'76 complexity estimator used in this analysis. The Supplement also includes data sets used for validation as well as the expected program outputs.

Statistical Analysis
Not all summary data sets for CS TC/CS BD, their derived variability scores or the final h-score were normally distributed (Shapiro-Wilk tests), so these data are quoted as medians with 25 and 75% interquartiles. Differences between the controls and cases cohorts were tested with Mann-Whitney U-tests; the null hypothesis was rejected at p < 0.05. Mean values of CS TC/CS BD for each participant were used to calculate varCS TC or varCS BD scores. Intra-subject analyses showed that CS TC scores were normally distributed, except for one participant in each cohort with excess kurtosis (k) > 1. For CS BD scores, k > 1 for two controls and 12 cases, and skewness |s| > 1 for two cases participants. Categorical data sets were compared with a χ2-analysis. Empirical Receiver-Operating Characteristic (ROC) curves described test characteristics, with area-under-the-curve (AUC) as a summary statistic (footnote 2). The significance of differences between AUCs was also tested (footnote 3). The operating point of the final ROC curve for the h-score was the point where a 45° line is tangent to the curve. A Spearman rank-order correlation coefficient, r s, was calculated to test association between two variables. Data summaries and statistical analyses were performed using Excel or VassarStats.

2 http://www.rad.jhmi.edu/jeng/javarad/roc/JROCFITi.html
3 http://vassarstats.net/roc_comp.html
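Two of the quantities named above can be computed directly from per-participant scores. The sketch below uses the standard identity that the empirical ROC AUC equals the Mann-Whitney U statistic divided by the product of the group sizes, and finds the operating point where a 45° line touches the empirical ROC curve by maximizing sensitivity + specificity − 1 (Youden's J), which is what that tangent condition amounts to. It is an illustration, not the JROCFIT or VassarStats implementations the authors used, and it assumes higher scores indicate cases.

```python
from typing import Sequence, Tuple


def empirical_auc(controls: Sequence[float], cases: Sequence[float]) -> float:
    """AUC = P(case > control) + 0.5 * P(tie) = U / (n_cases * n_controls)."""
    wins = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def operating_point(controls: Sequence[float],
                    cases: Sequence[float]) -> Tuple[float, float, float]:
    """Threshold t (score >= t called a case) maximizing sensitivity + specificity - 1.
    Returns (t, sensitivity, specificity)."""
    best = None
    for t in sorted(set(controls) | set(cases)):
        sens = sum(a >= t for a in cases) / len(cases)
        spec = sum(b < t for b in controls) / len(controls)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec
```

Only observed score values are tried as thresholds, which is sufficient for an empirical (step-shaped) ROC curve.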

RESULTS
Participant Characteristics
Participants were divided into cases and controls, who did or did not, respectively, show eventual evidence of paroxysmal atrial fibrillation, but who were in sinus rhythm during recording. The two cohorts were statistically similar for all demographic and relevant clinical data (Table 1). Medication records showed that the cases cohort took more medications in total, in particular β-blockers (10/28 vs 0/24) and aspirin/clopidogrel (9/28 vs 0/24). Initial sub-analyses revealed no differences in ECG complexity metrics within the cases cohort between those taking β-blockers or aspirin/clopidogrel and those who were not, so all data in this cohort were combined.

Complexity Score Values
Digitised ECG strips were downloaded and the two Lempel-Ziv CSs (CS TC and CS BD; see section "Materials and Methods") were calculated. The number of strips required from each participant was initially evaluated with 15 participants (controls n = 8; cases n = 7) who had all provided more than 75 strips. Average CS TC was calculated for each participant, with the final strip successively removed until only the first 10 remained. The p-value for the difference between the two cohorts showed that discrimination was increasingly lost with fewer than 30 strips per participant (Figure 2A). The inset shows the mean CS TC values for the two cohorts when the first 35 strips from each patient were used.
The median CS TC score of all strips from 24 control and 28 cases participants was significantly greater in ECG strips from cases than from controls [0.488 (0.434, 0.548) bits/sec, n = 1571 ECG strips vs 0.464 (0.410, 0.515) bits/sec, n = 1392; p < 0.001: Figure 2B]. Median CS BD scores in both cohorts were the same, although the two sets were significantly different owing to a greater range of values in the cases cohort [0.0437 (0.0404, 0.0471) bits/sec, n = 1571 vs 0.0437 (0.0404, 0.0471) bits/sec, n = 1392; p = 0.039]. However, neither score alone provided a useful discriminator because of the considerable overlap of values between the two cohorts, as exemplified by the CS TC data sets in Figure 2B.
Mean values of CS TC and CS BD were calculated for each participant: mean CS TC values remained significantly different between cases and controls (p = 0.039; Table 2), but mean CS BD scores did not (p = 0.92; Figure 2C). The usefulness of mean CS TC and CS BD scores for identifying future PAF subjects was assessed with ROC curve analysis (Figure 2D); neither was a good discriminator between cases and controls, with respective area-under-the-curve (AUC) values of 0.662 and 0.523 (Table 2).

Variability of CS Scores
Variability of individual CS TC and CS BD values for a participant (varCS TC and varCS BD; units, (bits/sample)²) was greater in the cases cohort than in the control cohort. Generally, participants in the control cohort had fewer outliers and more uniform complexity values than those in the cases cohort. Figure 3A shows examples of CS TC, CS BD and the respective varCS TC and varCS BD values from a control participant (§24), who showed little variability, and a cases participant (§8) with more variability. Median varCS TC and varCS BD values were both significantly greater for the cases cohort (p = 0.00147 and p = 0.00148, respectively; Figure 3B and Table 2). ROC curve analysis of varCS TC and varCS BD as binary classifiers showed increased AUC values relative to the base CS scores (Figure 3C). The varCS TC AUC (0.740) was not significantly different from the CS TC value (p = 0.22), whereas the varCS BD AUC (0.798) was significantly improved (p = 0.001).

Calculation of the Final Discriminant Score
A key observation was that varCS TC and varCS BD values were uncorrelated within the data from each participant. Figure 4A plots Spearman correlation coefficients (ρ) and corresponding p-values for varCS TC and varCS BD pairs from individual participants. There was no significant association between these two variance scores for any individual, except for two (in the cases cohort) for whom significance was just achieved. Overall, the two variance scores could therefore be treated as independent variables. A plot of mean varCS BD vs mean varCS TC (Figure 4B, top) showed clustering of data from the control cohort in the lower left-hand quadrant. Also shown is an ellipse that optimally separates data points from the two cohorts, with intercepts on the two axes at varCS TC = 4.546·10−3 and varCS BD = 3.77·10−5 (Figure 4B, top, arrowed). To weight the two variance measures equally, varCS BD values were normalized by multiplying by k = 120.6, the ratio of the two intercepts. Figure 4B (lower) shows the transformed data, now with a circle fit of radius 0.00455 (bits/sample)²; note that only the subset of data points near the circle boundary is shown.
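The scaling step can be written out explicitly. Assuming, as the intercept description suggests, an axis-aligned ellipse centred on the origin with semi-axes a and b on the varCS TC (x) and varCS BD (y) axes, multiplying y by k = a/b maps the ellipse onto a circle of radius a:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \;\Longrightarrow\; x^2 + \left(\frac{a}{b}\,y\right)^2 = a^2 \;\Longrightarrow\; x^2 + (k\,y)^2 = a^2, \qquad k = \frac{a}{b}$$

$$k = \frac{4.546\times10^{-3}}{3.77\times10^{-5}} \approx 120.6$$

Under this assumption the circle radius is a ≈ 4.55·10−3, matching the reported circle fit of radius 0.00455 (bits/sample)².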
Finally, to reduce the dimensionality of the data, a coordinate transform (Hand, 1981) was applied to produce a single h-score, which quantified the compound variability of CS TC and CS BD for a participant as the length of the vector from the origin to that participant's data point. Values of h-scores are shown in Figure 5A, with the decision threshold (h-score = 4.5·10−3) shown by the solid horizontal line. Controls and cases were separated with 89% sensitivity (true positive rate) and 83% specificity (true negative rate). ROC curve analysis showed that the h-score was the best discriminant, with an AUC = 0.919 (C.I. 0.844-0.994; Figure 5B and Table 2); the improvement was significant relative to varCS BD (p = 0.049) but not relative to varCS TC (p = 0.11). Figures 4B and 5A show that several patient h-scores lie close to the discriminant boundary (h-score = 4.5·10−3), so a small variation of this value could have important consequences for sensitivity and specificity estimates. The horizontal dotted lines of Figure 4C show values of the h-score threshold varied by ±5 and ±10%.
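With the scaling above, the h-score is a Euclidean norm. The sketch below uses the reported k = 120.6 and decision threshold 4.5·10−3; the function names, and treating a score exactly at the threshold as a control, are assumptions made for illustration.

```python
import math

K = 120.6             # ratio of the ellipse intercepts reported in the text
H_THRESHOLD = 4.5e-3  # reported decision threshold


def h_score(mean_var_tc: float, mean_var_bd: float, k: float = K) -> float:
    """Distance from the origin to (varCS_TC, k * varCS_BD)."""
    return math.hypot(mean_var_tc, k * mean_var_bd)


def classify(h: float, threshold: float = H_THRESHOLD) -> str:
    """Label a participant as a predicted PAF case if h exceeds the threshold."""
    return "PAF" if h > threshold else "control"
```

For example, a participant with mean varCS TC = 3·10−3 and mean varCS BD = 2·10−5 has h ≈ 3.85·10−3 and falls on the control side of the boundary.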
Lowering the h-score threshold would decrease the number of false negatives but increase the number of false positives. For a 5 and 10% decrease, sensitivity was unchanged or increased to 93%, respectively, but specificity fell to 71 or 58%, respectively. For a 5 and 10% increase, sensitivity fell to 79 or 71%, respectively, but specificity increased to 92% in both cases.
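The threshold-sensitivity analysis amounts to recomputing sensitivity and specificity at scaled thresholds. A minimal sketch, assuming participants with h above the threshold are called cases; the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def sens_spec(h_controls: Sequence[float], h_cases: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when h > threshold is called a case."""
    sens = sum(h > threshold for h in h_cases) / len(h_cases)
    spec = sum(h <= threshold for h in h_controls) / len(h_controls)
    return sens, spec


def threshold_sweep(h_controls: Sequence[float], h_cases: Sequence[float],
                    base: float = 4.5e-3,
                    changes: Sequence[float] = (-0.10, -0.05, 0.0, 0.05, 0.10)
                    ) -> Dict[float, Tuple[float, float]]:
    """Map each fractional threshold change to (sensitivity, specificity)."""
    return {d: sens_spec(h_controls, h_cases, base * (1.0 + d)) for d in changes}
```

With 28 cases and 24 controls, each misclassified case moves sensitivity by about 3.6 percentage points and each misclassified control moves specificity by about 4.2, which is why small threshold shifts produce the discrete jumps reported above.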
The CHA2DS2-VASc score is used to estimate stroke risk in patients with non-rheumatic AF and might offer a further independent predictor of PAF. Any association between the CHA2DS2-VASc score and the h-score was tested by calculating a Spearman rank-order correlation coefficient, ρ: there was no significant association for the whole data set (ρ = −0.0443, p = 0.758, n = 52). In this data set, combining the h-score with the CHA2DS2-VASc score did not provide further discrimination between the two cohorts.

DISCUSSION
This study shows that analysis of ECG entropy, using LZ'76 complexity, has potential for diagnosing PAF from sinus rhythm ECGs. Analysis of at least 30 half-minute strips per patient, acquired with an inexpensive handheld ECG monitor while patients were in sinus rhythm, produced a final score (the h-score) based on the individual variability of two ECG complexity measures, CS TC and CS BD. Mean CS TC scores were statistically different in cases compared with controls, whereas mean CS BD scores were not, but alone neither provided good discrimination between the two cohorts. Generation of the h-score depended on two key observations about this pair of CSs. First, the day-to-day variability of both CS TC and CS BD (varCS TC and varCS BD) was greater in the cases cohort than in controls (Figure 3 and Table 2). Second, varCS TC and varCS BD were independent measures of complexity, which enabled generation of a final h-score that provided excellent discrimination between the two cohorts, with 83% specificity and 89% sensitivity. The h-score threshold is an absolute value derived from this pilot study of a relatively small cohort of 52 patients; a larger study is needed to establish it with greater confidence. The observation of greater variability of ECG complexity in patients at risk of PAF suggests that their atria may show subtle electrophysiological changes that, without immediate gross pathophysiological consequence, provide a substrate or trigger for a period of atrial fibrillation. Participant compliance was good: 93% provided sufficient numbers of recordings and only one provided data that could not be analysed. The requirement to produce short recordings in a restful home setting, using a hand-held device, is likely to have contributed to this high rate of participation.

TABLE 2 | Complexity scores (CS) for threshold crossing (TC) and beat detection (BD), their derived variabilities, varCS TC and varCS BD, as well as final h-scores for data from controls (n = 24) and cases (n = 28) cohorts.
Controls | Cases | AUC (C.I.)
Data are medians (25, 75% interquartiles); *p < 0.05, **p < 0.01, and ***p < 0.001. The final column shows AUC estimations from ROC curve analysis and 95% confidence intervals (C.I.) of differentiation between the two cohorts.
Because the key component of the analysis is measurement of CS variability on a day-to-day basis to generate a discriminant score, fewer but longer individual recordings may not be useful. A further refinement may be to determine whether multiple-lead ECGs or alternative complexity estimates perform better, but this remains to be explored. Other strategies might be devised to process higher-quality recordings, for example using more complex ECG parsing techniques but simpler analysis methods (Alexeenko et al., 2020). The method also required a cardiologist to scrutinize ECG traces before analysis to exclude abnormalities; although such abnormalities are relatively uncommon in a general population (Sirichand et al., 2017), a fully automated process would need a preliminary screening step (see Limitations, below). Finally, combination with other approaches, including biomarkers such as brain natriuretic peptide (Rodríguez-Yáñez et al., 2013) or AF-related stroke-risk scores, might provide further discrimination. However, combining the h-score with CHA2DS2-VASc scores did not improve specificity or sensitivity.

The relative simplicity of this predictive method makes it suitable for population screening of at-risk groups or those who cannot co-operate easily with clinical tests. The method may also be applied to analyze previously collected data, e.g., to investigate links between subclinical AF and cryptogenic stroke (Healey et al., 2012) or development of dementia, when early screening would be especially useful (Cuadrado-Godia et al., 2020).

FIGURE 4 | … values for controls (blue circles) and cases (red circles). The curve is an ellipse that optimally separates data from the controls and cases cohorts; arrows mark the intercepts with the axes, used to estimate a scaling factor for the varCS BD data (see text for details). Lower plot: transformed data where the varCS BD data are multiplied by a constant, k, to allow a circle function to optimally separate data from the controls and cases cohorts (see text for details).

Methods of Atrial Fibrillation Detection
This method of AF prediction, which couples simple and unambiguous ECG parsing algorithms to second-moment analysis of short, relatively low-quality ECGs recorded by hand-held devices, may be compared with other methods that measure existing AF and potentially predict its occurrence. Measurement of existing AF is continuously improving and achieves sensitivity and specificity similar to those recorded here, even with hand-held devices (Svennberg et al., 2015; Marinucci et al., 2020); however, the use of such devices to record paroxysmal AF is limited. Alternatively, machine-learning models of risk that combine demographic features, simple clinical tests and plasma biomarkers are increasingly sophisticated (Ambale-Venkatesh et al., 2017; Hill et al., 2019). They have the advantage of yielding pathological insight but require sophisticated resources and so far are less sensitive and specific.
More precise electrophysiological approaches are also being developed. Machine-learning methods for PAF detection use retrospective analysis of freely available clinical ECG recordings or clinical databases. These are often based on detection and classification of atrial premature beats and other ECG abnormalities (Thong et al., 2004), or on interval analysis of atrial or ventricular depolarisations (Ghodrati et al., 2008; Mohebbi and Ghassemian, 2012; Xin and Zhao, 2017; Aronis et al., 2018), with specificity and sensitivity ranging between 71-93% and 85-96%, respectively, but they require recording periods of up to 30 min. Convolutional neural networks, by contrast, achieve detection accuracy in the range 75-95% with shorter recording periods (Hsieh et al., 2020; Nurmaini et al., 2020). Finally, machine-learning predictive methods using sinus rhythm recordings are also being developed, with sensitivity and specificity of around 83% (Attia et al., 2019).

A Health Economics Perspective
Net monitoring costs for AF over 1 week after an ischaemic stroke have been estimated at about $530,000 at today's costs (Kamel et al., 2010). An outlay of about $125,000 for 1,000 reusable hand-held monitors, plus the employment of a biometrics analyst, would represent a large saving to health-care systems in identifying vulnerable patients at risk of subsequent strokes from PAF.

Limitations
(i): Because of the nature of PAF and the use of intermittent monitoring, some individuals assigned as controls may have had undetected PAF. All cases had at least one PAF episode during the study period, but some controls may have experienced PAF at longer intervals. (ii): Recorded co-morbidities were similar in both groups, but more cases than controls took β-blockers and/or aspirin or clopidogrel. (iii): Participant compliance: of the original 57 participants, two controls and two cases supplied <30 strips for analysis, and for one control, artifacts precluded analysis. (iv): External electrical noise, e.g., from electromyographic activity of the participant's hand, could add to the ECG signal and, if excessive, alter the TC complexity score (CS TC). We added noise to 15 ECG trace segments from 10 randomly selected participants (five each of controls and cases) and recalculated CS TC. When noise exceeded 181 µV SD (equivalent to a signal-to-noise ratio of 15.9 dB), mean complexity scores were altered by more than 2.5%. Therefore, the analysis will be useful for signals with a signal-to-noise ratio > 15.9 dB.
(v): The nature of this pilot study precluded recruitment of additional patients to form a validation cohort, so this remains a proof-of-principle study; a larger case-control study is required to validate the predictive power of the h-score. (vi): The BD algorithm used in this study was sufficiently robust to detect R-waves specifically in ECG traces showing sinus rhythm. However, ventricular dysrhythmias or waveform abnormalities, such as bigeminy or T-wave alternans, may be confounders that contribute false positives. In this proof-of-principle study, clinical evaluation would have excluded such traces, but this would be unsuitable for a fully automated process. We envisage that the next phase will incorporate a preliminary ECG parsing step, for example with a Pan-Tompkins parser (Pan and Tompkins, 1985), supplemented with an autocorrelation analysis, to identify traces with these potential confounders.
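The noise test in limitation (iv) can be sketched as adding zero-mean Gaussian noise scaled to a target signal-to-noise ratio. Here SNR is assumed to be the power ratio 20·log10(RMS signal / RMS noise); the text does not state the exact definition used, so this sketch does not attempt to reproduce the mapping between 181 µV SD and 15.9 dB, and the function names and seed are hypothetical. Recomputing CS TC on the noisy trace and comparing it with the clean value would then give the percentage change.

```python
import math
import random
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root-mean-square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB as a power ratio: 20 * log10(rms(signal) / rms(noise))."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def add_noise_at_snr(signal: Sequence[float], target_db: float,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (noisy signal, noise), with Gaussian noise rescaled to target_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = rms(signal) / (10.0 ** (target_db / 20.0)) / rms(noise)
    noise = [scale * v for v in noise]
    return [s + e for s, e in zip(signal, noise)], noise
```

Rescaling the drawn noise to an exact RMS, rather than drawing it with a fixed SD, makes each trial hit the target SNR precisely, so differences in CS TC between trials reflect the noise pattern rather than its level.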

CONCLUSION
We describe a link between increased variability of ECG complexity in sinus rhythm recordings and PAF incidence, and propose a novel score to quantify PAF risk. We envisage that this score would enable low-cost screening for PAF based on short periods of ECG recording in a primary care setting, or could be built into hand-held devices. We anticipate that such screening would improve detection of PAF relative to currently available techniques (Choe et al., 2015). This may contribute to a reduction of AF-related mortality which, unlike that from heart failure, continues to rise, at least in Europe and the United States (Vasan et al., 2019).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the National Research Ethics Service Committee (12/LO1357) and the Royal Surrey County Hospital Research & Development committee. Participants were recruited over 21 months by primary care physicians in the Waverley Health District. Participants gave informed consent and were given a study number to anonymise data. Methods, data collection and storage were performed according to relevant guidelines and regulations in the Research Governance Framework for Health and Social Care (NHS Health Research Authority, 2018) and conformed to updated (March 2018) United Kingdom Policy Frameworks for Health and Social Care Research. All primary data were stored on encrypted and password-protected computers. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
CF, VA, and RJ: devised the study. CF, VA, JF, DA, and RJ: experimental planning. VA, PH, and JF: contributed to the data. RJ and CF: raised funding. CF, VA, and RJ: drafted the manuscript. All authors edited and approved the final manuscript.

FUNDING
This work was supported by the Heart and Stroke Trust Endeavour and the British Heart Foundation (grant number PG/12/64/29828). It was also partially funded by an Isaac Newton Trust / Wellcome Trust ISSF / University of Cambridge Joint Research grant awarded to JAF.