Visualizing and Quantifying Irregular Heart Rate Irregularities to Identify Atrial Fibrillation Events

Background: Screening the general public for atrial fibrillation (AF) may enable early detection and timely intervention, which could potentially decrease the incidence of stroke. Existing screening methods require professional monitoring and involve high costs. AF is characterized by an irregular irregularity of the cardiac rhythm, which may be detectable using an index that quantifies and visualizes this type of irregularity. Such an index could motivate wide screening programs and promote research into AF patient subgroups and the clinical impact of AF burden.
Methods: We calculated the variability, normality and mean of the differences between consecutive RR intervals (denoted as the modified entropy scale, MESC) to quantify irregular irregularities. Based on the variability and normality indices calculated for long 1-lead ECG records, we created a plot termed a regularogram (RGG), which provides a visual presentation of irregularly irregular rates and their burden in a given record. To inspect the potency of these indices, they were applied to train and test a machine learning classifier to identify AF episodes in gold-standard, publicly available databases (PhysioNet) that include recordings from patients with AF and/or other rhythm disturbances, and from healthy volunteers. The classifier was trained and validated on one database and tested on three other databases.
Results: Irregular irregularities were identified using normality, variability and mean MESC indices. The RGG displayed visually distinct differences between patients with vs. without AF and between patients with different levels of AF burden. Training a simple, explainable machine learning tool integrating these three indices enabled AF detection with 99.9% accuracy when trained on the same person, and 97.8% when trained on patients from a different database. Comparison to other RR interval-based AF detection methods that utilize signal processing, classic machine learning and deep learning techniques showed superiority of our suggested method.
Conclusion: Visualizing and quantifying irregular irregularities will be of value both for rapid visual inspection of long Holter recordings for the presence and the burden of AF, and for machine learning classification to identify AF episodes. A free online tool for calculating the indices, drawing RGGs and estimating AF burden is available.


INTRODUCTION
Atrial fibrillation (AF) is an arrhythmia initiated by ectopic atrial foci which create rapid atrial activity, with variable ventricular response governed by atrioventricular (AV) node conduction. It is the most common type of cardiac arrhythmia and constitutes a major risk factor for stroke and death (Bassand et al., 2019). The prevalence of AF is age-dependent, reaching 5% in patients aged 65 years or older (Chugh et al., 2014). Moreover, as the population ages globally, AF is predicted to affect 6-12 million people in the USA by 2050 and 17.9 million in Europe by 2060 (Morillo et al., 2017). Screening for AF in the general public, and specifically in risk groups, may enable early detection and the timely administration of anticoagulant treatment, potentially decreasing the incidence of stroke. Currently, diagnosis of AF is based on a standard 12-lead electrocardiogram (ECG). However, in many cases, AF is paroxysmal, with recordings failing to show AF rhythm even in patients experiencing frequent AF events. When AF is not recorded, but clinical suspicion is high (e.g., when searching for the cause of a recent stroke), the patient undergoes ambulatory monitoring and recordings are then analyzed offline. This approach requires manual inspection of the recordings and is therefore difficult to apply for large populations (Hoefman et al., 2010).
AF is well known to be characterized by irregular irregularity of the heart rate (Mann et al., 2015). However, an exact mathematical definition of irregular irregularity is missing, hindering theoretical and computational modeling of AF initiation. Using an intuitive definition, it can be said that an irregular rate is a rate with variable changes in inter-beat intervals and that an irregularly irregular rate is one whose changes are random. We introduce a quantitative embodiment of this intuitive definition to measure short-term changes using a novel index, termed the modified entropy scale (MESC) index, whose distribution can provide indices for both the level of variability and the randomness (referred to as "normality"; see section "Indices for Quantitative Description of Irregular Irregularity" below) of the rate. Using such variability and normality indices may enable identification of significant changes between irregularly irregular rates (e.g., AF) and rates that are regular or regularly irregular. We hypothesize that indices aimed directly at detecting irregular irregularity will aid simple and robust detection of AF from RR interval series. Plotting the variability and normality indices of a long RR interval recording (e.g., extracted from a Holter) generates a "regularogram" (RGG), which provides a visual presentation of AF episodes and their burden. This work aimed to test the ability to detect AF events based on the variability and normality indices, even with a simple machine learning algorithm.
Abbreviations: ACC, Accuracy; AI, Artificial intelligence; AF, Atrial fibrillation; AV, Atrioventricular; CoSEn, Coefficient of sample entropy; ECG, Electrocardiogram; MAD, Mean absolute deviation; MESC, Modified entropy scale; NPV, Negative predictive value; PPV, Positive predictive value; RGG, Regularogram; RMSSD, Root mean square of successive differences; SampEn, Sample entropy; ShE, Shannon entropy.

Data Sources and Preprocessing
Publicly available long (10-26 h) ECG recordings of patients with AF events and of healthy individuals were collected from several PhysioNet (Goldberger et al., 2000) databases. For a given experiment, one dataset was used for training and validation, and the others for testing, to avoid overfitting the model to a specific set of records.
The following databases were used:
1. Long Term Atrial Fibrillation Database (LTAFDB) (Petrutiu et al., 2007): a database consisting of 84 long (∼24 h) 2-lead ECG recordings sampled at 128 Hz and 12-bit resolution; each record is from a different patient. All patients in this database suffered at least one AF event during the recording, some with persistent AF and some with paroxysmal AF. The recordings contained a variety of rhythms, including normal sinus rhythm and other (non-AF) arrhythmias such as ventricular tachycardia, atrial and ventricular bigeminy and trigeminy, sinus bradycardia, and others. Sample no. 64 was omitted because rhythm annotations were missing for most of the record (only ∼5 out of 24 h are annotated).
2. Normal Sinus Rhythm Database (NSRDB): a database consisting of 18 long (∼24 h) 2-lead ECG recordings sampled at 128 Hz and 12-bit resolution; each record is from a volunteer with a validated normal sinus rhythm.
3. MIT-BIH Atrial Fibrillation Database (AFDB) (Moody and Mark, 1983): a database consisting of 25 long (∼10 h) ECG recordings sampled at 250 Hz and 12-bit resolution; each record is from a different patient. All patients in this database suffered at least one AF event during the recording, mostly paroxysmal AF. Samples no. 00735 and 03665 were excluded because their signals are unavailable to the public. Samples no. 04936 and 05091 were excluded because they were reported to contain incorrect rhythm annotations (Dash et al., 2009; Lee et al., 2013).
4. MIT-BIH Arrhythmia Database (MITDB) (Moody and Mark, 2001): a database consisting of 48 short (∼30 min) ECG recordings sampled at 360 Hz and 11-bit resolution. This is a diverse dataset with recordings containing a variety of rhythms.

Indices for Quantitative Description of Irregular Irregularity
The proposed characterization of irregular irregularity is based on two questions: whether the rate is regular or irregular and, if the rate is indeed irregular, whether the irregularity is regular or irregular. The first question is addressed by the variability of the MESC and the second by its normality. The MESC is an index which can have different orders. An MESC of order 1 (which is the main order used in this work) is simply the difference between two consecutive inter-beat intervals. In general, the MESC is defined recursively: an MESC of order n is the difference between consecutive MESCs of order n-1, and an MESC of order 0 is simply the inter-beat interval. The MESC, regardless of its order, is essentially a measure of change: it is low in regular processes and fluctuates strongly in disordered ones. Because an irregular rate tends to be highly disordered with many sharp changes, its MESC tends to be highly variable, so the distribution width of the MESC (referred to herein as "variability") can be used to characterize irregular rates. This measure tends to rise for various types of irregularities in rhythm.
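The recursive definition can be illustrated with a short Python sketch. It is not the authors' MATLAB implementation; the function name `mesc`, the sign convention (next value minus previous value) and the sample values are assumptions made here for illustration.

```python
from typing import List, Sequence


def mesc(rr_intervals: Sequence[float], order: int = 1) -> List[float]:
    """Return the MESC series of the given order.

    Order 0 is the inter-beat interval series itself; order n is the
    series of differences between consecutive values of order n - 1.
    The sign convention (next minus previous) is an assumption.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    series = list(rr_intervals)
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Made-up RR intervals in milliseconds (not taken from any database).
rr_ms = [800, 820, 790, 1100, 600]
print(mesc(rr_ms, order=0))  # the intervals themselves
print(mesc(rr_ms, order=1))  # [20, -30, 310, -500]
print(mesc(rr_ms, order=2))  # [-50, 340, -810]
```

Each increase in order shortens the series by one value, so a window of N intervals yields N - n values of order n.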
To distinguish between regular and irregular irregularity, we assume that most types of regular irregularities, such as atrioventricular (AV) blocks, premature atrial complexes, and premature ventricular complexes, are statistically a superposition of several regular rhythms; therefore, their unified distribution is far from normal. In contrast, the irregular irregularity of the ventricular activity during AF can be modeled as a non-linear stochastic process (Aronis et al., 2018) influenced by both chaotic atrial activity and disordered AV node conduction. Each of these processes is a summation of multiple stochastic processes and is therefore intuitively expected to have an approximately normal distribution, yielding a normally distributed MESC, as demonstrated empirically in our experiments. Consequently, our second requirement of irregularly irregular rhythms is randomness of the MESC (referred to herein as "normality").
Taken together, an irregular irregularity can be characterized as a rate with a wide and normal distribution of the MESC. Namely, the heart rate within a time window of some dozens of consecutive beat intervals (referred to herein as an "estimation window") can be described as irregularly irregular if both the variability and the normality of its MESC are high.
Please see mathematical definition in Supplementary Material.

Data Preprocessing
Recordings were preprocessed using MATLAB R2019a (The MathWorks Inc., Natick, Massachusetts). For all databases, the original beat annotations were used to extract beat times throughout the recording; the technique used for beat annotation in each database is described in its official documentation. Consecutive beat times were subtracted to yield inter-beat intervals.
To label AFs in the records, the rhythm annotations of the databases were used; for all databases, the rhythm annotations were performed by manual inspection by expert cardiologists. The inter-beat interval time series was divided into overlapping windows (window length was optimized experimentally, as described below). Windows with ambiguous labeling (containing different rhythms at different parts of the window) were discarded.
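The windowing and discarding step can be sketched in Python as follows. This is a minimal illustration, not the original MATLAB code: the function name `labeled_windows`, the per-interval label representation, the label strings and the `step` parameter controlling the overlap are all assumptions, since the amount of overlap is not stated here.

```python
from typing import List, Sequence, Tuple


def labeled_windows(
    intervals: Sequence[float],
    labels: Sequence[str],
    length: int,
    step: int,
) -> List[Tuple[List[float], str]]:
    """Split a labeled inter-beat interval series into overlapping windows.

    labels[i] is the rhythm annotation in force for intervals[i].
    Windows whose intervals do not all share one label are discarded.
    """
    if len(intervals) != len(labels):
        raise ValueError("intervals and labels must have equal length")
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    windows = []
    for start in range(0, len(intervals) - length + 1, step):
        window_labels = set(labels[start:start + length])
        if len(window_labels) == 1:  # unambiguous rhythm label
            window = list(intervals[start:start + length])
            windows.append((window, window_labels.pop()))
    return windows


# Toy series: four sinus-rhythm intervals followed by four AF intervals.
rr_ms = [800, 810, 805, 795, 620, 940, 510, 880]
rhythm = ["N", "N", "N", "N", "AF", "AF", "AF", "AF"]
for window, label in labeled_windows(rr_ms, rhythm, length=4, step=2):
    print(label, window)
```

With `step` smaller than `length` the windows overlap; the middle window in the toy example straddles the rhythm change and is therefore dropped.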
The MESC time series was calculated for each time window. The variability and normality indices, as well as the mean of the MESC (to address rapid AF episodes), were then calculated. The indices were also calculated using MATLAB R2019a. To calculate the normality index, we implemented a fast novel estimator for the Kolmogorov-Smirnov statistic based on a work by Vrbik (2018).
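As a rough illustration of the three indices, the Python sketch below computes them for one window of MESC values. The exact definitions are given in the Supplementary Material and are not reproduced here, so everything specific in the sketch is an assumption: variability is taken as the sample standard deviation, normality as one minus the plain Kolmogorov-Smirnov distance to a normal distribution fitted with the sample mean and standard deviation, and a constant window is treated as maximally non-normal. It does not implement the fast estimator based on Vrbik (2018).

```python
import math
import statistics
from typing import Dict, Sequence


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_distance_to_normal(values: Sequence[float]) -> float:
    """KS distance between the empirical CDF and a fitted normal CDF."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0.0:
        return 1.0  # assumption: a constant window is "not normal at all"
    distance = 0.0
    for i, x in enumerate(sorted(values), start=1):
        f = normal_cdf(x, mean, sd)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before its jump at x.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance


def window_indices(mesc_window: Sequence[float]) -> Dict[str, float]:
    """Variability, normality and mean of one window of MESC values."""
    return {
        "variability": statistics.stdev(mesc_window),
        "normality": 1.0 - ks_distance_to_normal(mesc_window),
        "mean": statistics.fmean(mesc_window),
    }


# Made-up first-order MESC values in milliseconds.
print(window_indices([20, -30, 310, -500, 15, -5]))
```

Under these assumptions a window alternating between two values, as in bigeminy, has a high variability but a lower normality than a window whose values follow a bell-shaped distribution.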
For unannotated datasets, manual or automated beat time detection would be needed. The choice of method should be based on the signal at hand. Once the beat times are detected, the processing described above can be applied.

Regularogram (RGG)
The RGG is a 2-D plot drawn from the variability and normality indices plotted against one another. Each point in the plot represents a single estimation window of the indices. Windows with an irregularly irregular rate tend to be found inside a characteristic zone of the plot (referred to herein as the "irregular irregularity zone"). RGGs containing multiple estimation windows from a longer record provide a visual presentation of irregularly irregular rates (presence of points in the zone) and their burden (clustering of points in the zone).
Because visualizing an entire Holter recording in a single plot is useful, we provide a free online tool for calculation of the indices, drawing of the RGG and estimation of AF burden.

Machine Learning Classifier
Our exploratory data analysis (see below) showed that the irregularity indices described above can yield a visibly good separation of the AF and non-AF time windows on the RGG; however, the border between them is not a straight line (non-linear separation). Therefore, to demonstrate the potential of detecting AF based on the variability and normality, we applied them to train and test a machine learning classifier for AF detection (Figure 1). The goal was to demarcate the irregular irregularity zone and not necessarily to maximize performance; thus, a simple and fast, but nonlinear, classification model, i.e., a decision tree, was sufficient and allowed explainable classification. The decision tree was implemented using MATLAB R2019a (The MathWorks Inc., Natick, Massachusetts), with the default settings. The only choice made was to limit the number of branches to 30 (an empirical choice) to avoid overfitting. Windows containing more than one rhythm were removed because their labels were ambiguous.
FIGURE 1 | Data pipeline for the AF detection system. RR intervals are extracted from an ECG recording, then the MESC is calculated and used to estimate the variability, normality, and mean indices. The three indices are used by a decision tree to distinguish between AF and other arrhythmias. AF, atrial fibrillation; ECG, electrocardiogram; MESC, modified entropy scale.
The classification work had four stages:
1. Exploratory data analysis: Manual exploration of the records, visualizations, and basic statistics. The main useful visualizations were RGGs, plots of the indices and onset of AF in time, and an extended version of the RGG, including variability, normality, and mean MESC. We performed a preliminary analysis by training a model using records from a single patient each time, and then testing on data from the same patient to demonstrate the existence of the irregular irregularity zone, without the complexity of inter-personal variability.
2. Validation: To find the optimal combination of hyperparameters (correct order of MESCs and estimation window length) we performed cross-validation; a full description is provided in Supplementary Material.
3. Final training: The model was trained on the full datasets, one at a time, using the hyperparameters shown in step 2 to yield the best accuracy.
4. Testing: The model was tested on the other three datasets.
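The train-on-one, test-on-the-others protocol of stages 3 and 4 can be sketched generically in Python. The names `cross_database_accuracy` and `fit_midpoint_threshold` are hypothetical, and the toy threshold learner merely stands in for the decision tree; it requires both classes in the training set, so a database without AF windows could only serve as a test set here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (index vector, label: 1 = AF, 0 = non-AF)
Predictor = Callable[[Sequence[float]], int]


def cross_database_accuracy(
    databases: Dict[str, List[Sample]],
    fit: Callable[[List[Sample]], Predictor],
) -> Dict[str, Dict[str, float]]:
    """Train on each database in turn and score accuracy on every database.

    results[train][test] with train == test is the training-set accuracy,
    which only indicates that training succeeded.
    """
    results: Dict[str, Dict[str, float]] = {}
    for train_name, train_set in databases.items():
        predict = fit(train_set)
        results[train_name] = {
            test_name: sum(predict(x) == y for x, y in test_set) / len(test_set)
            for test_name, test_set in databases.items()
        }
    return results


def fit_midpoint_threshold(train_set: List[Sample]) -> Predictor:
    """Toy stand-in for the decision tree: threshold on the first index."""
    positives = [x[0] for x, y in train_set if y == 1]
    negatives = [x[0] for x, y in train_set if y == 0]
    threshold = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return lambda x: int(x[0] > threshold)


# Made-up one-dimensional toy data for two hypothetical databases.
toy = {
    "A": [((0.1,), 0), ((0.2,), 0), ((0.9,), 1), ((1.0,), 1)],
    "B": [((0.15,), 0), ((0.8,), 1)],
}
print(cross_database_accuracy(toy, fit_midpoint_threshold))
```

Keeping the learner behind a `fit` callable means the same loop can be reused with any classifier that returns a prediction function.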

Performance Statistics
The detection results are presented using the standard metrics of clinical trials: sensitivity, specificity, positive predictive value (PPV, precision), negative predictive value (NPV), accuracy (ACC) and F1 score, derived as follows:
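In terms of the numbers of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) windows, these metrics have standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN), ACC = (TP+TN)/(TP+FP+TN+FN), and F1 = 2·PPV·sensitivity/(PPV+sensitivity) = 2TP/(2TP+FP+FN). The Python sketch below computes them; the function name `detection_metrics` and the choice to return NaN for an undefined ratio (for example, sensitivity on a set with no AF windows) are assumptions made for illustration.

```python
import math
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else math.nan


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard detection metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # Equivalent to 2 * PPV * sensitivity / (PPV + sensitivity).
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up counts for illustration only.
print(detection_metrics(tp=90, fp=10, tn=880, fn=20))
```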

General Statistics
To determine the statistical significance of the differences in accuracy between different sets of parameters in the validation stage, a one-tailed, unpaired t-test was performed comparing the best mean validation result with each of the other mean results. A value of p < 0.05 was considered significant.

Irregular Irregularity Index-Exploratory Data Analysis
To obtain a basic idea of the ability of the variability and normality indices to discern between AF and non-AF rhythms, data were first manually inspected. Figure 2 shows an RGG generated from a recording collected from the LTAFDB database. Distinct regions for the AF estimation windows (the irregular irregularity zone) and the non-AF estimation windows are apparent. Note that both indices are required for such a classification. Figure 3A presents the typical pattern of AF onset and the corresponding changes in the variability and normality. Figure 3B presents a typical non-AF interval. Although the variability and normality indices fluctuate, they do not rise together. In Figure 3A, the rhythm before the onset of the fibrillation is irregular (normal sinus rhythm with many missed beats and premature atrial contractions), which translates to a high variability before AF onset, while the normality only rises after most of the estimation window is inside the AF episode.
As AF is frequently a tachycardic rhythm, examination of the variability and normality indices vs. the heart rate is reasonable. To address the nature of AF as a tachyarrhythmia, we show here an extended version of the RGG; Figure 4 visualizes the normality and variability of the MESC of order 0 plotted against the mean RR on a 3D scatter plot. In these representative examples, the distinct separation between AF and non-AF events is clear. Figure 4 also shows the trajectory between AF and non-AF events; these windows were omitted from our analysis as ambiguous because they include both AF and non-AF rhythms.
The next step was to verify that the distinctly visible regions consistently exist across AF patients. Even if such distinct regions do exist for every patient, they may differ between patients. To isolate the problem of inter-patient variability from the question of AF region existence, we performed a simple training and validation process using data from the same patient, and decision trees of different complexities. Note that each split of the tree is a single separating line parallel to one of the axes in the feature space. For example, a tree with 1 split is simply a threshold considering a single index; a tree with 4 splits may describe a rectangular area on the RGG for one class and the rest of the plane for the other. Table 1 shows the average accuracy results for the patient-to-self experiment. Even simple trees with 4 splits yielded high accuracy. Due to the way decision trees are constructed, this implies that, for most patients, there exists a window in the RGG plane containing almost all AF episodes. However, this experiment did not show whether its boundaries are similar for different patients.
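To make the "tree with 1 split" case concrete, the Python sketch below searches for the single axis-parallel threshold that best separates two classes of points. It is an illustration only: the function name `best_single_split` is hypothetical, the search maximizes training accuracy rather than the impurity criteria that standard tree learners use, and it considers only the rule "class 1 above the threshold".

```python
from typing import Sequence, Tuple


def best_single_split(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[int, float, float]:
    """Find the axis-parallel threshold that best separates label 1 from 0.

    Returns (feature index, threshold, training accuracy) for the rule
    "predict 1 when points[i][feature] > threshold".
    """
    best = (0, float("-inf"), 0.0)
    for feature in range(len(points[0])):
        values = sorted({p[feature] for p in points})
        # Candidate thresholds lie midway between neighbouring distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            correct = sum(
                int(p[feature] > threshold) == y for p, y in zip(points, labels)
            )
            accuracy = correct / len(points)
            if accuracy > best[2]:
                best = (feature, threshold, accuracy)
    return best


# Made-up (variability, normality) points; label 1 marks AF windows.
rgg_points = [(1.0, 0.2), (2.0, 0.9), (3.0, 0.3), (4.0, 0.8)]
af_labels = [0, 1, 0, 1]
print(best_single_split(rgg_points, af_labels))
```

In the toy data no threshold on the first coordinate separates the classes, whereas a threshold on the second one does; chaining four such splits, one per side, bounds a rectangle.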

The Regularogram (RGG)
Figure 5 presents four RGGs calculated using records from healthy individuals and four RGGs prepared using records from AF patients. All the AF patients had paroxysmal AF, with AF rhythm for less than 20% of the record and other rhythms for the rest of the record (Figure 5). Although each of the eight RGGs in the figure represents ∼24 h of Holter recording, the plots provide a simple visualization of AF episodes occurring during the recording.

AF Burden
To assess whether AF burden can be estimated from an RGG plot by eyeballing alone, we implemented a graphical user interface, which allows the user to inspect an RGG and mark a rectangular area suspected to be the AF region. The program then calculated the estimated AF burden in the marked area and compared it to the annotations of the database. An experienced inspector from our research group and a blinded evaluator separately marked the RGGs of patients from the LTAFDB (n = 83). Full details about the conduct of the experiment are provided in Supplementary Material. The mean absolute error between the true AF burden and the burden estimated by RGG eyeballing was 4.33% for the blinded assessor and 2.46% for the experienced inspector. The inspection took less than 5 s; the ground truth annotation was made by manual inspection of 24 h ECGs.
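The burden estimate from a marked rectangle can be sketched in Python as follows. The exact calculation is described in the Supplementary Material; here the burden is assumed to be the fraction of estimation windows whose point falls inside the rectangle, which presumes windows of comparable duration. The names `estimated_burden` and `mean_absolute_error` and all numbers are made up for illustration.

```python
from typing import Sequence, Tuple

# (variability min, variability max, normality min, normality max)
Rectangle = Tuple[float, float, float, float]


def estimated_burden(
    points: Sequence[Tuple[float, float]], zone: Rectangle
) -> float:
    """Fraction of (variability, normality) points inside the marked zone."""
    var_min, var_max, norm_min, norm_max = zone
    inside = sum(
        1
        for variability, normality in points
        if var_min <= variability <= var_max and norm_min <= normality <= norm_max
    )
    return inside / len(points)


def mean_absolute_error(estimated: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute difference between estimated and annotated burdens."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(estimated)


# One made-up record with four estimation windows and a marked zone.
record = [(0.05, 0.5), (0.2, 0.9), (0.25, 0.95), (0.3, 0.4)]
zone = (0.15, 0.4, 0.8, 1.0)
print(estimated_burden(record, zone))  # 0.5
```

Averaging the per-record absolute differences over all marked records gives an error figure of the same kind as the one reported above.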

AF Detection
After validating the best-performing set of parameters, the set was applied to train the model on each of the databases separately. We then tested it on the other databases and reported performance on the other sets and on the training set itself. Table 2 summarizes the results of the analyses. The results of the training set appear in gray; because of the risk of overfitting, they are merely a useful indicator of successful training. The other databases comprised records from patients that were not included in the training set, and thus can be used to reliably test performance.
When the LTAFDB was used for training (Table 2), better results were achieved with AFDB as compared to MITDB records, in all measured parameters. Because NSRDB does not contain AF events, it could only be used to inspect the false positive rate. Similar results were obtained when training on the AFDB as when training on the LTAFDB. For both training sets, the performance on the MITDB was good in terms of sensitivity, specificity, NPV, and accuracy, albeit with low PPV. The model trained on the MITDB was highly specific, but not sensitive on the other sets.

DISCUSSION
AF is characterized by irregular irregularity in cardiac rhythm. However, no simple mathematical definition exists for such rhythm in the literature. To quantify such rhythms, only heart rate measurements, rather than entire ECG recordings, are needed. Thus, in the age of smartphones, wearables, and the internet of things, simple indices that quantify irregularly irregular rhythms and detect AF events can be embedded on a mobile device and paired with a device that continuously measures the heart rate. In addition to the introduction of the MESC, we introduced the RGG, a convenient presentation enabling quick manual identification of AF episodes over a long recording. We also showed that a simple artificial intelligence (AI) system can be used to detect AF events.

Modified Entropy Scale Index
We showed here that by using normality, variability, and mean MESC indices, AF events can be automatically identified with high accuracy. We achieved high accuracy by exploring the distribution of parameters, rather than a single average value, in a short beat interval series (150 beats, ∼2.5 min); shortening the estimation window to as few as 70 beats did not significantly reduce performance. The highest accuracy of AF detection was achieved for a first-order MESC index and no further improvement was achieved when higher-order indices were used. Taken together, irregular irregularity can be quantified by assessing changes in heart rate fluctuation over a short time period. The existing simple, short time scale indices are linear indices, which have been shown to perform poorly in detection of AF (Kennedy et al., 2016). Other indices for the quantification of short time scale fluctuations have been recently suggested, but their performance as an indicator of irregular irregularity has not been tested (Costa et al., 2017).
The gray line indicates the dataset used for training. Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value; ACC, accuracy; F1, F1 score.

The RGG
The introduced RGG is a convenient method for rapid inspection of long Holter recordings in one shot. The RGG also enables evaluation of the AF burden within seconds. Current protocols often manage patients with nearly persistent AF and patients with only occasional events in a similar fashion. A simple tool assessing the AF burden may allow for personalized treatment of patients. Beyond recognizing whether the patient had AF events and assessment of their burden, it can distinguish between different types of AF. For example, paroxysmal and persistent AF events have similar normality, but usually have different variability. The differences between groups of patients with distinctly different RGGs suggest heterogeneity in the AF patient population and requires further investigation.

AF Identification
We were able to detect AF episodes with high accuracy, even without training on the same patient data and even when testing with data that included other arrhythmias. Other methods to detect AF were suggested in the past; however, their application and performance tests have certain limitations. In some, data were used from the same database for training and testing; thus, the detection may have been biased to certain populations or certain recording devices (Lee et al., 2011; Lian et al., 2011; Kennedy et al., 2016). In others, full ECG recordings were used and not only the heart rate series (which excludes photoplethysmogram-based implementation) (Li et al., 2018; Xia et al., 2018). Some used numerous indices that can lead to overfitting and over-complexity (Gilani et al., 2016). Several techniques only use the beat interval series as an input (Costa et al., 2005), but require lengthy recordings. Those that did utilize simple short-term HRV indices showed low performance (Kennedy et al., 2016). Even though we tested our algorithm in a stricter manner than most similar works, our measurements showed that the algorithm was competitive and even exhibited accuracy that was superior to that of other AF detection algorithms. A comparison to several state-of-the-art methods is shown in Table 3. Due to the lack of a gold standard benchmark, each group reported performance in a different way. Zhou et al. (2015) performed a comprehensive comparison of methods in 2015; thus, we adopted their benchmark (using the LTAFDB as a training set and AFDB, MITDB and NSRDB as test sets), which was based on detection performance as expressed by sensitivity, specificity, positive and negative predictive values, and accuracy. To enable a simple comparison between methods using a single score, we calculated the F1 score.
None of the groups that developed the methods was willing to share the original implementation of their method. Therefore, the following approach was used:
- Results are quoted from papers using a similar benchmark, training on one of the mentioned datasets and testing on most of the others.
- For papers using a different benchmark, but with adequate elaboration of the method proposed, we meticulously re-implemented the method using every detail of the implementation that was available in the paper or supplements.
- For papers with inadequate information for reimplementation (for example, containing proprietary steps), but with a method that seemed promising, we used the features proposed in the paper, with the same classifier we used for our own method.
¥As reported in the original paper. §Re-implemented. ¤Inspired by. The gray line indicates the dataset used for training. SampEn, sample entropy; CoSEn, coefficient of sample entropy; RMSSD, root mean square of successive differences; nRMSSD, normalized RMSSD; ShE, Shannon entropy; SVM, support vector machine; CNN, convolutional neural network; LSTM, long-short term memory; MAD, mean absolute deviation.
The comparison showed that our irregularity features, even when used with a simple classifier, e.g., a 30-branch decision tree, yielded better results (as embodied by the F1 score in Table 3) than symbolic dynamics and Shannon entropy (Zhou et al., 2015); sample entropy (SampEn), coefficient of sample entropy (CoSEn), root mean square of successive differences (RMSSD), normalized RMSSD (nRMSSD) and Shannon entropy (ShE) classified by a Gaussian kernel SVM (Andersen et al., 2017); a deep learning model comprised of CNN blocks feeding an LSTM processing the RR interval series directly (Andersen et al., 2019); and CoSEn, RMSSD, mean absolute deviation (MAD) and coefficient of variance classified by a random forest. As indices based on the Poincaré plot were used in the Apple Heart Study (Perez et al., 2019) and in the preliminary study of the WATCH AF study (Krivoshei et al., 2017), we included a comparison to a system based on these features classified by the decision tree algorithm used for our method. This comparison also showed better results for our irregularity indices.

Other Irregularly Irregular Rhythms
AF is not the only arrhythmia described as an irregularly irregular rhythm (Margulescu et al., 2016). Atrial and ventricular ectopic beats and atrial flutter with variable atrioventricular conduction may also present with an irregularly irregular rhythm. While atrial and ventricular ectopic beats appear for only a couple of beats, with a minor effect on our detection system (which uses 150 beats), atrial flutter with variable heart block would affect our ability to detect AF. To verify that the irregular irregularity is an atrial flutter, the ECG should show a "saw-tooth" pattern and irregularly irregular normal QRS complexes. Because our approach is based on beat intervals, it cannot distinguish between AF and atrial flutter with variable heart block. Note, however, that atrial flutter, in general, is described as regular and is usually distinguishable from AF by its distinct variability and normality indices; only in the presence of variable AV conduction does this problem arise.

Application
Mass screening for AF in an aged population identified a significant number of participants with unrecognized and untreated AF (Svennberg et al., 2015). An automated tool for detection of AF events and for verification of AF in a recording opens a new avenue for massive population screening. The ability to detect AF using a beat interval series only can lead to new applications based on smartwatches and fitness bands (Bumgarner et al., 2018), which are more convenient and affordable than mobile ECGs. Potentially, our algorithm can be used on recordings from cardiac implantable electronic devices. However, as such devices record much more detailed signals (e.g., direct intra-atrial electrical activity measurements) than beat intervals, we assume that detection algorithms better than the one proposed in the current work can be developed for them. Our method is advantageous for non-invasive measurements, when less information is available.

LIMITATIONS
This work focused on one family of indices that were all derived from the MESC index. Addition of other indices that quantify heart rate fluctuation changes over short time scales may improve the performance of the classifier. In addition, a simple machine learning approach was used. It is possible that more sophisticated AI algorithms, such as deep learning, would improve the results. However, results based on deep learning algorithms are usually a "black box"; even if the results are good, they do not provide physiological insights.
We omitted the estimation windows with ambivalent labeling. For a retrospective analysis, this is a common and reasonable practice. However, this subject should be addressed when pursuing real-time applications.

CONCLUSION
The proposed variability and normality indices of the MESC are valuable parameters for characterizing the regularity of the heart rate. The indices are useful both for rapid visual inspection of long Holter recordings when plotted as an RGG, and for machine learning classification of AF events.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.physionet.org/.

AUTHOR CONTRIBUTIONS
YY and NK conceived and designed the research. NK and YE did the experiments. YY drafted the manuscript. NK, YE, and AS edited and revised the manuscript. NK, YE, AS, and YY approved the final version. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the Israel Ministry of Science (AS and YY) and Yad Hanadiv Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.