Forecasting Seizure Likelihood With Wearable Technology

The unpredictability of epileptic seizures exposes people with epilepsy to potential physical harm, restricts day-to-day activities, and impacts mental well-being. Accurate seizure forecasters would reduce the uncertainty associated with seizures but need to be feasible and accessible in the long-term. Wearable devices are perfect candidates to develop non-invasive, accessible forecasts but are yet to be investigated in long-term studies. We hypothesized that machine learning models could utilize heart rate as a biomarker for well-established cycles of seizures and epileptic activity, in addition to other wearable signals, to forecast high and low risk seizure periods. This feasibility study tracked participants' (n = 11) heart rates, sleep, and step counts using wearable smartwatches and seizure occurrence using smartphone seizure diaries for at least 6 months (mean = 14.6 months, SD = 3.8 months). Eligible participants had a diagnosis of refractory epilepsy and reported at least 20 seizures (mean = 135, SD = 123) during the recording period. An ensembled machine learning and neural network model estimated seizure risk either daily or hourly, with retraining occurring on a weekly basis as additional data was collected. Performance was evaluated retrospectively against a rate-matched random forecast using the area under the receiver operating curve. A pseudo-prospective evaluation was also conducted on a held-out dataset. Of the 11 participants, seizures were predicted above chance in all (100%) participants using an hourly forecast and in ten (91%) participants using a daily forecast. The average time spent in high risk (prediction time) before a seizure occurred was 37 min in the hourly forecast and 3 days in the daily forecast. Cyclic features added the most predictive value to the forecasts, particularly circadian and multiday heart rate cycles. Wearable devices can be used to produce patient-specific seizure forecasts, particularly when biomarkers of seizure and epileptic activity cycles are utilized.

The unpredictability of epileptic seizures exposes people with epilepsy to potential physical harm, restricts day-to-day activities, and impacts mental well-being. Accurate seizure forecasters would reduce the uncertainty associated with seizures but need to be feasible and accessible in the long-term. Wearable devices are perfect candidates to develop non-invasive, accessible forecasts but are yet to be investigated in long-term studies. We hypothesized that machine learning models could utilize heart rate as a biomarker for well-established cycles of seizures and epileptic activity, in addition to other wearable signals, to forecast high and low risk seizure periods. This feasibility study tracked participants' (n = 11) heart rates, sleep, and step counts using wearable smartwatches and seizure occurrence using smartphone seizure diaries for at least 6 months (mean = 14.6 months, SD = 3.8 months). Eligible participants had a diagnosis of refractory epilepsy and reported at least 20 seizures (mean = 135, SD = 123) during the recording period. An ensembled machine learning and neural network model estimated seizure risk either daily or hourly, with retraining occurring on a weekly basis as additional data was collected. Performance was evaluated retrospectively against a rate-matched random forecast using the area under the receiver operating curve. A pseudo-prospective evaluation was also conducted on a held-out dataset. Of the 11 participants, seizures were predicted above chance in all (100%) participants using an hourly forecast and in ten (91%) participants using a daily forecast. The average time spent in high risk (prediction time) before a seizure occurred was 37 min in the hourly forecast and 3 days in the daily forecast. Cyclic features added the most predictive value to the forecasts, particularly circadian and multiday heart rate cycles. Wearable devices can be used to produce patient-specific seizure forecasts, particularly when biomarkers of seizure and epileptic activity cycles are utilized.
Keywords: seizure forecasting, cycles (cyclical), seizure cycles, circadian rhythms, multiday rhythms, wearable sensors INTRODUCTION Epilepsy is one of the most common neurological disorders, affecting roughly 1% of the world's population (1) and responsible for 20.6 million disability-adjusted life-years (DALYs) lost, which is comparable to breast cancer in women and lung cancer in men (2). Epilepsy is characterized by an increased predisposition of the brain to generate epileptic seizures, which often result in vast neurobiological, cognitive, psychologic, and social consequences (3). Despite decades of new drug development and surgical treatment, up to one-third of people with epilepsy continue to suffer from recurrent seizures (4,5). While most people are symptom-free for more than 99.9% of their day-to-day life, epileptic seizures are sudden, potentially catastrophic events that can be life-threatening both for the person with epilepsy and others. Crucially, sudden death in epilepsy (SUDEP), most often following a convulsive seizure, is 27 times more likely than sudden death in control populations, a mortality burden second only to stroke when compared to other neurologic diseases (6,7). Aside from these risks, living with epilepsy can take a major toll on quality of life and independence, as the unpredictable nature of seizures causes feelings of uncertainty (8) and impacts participation in common day-to-day activities, such as going to work, driving, and social interactions (9).
To address the uncertainty associated with epileptic seizures, researchers across many disciplines have spent years investigating the potential for seizure prediction and forecasting (10). The ability to reduce the uncertainty of when a seizure is about to occur would have tremendous implications for quality of life, and clinical management (10). Timely precautions against seizurerelated injury or timed adjustment of treatment according to seizure likelihood (chronotherapy) could also reduce seizurerelated harm, hospitalizations, and healthcare-related costs (11).
Until recently, there was no scientific consensus as to whether seizures would be predictable in a prospective setting since most research was based on limited data [from short-duration inhospital electroencephalography (EEG) recordings] and some presented methodological flaws (12). Access to better quality data [made available in public databases (13,14) and seizure prediction competitions (15)], more rigorous statistical and analytical methods, and results from a clinical trial of an intracranial EEG seizure advisory system [NeuroVista (16)] have shown promise that seizure prediction devices could be possible in the foreseeable future. Additionally, there is a better understanding of the pre-seizure state and of the mechanisms underlying seizure generation (ictogenesis), with contributions from basic science, network theory, multiscale electrophysiological recordings, and functional neuroimaging (17). Multiple patient-specific seizure precipitants have also been identified, including stress (18,19), poor sleep (18), exercise (20), diet (21), weather (22,23), alcohol use (24) and poor drug adherence (25). Many of these factors have shown potential utility in forecasting seizures (18,23).
Yet perhaps the most significant breakthrough for the field of seizure forecasting has been the characterization of shortand long-term seizure occurrence cycles (11,26,27), which typically occur in circadian and multiday (often weekly and monthly) periodicities (27,28). Similar cycles have been reported in interictal epileptiform activity (IEA) (26), EEG markers of brain critical slowing (29) and heart rate (30), all of which have been linked to seizure timing, suggesting that seizures are co-modulated by underlying biological cycles. An individual's seizure cycles can be utilized to generate seizure forecasts using both self-reporting seizure diaries (31)(32)(33) and electrographic seizures (34). However, the discrete nature of seizure events means that the underlying biological cycles may be stronger predictors of seizure occurrence than seizure cycles alone (29,34,35). This has already been successfully demonstrated with cycles of IEA in a retrospective seizure forecasting study using an implanted intracranial EEG device (34). Furthermore, algorithms incorporating biological cycles seem to outperform algorithms using more traditional EEG features, such as spectral power and correlation (15).
However, seizure forecasting algorithms typically rely on chronic EEG recordings from invasive, implanted devices, which require surgery (and associated risks), are costly, and may not be an option for many people with epilepsy. Minimally-invasive or non-invasive wearable devices that monitor continuous biomarkers of seizure risk are, therefore, ideal candidates for most people who desire seizure forecasts (9). Currently, some wearable devices are commercially available for seizure detection (36), although there are also promising results highlighting the utility of wearables in seizure forecasting. Wearable sensors can be used to detect actigraphy, blood volume pulse, body temperature, cerebral oxygen saturation, electrodermal activity and heart rate, all of which have all shown promise in seizure prediction (37)(38)(39)(40). Periodic wearable signals, such as temperature (41) and heart rate (30) may also be used as a biomarker for seizure cycles (35). For example, our recent work in seizure timing and heart rate, measured from a wearable smartwatch, shows that seizures are often phase-locked to underlying circadian and multiday cycles in heart rate (i.e., there is a strong preference for seizures to occur at specific phases of individual-specific heart rate cycles, such as near the peak or trough of a multiday cycle) (30).
To address the need for non-invasive seizure forecasting, this study aimed to develop a wearable device-based seizure forecaster using a long-term dataset from an observational cohort study, Tracking Seizure Cycles. We hypothesized that cycles in heart rate can be leveraged, in addition to other wearable signals (other heart rate features, step count and sleep features), to forecast high and low seizure risk periods. We also investigated the relative contributions of cycles, heart rate, sleep and activity features to forecasting performance.

Study Design
This retrospective and pseudo-prospective feasibility study was designed using training and testing datasets, followed by pseudoprospective evaluation using a held-out dataset. We utilized longterm smartphone seizure diaries and a wearable smartwatch to forecast seizure likelihood and elucidate the relationship between seizures and non-invasively measured wearable signals, namely heart rate, sleep stages, sleep time, and step count. The study was approved by the St Vincent's Hospital Human Research Ethics Committee (HREC 009. 19) and all participants provided written informed consent.

Participants
Adults (18 years and over) with a confirmed epilepsy diagnosis and healthy controls were recruited between August 2019 and January 2021. Participants with epilepsy had uncontrolled or partially controlled seizures and were recruited through neurologist referral. All participants provided written informed consent.

Data Collection
Continuous data were collected via smartphone and wearable devices for at least 6 months and up to 20 months. Participants wore a smartwatch (Fitbit, Fitbit Inc., USA) and manually reported seizure times in a freely available smartphone diary app (Seer App, Seer Medical Pty Ltd, Australia). Participants were instructed to report all their clinically apparent events, including generalized and focal seizures (both aware and unaware). The smartwatch continuously measured participants' heart rates (via photoplethysmography) at 5 s resolution (one recording every 5 s). The smartwatch also estimated sleep stage (awake, REM, and light and deep sleep) and step count each minute.

Training, Testing and Held-Out Evaluation Datasets
Participants were required to have 2 months or more of continuous wearable data recordings, at least 80% adherence (i.e., they must have worn the device at least 80% of the time) and a minimum of 20 seizures reported during the recording time to be eligible for seizure forecasting. Eligible participant demographic information is given in Table 1.
The training dataset included at least 2 months of continuous recordings (M = 5.4 months, SD = 4 months) and at least 15 seizures (M = 35, SD = 47). The patient-specific training cutoff date was the final day that both of these criteria were met. The testing dataset included participants' continuous recordings (M = 6.6 months, SD = 3.1 months) and seizures (M = 87, SD = 112) reported from their training cut-off date until 1 February 2021. As a further requirement for seizure forecasting, participants must have had at least five lead seizures (at least an hour apart in the hourly forecast and at least a day apart in the daily forecast) reported during the testing period. Any continuous recordings (M = 2.6 months, SD = 0.5) and seizures (M = 13, SD = 14) reported from 1 February 2021 until 25 April 2021 were included in the held-out evaluation cohort, so long as the participant reported at least one seizure during this period. This data was held-out to evaluate the performance of the forecasting algorithm in a pseudo-prospective setting.

Data Preprocessing
The heart rate, step count, and sleep signals were all processed separately. Heart rate features included rate of change in heart rate (RCH) and daily resting heart rate (RHR). Physical activity features included steps recorded in the previous hour and steps recorded on the previous day. Sleep features included total time asleep (not including naps), time in REM, time in deep and light sleep during main sleep, average HR overnight, sleep time deviation from median sleep time over the past 3 months, and wake time deviation from median wake time over the past 3 months. All sleep features were calculated using sleep labels derived from Fitbit's sleep algorithm. Additionally, we included cyclic features, comprising heart rate cycles (circadian and multiday), last seizure time, and second-last seizure time. Compared to the hourly forecast, the daily forecast only included multiday cycles, days since last seizure time, days since secondlast seizure time, all sleep features, daily resting heart rate, and steps recorded during the previous day.
To derive heart rate features and heart rate cycles, continuous heart rate signals were initially down-sampled to one timestamp per minute, followed by interpolation of short missing data segments with a linear line (missing segments <2 h) or longer missing data segments with a straight line at the mean heart rate. RCH was used to estimate heart rate variability (HRV), which is defined as the variations in RR intervals and is typically derived using the QRS complex on an electrocardiogram (ECG). RCH was calculated as the mean beats per minute (BPM) in 1 min subtracted from the mean BPM in the previous minute, representing the change in BPM over 2 min. RCH was resampled every hour for the hourly forecast or every day for the daily forecast. Daily RHR was derived as the average of the bottom quintile of BPM where no steps were recorded, thus minimizing the potential for movement artifact.
To compute the heart rate cycles, we used a similar approach to a method used to extract multiday rhythms of epileptic activity (26) [see also (30) for further details]. Briefly, circadian and multiday peak periodicities of heart rate (cycles) were derived using a Morlet wavelet. The heart rate signal was filtered (using a zero-order Butterworth bandpass filter) at the peak periodicities and instantaneous phase of the cycle at each timepoint was estimated using a Hilbert transform. Cycles were used as features for the forecaster if seizures were significantly phase-locked to the cycle [p < 0.05, according to Omnibus/Hodges-Ajne test for circular uniformity (42)]. Each cyclic feature (cycle phases and last/second last seizure time) was transformed into two linear features by normalizing the signal from 0 to 2π and computing the sine and cosine.

Forecasting Algorithm
The seizure forecast was presented in hourly and daily formats to assess the accuracy of an hourly forecast compared to a daily forecast. The hourly forecast gave the likelihood of a seizure at the start of the hour, every hour. The daily forecast gave the likelihood of a seizure for the day, shortly after waking from sleep (based on Fitbit's sleep end time).
To forecast the likelihood of a seizure hourly and daily, we used an ensemble of a long short-term memory (LSTM) neural network (43), a random forest (RF) regressor (43), and a logistic regression (LR) classifier (43). An ensemble method was chosen to allow the combination of diverse feature types. Figure 1 describes the architecture of the model. The training (green), testing (orange) and evaluation (red) cohorts were different Participants that had more than one seizure during the evaluation period are shown in red.
lengths in each participant, and algorithm retraining occurred weekly during testing and evaluation. The forecast used an LSTM model (which contains 7 days of memory) for all sleep features in order to account for the potential effect of built-up sleep debt on seizure risk (18). All other features (cycles, heart rate features and step counts) then predicted seizure risk using a random forest model. The random forest model was chosen because it achieved the best results during testing using Python's sklearn library, when compared to other conventional machine learning models (namely logistic regression, linear discriminant analysis, K-nearest neighbors, naïve bayes and support vector machines). A logistic regression model, which weighs inputs' predictive value, then combined the random forest and LSTM outputs into one seizure risk value per hour or day. This was compared to a rate-matched random model (occasionally referred to as the chance model) using AUC scores. Other metrics were also used to assess forecast performance (see Performance Metrics). The LSTM model was trained on sleep features computed daily after waking. A weekly history of sleep features was incorporated into each row input, providing a 7 × 7 matrix for each forecast, representing 7 days and 7 sleep features per day. The LSTM model was composed of a single layer with 64 memory units, followed by two densely connected layers, and a linear activation function. All networks were trained for 100 epochs. We selected the mean squared error loss function as the cost function, using the Adaptive Moment Estimation (Adam) optimizer (44). The LSTM model outputted the likelihood of a seizure for the day based on sleep features and was used as an input to the LR classifier.
The RF regressors with the bootstrap aggregating technique were trained on all physical activity, heart rate, and cyclic features. In the model, the number of decision trees was 1000 and the minimum number of samples required to be at a leaf node was 120. From observation, these model parameters achieved the highest accuracy across the board during training. Most people, particularly participants with low seizure frequency (<2 seizures/month), had a highly imbalanced dataset, with nonseizure hours/days occurring far more frequently than seizure hours/days. RF models typically made more accurate predictions on balanced datasets, so oversampling of seizure hours/days was undertaken before training the RF model. The output of the RF model was the likelihood of a seizure within the following hour or day and was used as an input to the LR classifier.
The LR classifiers were trained on the outputs of the LSTM and RF models. To aid the classifier in distinguishing between non-seizure hours/days and seizure hours/days and to mitigate the low resoltuion of self-reporting, the hour/day immediately preceding and following the hour/day of each seizure were removed in the training dataset. The output of the LR model was the final likelihood of a seizure (risk value); the risk value was represented as a continuous value between 0 for no seizure and 1 for a "guaranteed" seizure within the next hour or day, as appropriate.
The forecaster classified hours and days as either low, medium, or high risk. The medium and high risk cutoff thresholds were computed using the training dataset by optimizing the metrics:  Retraining the algorithm was implemented to imitate a clinical seizure forecasting device in which algorithm coefficients and risk thresholds would be regularly updated. Retraining of the seizure forecast occurred on a weekly basis as additional data was collected.

Performance Metrics
To assess the performance of the hourly and daily forecasters, a variety of different metrics were used. During algorithm testing and for pseudo-prospective held-out evaluation, performance of the ensembled model was evaluated using the area under the receiver operating characteristic curve (AUC) and compared to the AUC score of a rate-matched (seizure frequency derived from all seizures that occurred in the training dataset) random forecast. The AUC scores assessed the classifier's ability to discriminate between non-seizure hours/days and seizure hours/days. Despite the usefulness of the AUC to measure performance, the AUC can change depending on the forecasting horizon (34); in this case, an hourly forecast compared to a daily forecast. This motivated the use of Calibration Curves (CC) to measure how well the predicted likelihood values corresponded to observed probabilities, and the Brier score (or Brier loss) to quantify the accuracy of the predictions. The CC metric provides a visual representation of the forecaster's ability to estimate seizure risk. The ideal CC can be visualized as a diagonal line, where the forecaster's predicted seizure likelihood values are equal to the actual seizure probabilities. Anything above this line would be considered underestimating seizure risk and anything below would be overestimating seizure risk. The Brier score (or Brier Loss) is shown alongside the CC metric, which is often used to assess calibration performance. For the Brier Score, a perfectly accurate forecast would result in a loss score of 0 and a poorly performing forecast would result in a loss closer to 1. We also considered the accuracy of the forecaster, time spent in low, medium and high risk states, and seizures occurring in low, medium and high risk states.
Analyses were executed using Python (version 3.7.9).

RESULTS
There were 11 out of 39 participants that met the inclusion requirements (see Methods: Study Design and Participants) ( Table 1). Eligible participants had an average duration of 14.6 months (SD = 3.8) of continuous heart rate and activity monitoring, and an average of 423 nights (SD = 112) that recorded sleep stages and duration. Participant diaries included an average of 136 (SD = 123) seizures reported during the wearable monitoring period. Results from the cohort are given in Figures 2-6 and Table 2. Eight of 11 participants (shown in red in Table 1) in the testing cohort were also included in the held-out evaluation cohort, as these people reported more than one seizure during the evaluation period. The results from the prospective evaluation cohort are shown in Figure 7 and Table 2.

Forecast Performance and Metrics
Forecasting performance was quantified to determine which participants would have benefitted from the noninvasive seizure forecast. First, we used the AUC metric to determine forecasting performance. The AUC score quantifies how useful the forecast is, based on the amount of time spent in a high-risk state. An excellent forecast is often considered to have an AUC of >0.9. Of the 11 participants, AUC scores showed that seizures were predicted above chance in all participants using an hourly forecast (M AUC = 0.74, SD = 0.10) and in 10 participants using a daily forecast (M AUC = 0.66, SD = 0.11) (Figure 2 and Table 2). Both hourly and daily models usually performed well in people with longer recording times. A weak positive correlation was found between total recording length and AUC scores in both the hourly (R 2 = 0.63) and daily (R 2 = 0.59) forecasters  Figure 1). This suggests that the forecaster improves over time.  High, medium and low risk states are indicated by the red, orange and green regions, respectively, and are separated by the medium and high risk thresholds. Note that the medium risk and high risk thresholds-indicated by the orange and red lines, respectively-can change after weekly retraining. The cyclical seizure likelihood is mostly attributable to multiday heart rate cycles.

(Supplementary
A relationship was also noticed between seizure frequency and forecasting performance. The model performed worst in the participant with the highest seizure frequency (P6) (0.57 and 0.46 for the hourly and daily forecaster, respectively). P6 had a seizure frequency of 36.7 seizures/month (i.e., more than one per day), which was almost double the next highest participant. Across the whole cohort, a weak negative correlation was found between seizure frequency and AUC scores in both the hourly (R 2 = −0.58) and daily (R 2 = −0.49) forecasters (Supplementary Figure 2). This suggests that participants with lower seizure frequencies (less than once per day) had more accurate predictions using the current model than participants with higher seizure frequencies.
Time spent in high, medium, and low risk, alongside the seizure frequency in high, medium, and low risk, were also considered (Figure 3). For the hourly forecast, median forecast accuracy was 86% (min: 56%, max: 95%) and median time in high risk was 14% (min: 5%, max: 45%). For the daily forecast, median forecast accuracy was 83% (min: 43%, max: 97%) and median time in high risk was 18% (min: 6%, max: 29%). Of the 11 participants, the average time spent in high risk (prediction time) before a seizure occurred was 37 min in the hourly forecast and 3 days in the daily forecast. Typically, greater AUC scores implied that the participant spent more time in low risk and most seizures occurred in high risk. For example, P4 spent only 7% of their time in high risk state, but 83% of their seizures occurred whilst in high risk (see Figure 5 for an example forecast).
Additionally, we evaluated CC metrics and Brier scores (Figure 4). Generally, people with more seizures had calibration curves closer to the ideal diagonal line. Hourly and daily forecasts were occasionally found to sit well below the ideal line, suggesting that seizure risk was overestimated in these cases. Brier score loss, another metric to assess forecast calibration performance, varied independently to calibration curve variation. For example, the participants with the highest seizure counts (P5 and P9) had similar calibration curves for both the hourly and the daily forecast; however, Brier loss scores were much greater for P9 than P5. P4 had the lowest Brier loss scores in both the hourly and daily forecast.

Feature Groups on Forecast Performance
To characterize the importance of feature groups on forecasting performance, we analyzed AUC score change with the addition of particular feature groups (Figure 6). Physical activity and heart rate feature groups added little predictive value to the daily forecaster. Sleep features appeared to add value to the daily forecaster in some people, but this was not significant across FIGURE 6 | Auxiliary contribution of each feature group on forecasting performance in the retrospective testing cohort. AUC score change represents average change computed over ten runs of the algorithm. Performance of each feature group was characterized by comparing the AUC score of the forecasting algorithm once the feature group was added to the AUC score of the forecasting algorithm without the feature group. For example, in the case of physical activity, we compared the AUC score when the algorithm included all feature groups to the AUC score when the algorithm included only heart rate, sleep, and cycles feature groups. *Indicates that the feature group's contribution was significantly greater than zero across the cohort, using a one-sided t-test (***p < 0.001 and **p < 0.01).  the cohort (p = 0.09). Physical activity added some predictive value to the hourly forecaster; however, sleep and heart rate features were the weakest predictors in the hourly forecaster. In both the hourly and daily forecaster, the cycles feature group was the strongest predictor across the whole cohort and for most individuals. 10 of 11 participants (all expect P4) had a significant (i.e., seizures were significantly locked onto the cycle in the training dataset) circadian cycle and 10 of 11 (all except P7) people had least one significant multiday cycle. Despite the occasional negative AUC score change with the addition of a feature group, it is important to note that it is unlikely that there is significant positive or negative value added to the forecaster when values are close to 0.

Held Out Evaluation Cohort Performance
The held-out evaluation cohort performed well in most cases (Figure 7 and Table 2). The predictions (based on AUC scores) were above chance in 7 of 8 (88%) people using the hourly forecaster (M = 0.68, SD = 0.18) and 4 of 8 (50%) people using the daily forecaster (M = 0.58, SD = 0.16). It is important to note that the participant, P7, who did not perform better than chance using the hourly forecast model had the lowest seizure count during the evaluation period and was the only participant without a significant multiday heart rate cycle.

Summary
People with epilepsy and their caregivers have expressed their interest in non-invasive wearable devices for decades, particularly for seizure forecasting (45) and detection (46). Wearable devices are more acceptable to people with epilepsy than invasive, cumbersome or indiscrete devices (45,46). Nonetheless, very few studies have investigated the feasibility of non-invasive wearables in seizure forecasting, and although performance in current studies is promising, their datasets are usually short-term (<1 week) (37,40). This study demonstrates that features recorded via noninvasive wearable sensors can contribute to accurate seizure forecasts. Individual forecasters performed better than chance with all people when an hourly prediction horizon was used, and with 10 of 11 people when a daily prediction horizon was used. These results indicate that non-invasive seizure forecasting is possible for people with epilepsy with seizure warning periods of up to 24 h.
In the evaluation cohort, predictions were above chance in 7 of 8 people using the hourly forecaster and 4 of 8 people using the daily forecaster. This is contrary to what we expected: that the performance would improve with a longer period on which to train the algorithm. The lack of improvement in AUC scores in the evaluation cohort may be attributed to the shorter recording lengths and seizure counts in the evaluation dataset compared to the testing dataset, making it difficult to directly compare the cohorts. Furthermore, the theoretical shift and change that may occur in heart rate cycles over time was not considered in this model. This shift in cycles may be mitigated by consistently retraining the algorithm on a shorter period of data (e.g., the past 4 months, instead of all past data).
Generally, the hourly forecaster resulted in more accurate predictions than the daily forecaster. The superior performance in the hourly forecasts may be attributed to a number of factors, such as the inclusion of circadian heart rate cycles, hourly step count and RCH. The resolution of the daily forecaster would also have played a role in the loss of information. For example, high frequency seizure days (>1 seizure occurred on a day) were weighted equally to low seizure frequency days (1 seizure on a day).

Feature Importance
Overall, cyclic features (heart rate cycles and previous seizure timing) were the strongest predictors of seizures in most cases (Figure 6). Most people had a circadian and at least one multiday cycle that aided prediction of seizure risk. This was expected given recent incredibly strong performance of cycles for seizure forecasting (29,31,34) and the previously demonstrated utility of heart rate cycles as a biomarker for seizure risk (30). Cycles are now becoming increasingly recognized as a fundamental phenomenon of seizure risk; however, their underlying drivers are still unknown. For recent reviews on cycles in seizure forecasting, refer to Stirling et al. (10) and Karoly et al. (47).
Sleep features appeared to be useful predictors of seizure likelihood for some people using the daily forecaster but were weak predictors in the hourly forecaster. The lack of utility of sleep features in the hourly forecaster may be attributed to the design of the algorithm, as the sleep variable remains constant for all hours of the day after waking, making it difficult for the algorithm to distinguish between non-seizure and seizure hours. In contrast, for some people, sleep was a useful feature in the daily forecaster, which distinguishes seizure-days from non-seizure days. This suggests that sleep does play a role in seizure risk for some people. Sleep features, such as sleep quality, transitions and length, have historically been associated with seizures in many people with epilepsy (18,23). It is possible that the role of sleep as a seizure precipitant is highly patient-specific, which warrants further investigation in larger cohorts.
Heart rate features-daily RHR and RCH (estimation of HRV)-were not significantly predictive of seizures on a cohort level, but appeared useful in some individuals (Figure 6). HRV has been of interest to researchers for decades and is known to reflect autonomic function (48). HRV has also been used to predict seizures minutes in advance, albeit with high false prediction rates (39). It is important to note, however, that we have estimated HRV in the current algorithm using a very basic method, but recent studies have revealed that photoplethysmography-based methods for estimating HRV are available and in the pipeline for wearable devices (49, 50). Daily resting heart rate, on the other hand, is not often associated with seizure risk, but seemed to be a useful feature in some cases. However, daily resting heart rate is likely correlated with multiday rhythms of heart rate and thus may not provide distinct value compared to cyclic features that were derived from heart rate.
Physical activity features were also predictive of seizures in some people, namely in the hourly forecaster (Figure 6). Physical activity is beneficial for mental health, quality of life, and cognitive function for people with epilepsy (51). However, people with epilepsy are less likely to engage in physical activity than the general population (52), partially influenced by the inaccurate historical belief that exercise can provoke seizures (53). On the contrary, there is some evidence that increased physical activity is associated with reduced seizure frequency (54,55). Physical activity is also known to benefit common psychiatric comorbidities of epilepsy, such as anxiety and depression (56), so exercise may indirectly reduce seizure frequency by impacting other seizure precipitants, such as stress and reduced heart rate. We did not explore whether the relationship between physical activity was generally positive or negative in this study, but this should be investigated in future work.
Note that the relative feature contributions found in this work may depend on the specific choice of model and may be taken as an indication of feature importance only. Future work may focus on more rigorous methods for feature importance (57).

Demographic and Clinical Factors
We generally observed that the model performed best for participants with longer recording times, and more consistently over prediction horizons (Supplementary Figure 1). This suggests that seizure forecasts utilizing wearable sensors perform better with longer recording times, and are likely to improve over time. We suggest that a clinical forecast requires a minimum amount of data or events before starting to use the forecaster. Future work should investigate the ideal number of events required for the best results, taking into account an individual's seizure frequency, and the optimal number of cycles to observe before incorporating the cycle into the forecaster. Interestingly, the model tended to perform better in participants with lower seizure frequencies (Supplementary Figure 2). This relationship between seizure frequency and forecasting performance was also observed in a prospective forecasting study (16). Although it is well-known that seizure frequency is important to quality of life (58), people with fewer seizures are still subject to anxiety and fear caused by the unpredictability of seizures (8,59). Therefore, people who have fewer seizures may have the most to benefit from accurate forecaster, as a forecaster may enable them to go about their daily lives without fear of an impending seizure.
Despite less than perfect accuracy in the current model, the results may still meet the user requirements for a practical seizure gauge device. Many people with epilepsy may use a forecasting device despite less than perfect accuracy (45). For example, subjects in a prospective seizure forecasting study found the implanted device useful even though the median sensitivity was only 60% (60). Moreover, shorter time horizons (minutes to hours) seem to be preferable over longer time horizons (days) (45). This is in line with the current results, where the shorter time horizon (hourly) made more accurate predictions than the longer time horizon (daily). Ultimately, prospective seizure forecasting studies with non-invasive wearables are needed to assess user requirements and clinical utility.

Limitations
This study has several limitations. First, self-reported seizure diaries have inherent drawbacks and are known to be inaccurate (61). Not everyone with epilepsy is aware of when they experience a seizure, particularly if they predominantly experience focal aware seizures. Self-reported events also rely on participant or caregiver memory for seizure time recollection, which may cause the forecaster to draw inaccurate conclusions during training. However, self-reported events are non-invasive, easy to capture, and remain the standard data source for medical practice and clinical trials in epilepsy (10). Therefore, seizure diaries remain important for non-invasive seizure forecasting. To improve the accuracy of self-reported events, non-invasive seizure detection devices are available for convulsive seizures, and detection of non-convulsive seizures are in the pipeline (62).
Second, it is worth noting that the accuracy of heart rate and sleep stages measured from smartwatch devices has been investigated compared to electrocardiography and polysomnography, respectively (63)(64)(65). These studies collectively show that no significant difference was noted between the heart rate captured using a Fitbit compared to an electrocardiography device during sleep, but some errors did emerge during exercise. Smartwatches are known to be useful in obtaining gross estimates of sleep parameters and heart rate but are not yet suitable substitutes for electrocardiography and polysomnography. This suggests that complex parameters, such as sleep stages and heart rate variability, may need further investigation to understand their role as seizure drivers. Wearable heart rate sensors are also subject to artifacts, although measurement noise was likely to be at a higher frequency than the time scale focused on in the current work.
Third, seizure number and seizure frequency are also limiting factors on whether seizure forecasting is possible. When seizure numbers are low, the forecaster may be unreliable in some cases due to overfitting in the training set. The optimal learning period based on seizure frequency should be investigated in future. Fourth, the ensemble method was complicated because we combined diverse feature types; however, given the main contribution to performance was cyclic features, future work should focus on developing simpler approaches. Cycles may also shift or change over time, thus affecting the accuracy of the forecaster. In a real-world implementation, we may look to remove any past data beyond 1 year or remove the oldest week of data every time a new week is added to account for changes in seizure biomarkers and to reduce memory requirements.
Finally, we attempted to balance our participant recruitment so that it accurately reflected the population of people with refractory epilepsy (variety of adult ages, epilepsy types and seizure frequencies); however, the limited number of participants in this study means that the population may not have been accurately represented in the sample, particularly for people with generalized epilepsy. We also endeavor to explore the relationship between forecasting accuracy and epilepsy type in the future.

CONCLUSION
We assessed the utility of electronic self-reported seizure diaries and non-invasive wearable physiological sensor data to estimate seizure risk in retrospective and pseudo-prospective cohorts. This research has shown that non-invasive wearable sensors in the field of seizure forecasting is not only possible, but feasible and imminent. Prospective analysis and clinical trials should also be undertaken on longitudinal datasets in the future.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by St Vincent's Hospital Human Research Ethics Committee (HREC 009.19). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
RS, DG, WD'S, MC, DF, and PK conceived of the presented idea. DF and PK collected the data. RS performed the computations and wrote the manuscript with support from PV, DG, WD'S, MC, and PK. TP, PV, and DP verified the analytical methods, with support and supervision from BB and MR. All authors contributed to the article and approved the submitted version.

FUNDING
This research was funded by an NHMRC Investigator Grant 1178220, the Epilepsy Foundation of America's My Seizure Gauge grant and the BioMedTech Horizons 3 program, an initiative of MTPConnect. The funders were not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.