Fine tuned personalized machine learning models to detect insomnia risk based on data from a smart bed platform

Introduction: Insomnia causes serious adverse health effects and is estimated to affect 10–30% of the worldwide population. This study leverages personalized fine-tuned machine learning algorithms to detect insomnia risk based on questionnaire and longitudinal objective sleep data collected by a smart bed platform.
Methods: Users of the Sleep Number smart bed were invited to participate in an IRB-approved study that required them to respond to four questionnaires (which included the Insomnia Severity Index; ISI) administered 6 weeks apart in the period from November 2021 to March 2022. For 1,489 participants who completed at least 3 questionnaires, objective data (including sleep/wake and cardiorespiratory metrics) collected by the platform were queried for analysis. An incremental, passive-aggressive machine learning model was used to detect insomnia risk, defined as the ISI exceeding a given threshold. Three ISI thresholds (8, 10, and 15) were considered. The incremental model is advantageous because it allows personalized fine-tuning by adding individual training data to a generic model.
Results: The generic model, without personalization, resulted in an area under the receiver operating characteristic curve (AUC) of about 0.5 for each ISI threshold. Personalized fine-tuning with the data of just five sleep sessions from the individual for whom the model is being personalized resulted in AUCs exceeding 0.8 for all ISI thresholds. Interestingly, adding more than ten sessions of personalized data yielded no further AUC improvement.
Discussion: These are encouraging results motivating further investigation into the application of personalized fine-tuned machine learning to detect insomnia risk based on longitudinal sleep data and the extension of this paradigm to sleep medicine.

Introduction
Insomnia is a highly prevalent sleep disorder, affecting 10-30% of the general population (1), and is characterized by difficulty with sleep initiation, weakened sleep maintenance, and/or waking up too early (1). Insomnia can cause significant distress for those who experience symptoms and has been bidirectionally associated with adverse health consequences such as heart disease, elevated blood pressure, neurological conditions, chronic pain, gastrointestinal problems (2), depression, and anxiety (2). Insomnia can be intermittent, i.e., interspersed with occasional good rebound nights. This can give the patient a false sense of remission, which may lead to underreporting of insomnia to the healthcare system.
Despite its high prevalence, insomnia is underrecognized, underdiagnosed, and undertreated (3). Recent progress in machine learning and the use of consumer sleep technologies may be [...] than a month), subacute (1 to 3 months), and chronic insomnia (longer than 3 months) (5). The Insomnia Severity Index (ISI) is the only instrument currently in use that allows for severity classification based on a numerical score (6). The ISI has not yet been validated to identify a specific insomnia phenotype, but insomnia risk can be defined as the ISI exceeding a threshold (7).
Table 1 summarizes some of the state-of-the-art approaches to detect insomnia. [...] (13) used EEG data and support vector machines, achieving a high F1 score (0.88) in predicting primary insomnia.
Among the various consumer sleep technologies, it is reasonable to assume that "nearables" which do not require the [...]
To quantify insomnia risk, we focused on the ISI, a seven-question instrument designed to assess the severity of both daytime and nighttime components of insomnia. The responses to the seven ISI questions, each on a scale from 0 to 4, are added up to obtain a total score, which indicates no clinically significant insomnia if the score is lower than 8, subthreshold insomnia if the score is between 8 and 14, clinical insomnia if the score is between 15 and 21, and severe clinical insomnia if the score is between 22 and 28 (6). For convenience, the total ISI score is simply referred to as ISI in the rest of the paper.
Sleep session data include (see Table 2) the session duration, which corresponds to time in bed; the number of bed exits; sleep duration; duration of restful sleep (detected based on the level of motion); time to fall asleep (TTFA) once the participant entered the bed; the percentage of time with a high (above a given threshold) level of motion; sleep quality score; sleep debt, which is the difference (if positive) between an individual's sleep duration goal and their actual sleep duration; sleep regularity, which characterizes the probability of an individual being in the same state (awake or asleep) at any two points in time 24 h apart [using an adaptation of the procedure presented in Lunsford-Avery et al. (20)]; and mean cardiorespiratory metrics such as respiratory rate, heart rate, and heart rate variability. The feature vector used to train the machine learning model has 14 components listed in Table 2 (see also Figure 3).
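The ISI scoring rule and the sleep debt definition given above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical and not taken from the study's code, and the use of hours as the unit for sleep debt is an assumption.

```python
def isi_total(responses):
    """Sum the seven ISI item responses, each an integer from 0 to 4."""
    if len(responses) != 7:
        raise ValueError("the ISI has exactly seven items")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each ISI response must be an integer from 0 to 4")
    return sum(responses)


def isi_category(total):
    """Map a total ISI score (0-28) to the severity bands quoted in the text."""
    if not 0 <= total <= 28:
        raise ValueError("total ISI score must lie between 0 and 28")
    if total < 8:
        return "no clinically significant insomnia"
    if total < 15:
        return "subthreshold insomnia"
    if total < 22:
        return "clinical insomnia"
    return "severe clinical insomnia"


def sleep_debt(goal_hours, actual_hours):
    """Sleep debt: goal minus actual sleep duration, kept only if positive.

    Hours are an assumed unit; the text does not specify one.
    """
    return max(0.0, goal_hours - actual_hours)
```

For example, item responses of 2, 2, 1, 1, 1, 1, 1 add up to a total of 9, which falls in the subthreshold band.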

Data inclusion procedure
On a daily basis, the smart bed consolidated sleep sessions whose end and begin times were not separated by more than two hours. Sleep sessions separated by more than two hours were considered as individual sessions. For each day, only the longest sleep session was kept for analysis.
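The consolidation rule can be illustrated with a short Python sketch. It is not the smart bed's actual algorithm: the function names are hypothetical, and two details are assumptions made here for illustration, namely that a consolidated session spans from the first start to the last end, and that a session is attributed to the calendar day on which it ends.

```python
from datetime import timedelta

MAX_GAP = timedelta(hours=2)  # sessions at most two hours apart are merged


def consolidate_sessions(sessions, max_gap=MAX_GAP):
    """Merge (start, end) sessions separated by no more than max_gap."""
    merged = []
    for start, end in sorted(sessions):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def longest_session_per_day(sessions):
    """Keep the longest session per day (day = date of session end, assumed)."""
    best = {}
    for start, end in sessions:
        day = end.date()
        if day not in best or (end - start) > (best[day][1] - best[day][0]):
            best[day] = (start, end)
    return [best[day] for day in sorted(best)]
```

With this sketch, a session from 22:00 to 02:00 followed by one from 03:00 to 06:00 is merged into a single session, while an afternoon nap seven hours later remains separate and is dropped because it is shorter.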
Starting from 5,444 enrolled participants, the numbers of respondents to questionnaires 1 to 4 were 3,729, 3,743, 3,596, and 3,273, respectively. The number of participants that responded to at [...]

Insomnia risk quantification and labeling of each sleep session
To detect insomnia risk, three thresholds on the ISI (8, 10, and 15) were considered. As mentioned in Section 2.1, the thresholds 8 and 15 distinguish no insomnia vs. any level of insomnia and nonsevere insomnia vs. severe insomnia, respectively. The ISI threshold of 10 was used by Oh et al. (7) to quantify insomnia risk.
Each sleep session was assigned an ISI value according to the following criteria (see also Figure 2). For each session before the second questionnaire, the ISI is that of the first questionnaire. If the response to the first questionnaire is missing, then all sessions before the second questionnaire have the ISI value of the second questionnaire. For each sleep session after the last questionnaire answered by the participant, the score is the ISI of the last questionnaire. In-between, the sleep sessions between the n-th questionnaire and the [...].

Personalized fine tuning
In machine learning, the idea of improving a model by transferring information from a related domain is referred to as transfer learning (21). A related concept is that of fine-tuning, where a generic model is incrementally trained to perform optimally in specific scenarios. The incremental training uses a small amount of training data from the targeted specific scenario.
We leveraged the transfer learning idea along with the leave-one-subject-out cross-validation (LOOCV) technique, where the data from all but one subject are used to train a model that is tested on the data from the left-out subject. For each of the 1,480 subjects in our dataset, we trained a generic passive-aggressive model (see Section 2.6) using the data from all other subjects, and we personalized the model using sleep session data from 1, 5, 10, 20, 30, 40, 50, and 60 days of the left-out subject (see also Figure 3). This is illustrated in Figure 4. The rest of the data from the left-out subject was used to evaluate the model performance.
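The evaluation protocol can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not the study's implementation: the names are hypothetical, each subject is assumed to contribute one session per day in chronological order, and the personalization data are assumed to be the first n days of the left-out subject.

```python
# Personalization intervals (in days) listed in the text.
INTERVALS = (1, 5, 10, 20, 30, 40, 50, 60)


def leave_one_subject_out(data_by_subject):
    """Yield (held_out_id, generic_training_sessions, held_out_sessions)."""
    for held_out in data_by_subject:
        generic = [
            session
            for subject, sessions in data_by_subject.items()
            if subject != held_out
            for session in sessions
        ]
        yield held_out, generic, data_by_subject[held_out]


def personalization_split(sessions, n_days):
    """Split one subject's ordered sessions into fine-tuning and evaluation sets.

    Assumes one session per day, so the first n_days sessions cover n_days days.
    """
    return sessions[:n_days], sessions[n_days:]
```

For every fold, a generic model would be trained on the second element of the tuple, updated on the fine-tuning part of the split, and scored on the remainder. Setting `n_days` to 0 reproduces the non-personalized baseline.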

Passive-aggressive learning
Passive-aggressive learning is a binary online learning algorithm that adjusts its parameters based on the gradient of the loss function, allowing it to adapt its predictions as new data are introduced (22). The passive-aggressive classifier updates its parameters incrementally at the level of individual training samples rather than at the batch level (parameters updated after exposure to a fixed set of training samples) or epoch level (parameters updated after a full pass over the entire training dataset). This makes the passive-aggressive approach well suited to the personalized fine-tuning strategy described in the previous section. The classifier is passive in that it does not update its parameters when a training sample is correctly classified, and aggressive in that it does update them when a training sample is misclassified (22).
The passive-aggressive classifier has several hyperparameters that can be tuned to adjust its performance. In our implementation, we used a hinge loss, which is zero for correct classifications and, in case of misclassification, increases proportionally to the distance from the sample to the decision boundary. The proportionality hyperparameter controls the degree of aggressiveness of the updates to the decision boundary in the face of misclassification.
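To make the update rule concrete, the sketch below implements one common variant of the passive-aggressive classifier (often called PA-I) in plain Python. It is an illustration, not the implementation used in the study: the class name, the bias handling, and the parameter name `C` for the aggressiveness cap are assumptions. Note that this standard formulation uses a unit margin, so a correctly classified sample lying inside the margin still triggers an update.

```python
class PassiveAggressive:
    """Minimal PA-I binary classifier with labels in {-1, +1}."""

    def __init__(self, n_features, C=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.C = C  # upper bound on the step size (aggressiveness)

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, x, y):
        """Update on one sample; return the hinge loss before the update."""
        loss = max(0.0, 1.0 - y * self.decision(x))
        if loss == 0.0:
            return loss  # passive: the margin is satisfied, nothing changes
        # Aggressive: move just far enough to fix the sample, capped by C.
        norm_sq = sum(xi * xi for xi in x) + 1.0  # +1 accounts for the bias
        tau = min(self.C, loss / norm_sq)
        self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
        self.b += tau * y
        return loss
```

Because each call to `partial_fit` consumes a single sample, a model trained on generic data can be personalized simply by continuing to call it on an individual's sessions.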

Performance metrics
We evaluated the performance of the personalized model using accuracy (Equation 1), precision (Equation 2), recall (Equation 3), and F1 score (Equation 4). In Equations 1-4, TP and TN represent the number of true positives and true negatives, respectively. For each ISI threshold and personalization interval, we computed the average and standard deviation of each metric. As is usually done with binary classifiers (23), we also calculated the area under the receiver operating characteristic curve (AUC), which characterizes the trade-off between the true positive and false positive rates.
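The metrics named above follow their standard definitions, which can be sketched as below. The function names are hypothetical, labels are assumed to be coded as 1 (at risk) and 0 (not at risk), and the AUC is computed with the rank-based formulation (the probability that a randomly chosen positive scores higher than a randomly chosen negative), which may differ from the routine used in the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Rank-based AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to scores that rank positives and negatives no better than chance, which is the reference point for the generic model.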

Results
The demographic information for the final dataset of 1,489 respondents is reported in Table 3. In addition, the mean ISI values per questionnaire are also reported.
The model's performance without personalization, i.e., with a personalization interval of zero, serves as the baseline for comparison. The metrics for all ISI thresholds and personalization intervals are reported in Table 4. Figure 5 shows the mean AUC for each ISI threshold and personalization interval.
The incremental AUC (iAUC) values for each ISI threshold are shown in Table 5 and Figure 6. These emphasize the AUC improvements associated with the increase in the personalization interval. Improvements can already be observed when the personalization interval increases from 0 to 1 day, highlighting the immediate impact of incorporating even minimal personalized data into the model. Following the initial improvement, the iAUC values tend to diminish, with some negative values recorded. This "diminishing return" trend suggests that, although personalization continues to contribute positively to the model's performance, the marginal gains decrease as more personalization data are incorporated.
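One plausible reading of the incremental AUC is the difference in AUC between consecutive personalization intervals. The sketch below encodes that reading; it is an assumption, since the exact formula is not spelled out here, and the function name is hypothetical.

```python
def incremental_auc(auc_by_interval):
    """Difference in AUC between consecutive personalization intervals.

    auc_by_interval maps interval length in days (0 = no personalization)
    to the mean AUC obtained with that much personalization data.
    """
    intervals = sorted(auc_by_interval)
    return {
        current: auc_by_interval[current] - auc_by_interval[previous]
        for previous, current in zip(intervals, intervals[1:])
    }
```

Under this definition, a positive value means that the additional days of data improved the AUC, and a negative value means that the AUC dropped relative to the previous interval.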
The difference in iAUC for all possible pairs of ISI thresholds was also statistically evaluated. The statistical significance of these differences is shown in Table 6.

Discussion
Our results suggest that a significant accuracy improvement can be achieved by integrating longitudinal individual-specific data into an insomnia risk detection model. Such improvement may be due to the fact that insomnia symptoms impact sleep in an individualized manner. Indeed, the results across different personalization intervals and ISI thresholds show the difficulty of predicting insomnia risk, with near-random results for a generalized model that does not account for individual differences. Even a modest amount of personalization was sufficient to increase the AUC by 0.3, a 60% improvement over the generic model, which provided quasi-random results (AUC of about 0.5).
We could also observe that the AUC (Figure 5) exhibits a slight degradation for approximately 30 days of personalization data. To understand whether this degradation is intrinsic to our model, we performed a test consisting of randomizing the data. In this manner, the chronological information is no longer present in the data; if the degradation persisted, it would be attributable to the specific machine learning algorithm. The outcome of this experiment is shown in Figure 7. The fact that no AUC degradation can be observed in Figure 7 suggests that the decrease in AUC observed in Figure 5 may be due to the properties of the data. A plausible explanation for this degradation may be the proximity to the second questionnaire. However, no degradation could be observed for dates that are in the vicinity of the dates for the second or fourth questionnaires.
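The randomization control can be sketched as a shuffle of each subject's sessions before the personalization split is applied. This is an illustrative sketch: the function name and the fixed seed are assumptions, and the study may have randomized the data differently.

```python
import random


def shuffle_sessions(sessions, seed=0):
    """Return a copy of a subject's sessions in random order.

    Shuffling removes chronological structure, so the first n sessions used
    for personalization no longer correspond to the first n calendar days.
    """
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

If the dip near 30 days were caused by the learning algorithm itself, it should survive this shuffling; if it disappears, the dip is more likely tied to when the sessions occurred.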
We considered three ISI thresholds in this research. The results in Table 4 and Figures 5 and 6 show similar trends for all considered thresholds. We performed a statistical comparison between the iAUC curves for all possible pairs of ISI thresholds (see Table 6). We did not find a statistically significant difference in any of the comparisons, which suggests that the three ISI thresholds we tested may be equivalent for detecting insomnia risk. An appropriate threshold for insomnia risk is 10, which coincides with the choice by Oh et al. (7) and may better reflect the high prevalence of insomnia.
Our study has some limitations which are listed below.
• The population drawn from Sleep Number customers is not representative of the broader US population. This is reflected by the relatively older age of respondents reported in Table 3. Thus, the results reported in this research and the relevance of model personalization may not apply to the general population.
• The analysis reported in this research, which relies on an ISI threshold to reflect insomnia risk, does not permit identification of a specific insomnia phenotype or of the presence of comorbid sleep disorders such as sleep-disordered breathing or restless legs syndrome. Comorbid conditions can influence the ISI and the features we consider in our model, such as heart rate variability, heart rate, breathing rate, sleep quality, and sleep debt.
• Self-reported insomnia assessed through electronic delivery cannot be considered equivalent to a diagnosis. Indeed, respondent engagement and interaction with the electronic delivery method may be lower than with in-clinic, in-person questionnaire administration.
• The responses to multiple deliveries of the same questionnaire, even if several weeks apart, may not necessarily be independent.
• While the smart bed has a pressure sensor for each sleeper on the bed, the nature of BCG is such that a minimal contribution of the signal produced by one bed user can appear in the signal of the bed partner.
An opportunity to expand this research consists of considering insomnia phenotypes such as difficulty falling asleep with normal sleep duration, or normal sleep latency with difficulty staying asleep. Indeed, the advantage of personalization may apply to insomnia phenotypes, which could be easier to apply at scale than at the individual level. An additional area for expansion is the prediction of insomnia over shorter intervals to enable detection of acute insomnia, which, if not treated early enough, can convert into chronic insomnia.
The combination of longitudinally and unobtrusively acquired sleep data with personalized machine learning models constitutes a paradigm that may be generalized across sleep medicine, from early detection and endotype and phenotype identification to treatment optimization and recovery monitoring. This research presents early encouraging results supporting that vision.
FIGURE 2 Example of respondent ISI score assignment to sleep sessions.

FIGURE Overview of data collection and model development. BCG, ballistocardiography; BR, breathing rate; HR, heart rate; HRV, heart rate variability; SRI, sleep regularity index; TTFA, time to fall asleep; PAC, passive-aggressive classifier; LOOCV, leave-one-out cross-validation.
Questionnaire procedure
Individuals enrolled in the study are owners of a Sleep Number smart bed who consented to participate in an IRB-approved study, which consisted of responding to four electronically delivered questionnaires and allowing the use of objective sleep data collected by the smart bed platform. The four questionnaires were presented to the enrolled participants on November 22, 2021, January 3, 2022, February 14, 2022, and March 28, 2022, respectively. Each questionnaire was active for two weeks. The objective sleep data were collected between October 21, 2021 and March 31, 2022. Demographic information including age and gender was collected in the first questionnaire. Each questionnaire was composed of five validated instruments: the Insomnia Severity Index (6), the Epworth Sleepiness Scale (ESS) (15), the reduced Morningness-Eveningness Questionnaire (16), the generalized anxiety disorder scale GAD-7 (17), and the patient health questionnaire PHQ-8 (18). The ISI and ESS were administered under a utilization license provided by Mapi Research Trust.

Sleep session data are collected on a daily basis by the smart bed using the technology and algorithms described in Siyahjani et al. (19). The smart bed, validated against polysomnography (19), uses a pressure sensor to capture high-resolution full-body ballistocardiography to accurately measure breathing rate, heart rate, and movements, from which session data are derived. The smart bed uses a pressure sensor for each sleeper on the bed.

FIGURE 5 Area under the receiver operating characteristic curve (AUC) vs. personalization interval for each ISI threshold.

FIGURE 6 Incremental AUC vs. personalization interval for each ISI threshold.

FIGURE 7 Experiment with chronologically randomized data. Incremental AUC vs. personalization interval for each ISI threshold.

TABLE 1 State-of-the-art of machine learning algorithms applied to insomnia.

TABLE 3 Respondent statistics and ISI scores.
Respondents: 1,489
Male/Female/Other: 669/811/9
Age: 51.72 ± 12.77

TABLE 4 Results for each ISI threshold and personalization interval duration.

TABLE 6 Statistical testing for iAUC values.