The Proposition for Bipolar Depression Forecasting Based on Wearable Data Collection

Bipolar depression is wrongly treated as unipolar depression for 8 years, on average. It has been shown that this mismedication affects the occurrence of manic episodes and aggravates the overall condition of patients with bipolar depression. Significant effort has been invested in the early detection of depression and in forecasting responses to certain therapeutic approaches using a combination of features extracted from standard and online testing, wearables monitoring, and machine learning. In the case of unipolar depression, this yielded evidence that a data-based computational psychiatry approach would be helpful in clinical practice. Following a similar pipeline, we examined the usefulness of this approach for foreseeing a manic episode in bipolar depression, so that clinicians and the patient's family can help the patient navigate through the time of crisis. Our projects combined the results from self-reported daily questionnaires, the data obtained from smart watches, and the data from regular reports from standard psychiatric interviews to feed various machine learning models to predict a crisis in bipolar depression. In contrast to the satisfactory predictions in unipolar depression, we found that bipolar depression, having more complex dynamics, requires a personalized approach. Previous work on physiological complexity (complex variability) suggests that the inclusion of electrophysiological data, properly quantified, might lead to better solutions, as shown in other projects of our group concerning unipolar depression. Here, we compare previously performed research in a methodological sense, revisiting and further interpreting our own results, which show that the methodological approach to mania forecasting may be modified to provide an accurate prediction in bipolar depression.

INTRODUCTION
Those who suffer from bipolar depressive disorder (BDD) are often misdiagnosed with unipolar depression and treated as such for 8 years on average (Singh and Rajput, 2006; Lloyd et al., 2011). In addition, there are findings suggesting that antidepressant medication can aggravate their condition (Patel et al., 2015; Robillard et al., 2021). Bipolar disorder in its various forms affects 2.4% of the population of the world (Merikangas et al., 2011; World Health Organization [WHO], 2017). It is a recurrent mood disorder whose manifestations range from extreme euphoria to severe depression. It is accompanied by alterations in thought and behavior and can produce psychotic symptoms, such as delusions and hallucinations. People who suffer from it have a high risk of suicide, 20 times higher than that of the general population (Baldessarini et al., 2020). Even with treatment, more than a third of patients will suffer at least one relapse in the first year after diagnosis and more than 60% will have a new crisis in the first 2 years. It is a disease that typically appears during adolescence or early adulthood and affects the person throughout his/her entire life (World Health Organization [WHO], 2018). Pharmacological treatment is the main pillar in the approach to this debilitating disease. It aims to shorten crises and prevent their occurrence, but the medication has serious side effects, especially at high doses. It is therefore particularly important to detect the onset of a crisis as soon as possible. Rapid treatment of a new crisis can make a big difference in the overall effectiveness. However, this early detection is very difficult with the current standardized clinical approach. At the beginning of a crisis, the symptoms and changes can be very subtle, almost impossible to notice. It is also very challenging to differentiate between unipolar and bipolar depression.
We showed that the detection of unipolar depression is possible through a combination of machine learning and non-linear characterization of electroencephalographic (EEG) signals (Čukić et al., 2020a,b,c; Čukić and Lopez, 2020). Additionally, we demonstrated that, with the same methodological approach, it is possible to differentiate between two phases of the disease, episode and remission (Čukić et al., 2019), which can have immense significance for clinical decisions.
A common denominator at the onset of crises is a change in sleep and activity patterns. Weeks before the crisis, there are always changes in these variables (Llamocca et al., 2021). Early detection of these changes would improve the chances of social and occupational integration of the patients and would also allow a reduction of the drug dose needed for stabilization.
In our previous work, we dealt with predicting the occurrence of a crisis in BDD based on actigraphy measurements combined with standard reports from psychiatrists and self-report data obtained from outpatients via a mobile application (Llamocca et al., 2018, 2019, 2021). We used a number of methods for feature selection and a number of machine learning models that were previously applied in similar detection tasks (Llamocca et al., 2021). In conclusion, we stated that this methodology led to a real precision medicine application. It has been shown that non-linear analysis of electrophysiological data can be used for monitoring the state of patients with bipolar depression (Pincus, 2006; Migliorini et al., 2012; Moon et al., 2013; Nardelli et al., 2017; Byun et al., 2019). Spectral and non-linear biomarkers extracted from the electrocardiogram (ECG) correspond to the aberrations of the autonomic nervous system (ANS) of patients, but also to the severity of the disease. The relation between heart rate variability (HRV) and depression is well described (Kemp et al., 2010, 2012). Based on the non-linear analysis of ECG (as a robust marker of vagal control), it is possible to differentiate between comorbid disorders (Kemp et al., 2012) or subtypes of depression (Kemp et al., 2014), and to point to unreported suicidal ideation (Khandoker et al., 2017), information of enormous significance for accurate diagnosis and effective treatment. We argue here that electrophysiological data (ECG measured by a portable monitoring device), properly characterized by non-linear measures, can be a game changer as a source for detection and forecasting. We revisit and further interpret some of our already published data, which are important for developing an accurate warning system for the proximity of a crisis, allowing timely and appropriate action.

COMPARATIVE ANALYSIS AND DISCUSSION
Our main aim in the most recent publication was to isolate relevant variables for BDD (irritability and duration of sleep turned out to be the most significant) and discover the relations between them (Llamocca et al., 2021). Having been successful in detecting the states/phases of unipolar depression, we applied the same method to BDD and revealed quite different dynamics of the disease, with more phases than in unipolar depression (Llamocca et al., 2021). According to our results (based on accumulated clinical observations and advanced analytics), there are five distinct states with as many intermediary (bidirectional) states in BDD dynamics, described by a directed graph approach (for more details of our methodology, please consult the original publication, Llamocca et al., 2021). We could not discuss all aspects of our results, due to the scope and the limitations of the journal. In this retrospective analysis, together with additional interpretation of those results, we discuss suggestions for improving the future methodology that might lead to a simpler solution, more attractive to clinicians. Due to the very complex dynamics of bipolar depression, a personal analysis of every single case is still required, as in the classical personalized approach (Llamocca et al., 2021). Other research aiming at forecasting for BDD also concluded that the time series extracted from similarly collected data cannot be generalized since they are very heterogeneous; this is actually what prevents automated mood forecasting in BDD (Moore et al., 2012). Moore and colleagues reported that for some patients the mania scores were always zero during the monitoring period, which is probably the effect of medication. Figure 1 shows the periods for defined states (depression, euthymia, manic, or mixed) in which some patients were, as well as the evolution of self-report variable D irritability and the actigraph variable S sleep efficiency.
For detailed definitions of states, please consult the original publication (Llamocca et al., 2021). From Figure 1, we can see different dynamics in four different patients; P03 exhibited mania and mixed state, P04 experienced euthymia and mixed state, P06 exhibited all possible states in the same period, while P09 was in the phases of long euthymia and mania, with a brief phase of depression. Although these four persons are all diagnosed with the same clinical entity, it is difficult to compare their dynamics as they are so different. Knowing that the mood (or states, as we labeled them) is the outcome of many complex physiological processes (that generate series of sequential data), the problem of forecasting seems to be more complicated than previously thought [in various artificial intelligence (AI) applications]. Adding physiological complexity (fractal and non-linear) analysis to this methodology, based on our interpretation coming from information theory, may improve the characterization of these states, leading to better crisis prediction.
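The directed-graph description of state dynamics mentioned above can be illustrated with a minimal sketch that counts day-to-day transitions between labeled states. This is not the procedure used in Llamocca et al. (2021); the function names and the short `daily_states` sequence are hypothetical and serve only to show how a weighted directed graph can be read off a sequence of daily labels.

```python
from collections import Counter


def transition_counts(states):
    """Count day-to-day transitions; each (source, target) pair is a directed edge."""
    return Counter(zip(states, states[1:]))


def transition_probabilities(states):
    """Normalize edge counts by the number of transitions leaving each source state."""
    counts = transition_counts(states)
    outgoing = Counter()
    for (source, _target), count in counts.items():
        outgoing[source] += count
    return {edge: count / outgoing[edge[0]] for edge, count in counts.items()}


# Hypothetical daily labels for one patient (illustrative values, not real data).
daily_states = [
    "euthymia", "euthymia", "mixed", "mania",
    "mania", "euthymia", "depression", "euthymia",
]

edges = transition_counts(daily_states)
probabilities = transition_probabilities(daily_states)
for (source, target), p in sorted(probabilities.items()):
    print(f"{source} -> {target}: {p:.2f}")
```

Because every patient yields a different graph, a per-patient table of this kind is one simple way to make the heterogeneity visible before any model is fitted.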
One of the first authors to write about quantitative assessment strategies in mood disorders, Steven M. Pincus, introduced a novel understanding of physiological complexity, based on his rich experience with deciphering hormonal dynamics (Pincus et al., 1996). Pincus argues that we should pay closer attention to time series that reflect essential physiological information, because the history of the data, i.e., the order of samples in the time series, is very important. Pincus is the author of the Approximate Entropy algorithm (ApEn), which is a model-independent quantification of the regularity (complexity) of the data (Pincus, 1991, 1995; Pincus and Huang, 1992; Pincus and Viscarello, 1992). The fundamental difference between regularity statistics (such as ApEn) and conventional variability measures is that the conventional approach focuses on quantifying the degree of spread about a central value, for which the order of the input data is irrelevant, whereas regularity statistics such as ApEn track changes from random to very regular behavior, and the order of samples is essential to the algorithm (Pincus, 2003). If we shuffle the data, the intrinsic dynamics is lost, since time series reflect the essential physiological information (Pincus, 1994). Since the sequential order of mood data is relevant to diagnosis, we must use something beyond the standard deviations (SD) and means currently used in medicine to adequately quantify the serial nature of those data (Pincus, 2003). Since ApEn and other similar entropy measures (Shannon entropy, sample entropy, multiscale entropy, etc.) started gathering attention, various research results have confirmed that they are indispensable for detecting the slightest changes in complex physiological systems, changes that cannot be discovered by conventional methods.
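The contrast between spread and regularity described above can be made concrete with a short sketch of ApEn following the definition in Pincus (1991). The embedding dimension `m = 2` and the tolerance `r = 0.2 × SD` are common conventions that we assume here, not values taken from the studies discussed, and the sine series is a synthetic stand-in for a physiological recording.

```python
import math
import random
import statistics


def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy of a 1-D series; self-matches are counted, as in the original definition."""
    n = len(series)
    if r is None:
        # Assumed convention: tolerance equal to 20% of the standard deviation.
        r = 0.2 * statistics.pstdev(series)

    def phi(dim):
        count = n - dim + 1
        templates = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            # Number of templates within tolerance r (Chebyshev distance).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


random.seed(0)
regular = [math.sin(2 * math.pi * i / 12) for i in range(240)]
shuffled = regular[:]
random.shuffle(shuffled)  # same values, order destroyed

print("SD   regular/shuffled:", statistics.pstdev(regular), statistics.pstdev(shuffled))
print("ApEn regular/shuffled:", approximate_entropy(regular), approximate_entropy(shuffled))
```

Both series contain exactly the same values, so their SD is identical, while ApEn is larger for the shuffled copy because the sequential structure has been removed.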
In our most recent research, we found that among those entropy-based measures, Shannon entropy yields the best result, outperforming any previously reported conventional heart rate variability (HRV) analysis (Čukić and Savić, 2021). This makes sense, since Shannon entropy reflects the amount of information generated by the signal (process), which can help to discern a system (and its states) that is functioning differently from a healthy one (Vajapeyam, 2014). ApEn can detect subclinical changes (patterns that mostly remain undetected), unlike conventional time series analyses (Pincus et al., 1993; Pincus and Goldberger, 1994). In addition, Pincus advises a combined approach of non-linear analysis of atypical heart rate (HR) dynamics and/or EEG, given the hereditary nature of bipolar disorder (Pincus, 2003, 2006). ApEn has been shown to be capable of detecting changes not in peaks or amplitudes, but in underlying episodic behavior, corresponding to subsystem anatomy, feedback, or coupling (Pincus and Keefe, 1992; Pincus, 1994). It can therefore be useful for predicting subsequent clinical changes, such as those in mood disorders. Cook used this kind of quantification (irregularity statistics) (Cook et al., 2002) to show that patients with bipolar depression exhibit changes in EEG as a reaction to antidepressant therapy. Glenn and her colleagues managed to distinguish, in 49 patients with bipolar disorder, between the 60 days of euthymia preceding an episode of mania and those preceding an episode of depression, by applying the ApEn algorithm to time series of self-reported state (Glenn et al., 2006). Their research showed that the larger ApEn value suggests that the 60 days prior to a manic episode are more disordered (irregular) than the 60 days prior to a depressive episode.
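As a minimal illustration of what Shannon entropy measures, the sketch below estimates it from a histogram of inter-beat (RR) intervals. The binning step, the `bin_width` parameter, and the two short RR series are assumptions made for this example; the sketch does not reproduce the pipeline of Čukić and Savić (2021).

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (in bits) of a series after grouping values into bins of width bin_width."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((count / n) * math.log2(count / n) for count in bins.values())


# Hypothetical RR intervals in milliseconds (illustrative values, not real recordings).
rr_rigid = [800, 800, 800, 800, 800, 800, 800, 800]
rr_varied = [700, 750, 800, 850, 700, 750, 800, 850]

print("rigid rhythm :", shannon_entropy(rr_rigid, bin_width=50))
print("varied rhythm:", shannon_entropy(rr_varied, bin_width=50))
```

A perfectly rigid rhythm occupies a single bin and carries no information, whereas a rhythm spread evenly over four bins reaches 2 bits. Note that this histogram-based estimate depends on the chosen bin width and, unlike ApEn, ignores the order of the samples.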
They argued that non-linear and linear techniques of analysis may measure different underlying components of mood changes, capturing patterns that are embedded in the order of the data. Their research suggested that non-linear techniques should complement traditional measures to better delineate the onset (and extent) of an episode, preventing costly hospitalization, but also the recovery from the crisis. Moore et al. (2012) noted that the quality that seems to vary among the patients with BDD they observed is so-called roughness, which they addressed by applying Detrended Fluctuation Analysis (DFA), a fractal methodology belonging to the family of non-linear methods of analysis. Another important study, by Migliorini et al. (2012), used a portable ECG sensor embedded in T-shirts, so that the patients could sleep without restraints while heart dynamics were monitored continuously. The rationale here is that the underlying aberrant dynamics of the ANS or cortico-vagal control, known to be disrupted in mood disorders (Rothenberg, 2007), could be used for detection and forecasting. Again, non-linear measures proved superior to conventional ones (see also Gottschalk et al., 1995), and the ratings extracted from signals recorded during the whole 4 nights were more accurate than the result of the standard diagnostic procedure performed before sleep (they used ML models to differentiate between BDD and healthy controls). Faurholt-Jepsen et al. (2014) showed that self-reported (labeled "subjective") assessment was more efficient in identifying BDD states, as in using mobile technology (smartphones) or online platforms, such as Mechanical Turk (Gillan and Whelan, 2017). Hence, some electrophysiological recording could significantly improve the chances of BDD mania prediction.
Frontiers in Physiology | www.frontiersin.org
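The DFA technique named above can be sketched in a few lines: integrate the mean-centered series, remove a linear trend within windows of increasing size, and take the slope of the log-log relation between window size and residual fluctuation as the scaling exponent. The window sizes in `scales` and the synthetic test series are assumptions for illustration; this is a first-order DFA sketch, not the implementation used by Moore et al. (2012).

```python
import math
import random


def _linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(series, scales=(8, 16, 32, 64, 128)):
    """First-order DFA scaling exponent; the series must be longer than the largest scale."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:
        running += value - mean  # integrated (cumulative) profile
        profile.append(running)

    log_scale, log_fluct = [], []
    for n in scales:
        xs = list(range(n))
        squared, points = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = _linear_fit(xs, window)
            squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window))
            points += n
        log_scale.append(math.log(n))
        log_fluct.append(0.5 * math.log(squared / points))  # log of the RMS fluctuation
    alpha, _ = _linear_fit(log_scale, log_fluct)
    return alpha


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(2048)]  # uncorrelated series
walk, position = [], 0.0
for step in noise:
    position += step
    walk.append(position)  # strongly correlated (smoother) series

alpha_noise = dfa_alpha(noise)
alpha_walk = dfa_alpha(walk)
print("alpha (white noise):", alpha_noise)
print("alpha (random walk):", alpha_walk)
```

For uncorrelated noise the exponent is expected to lie near 0.5 and for a random walk near 1.5, so the exponent gives a single number that separates rough from smooth series of equal length.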
Irregularity measures would probably act as much more reliable features, accurately representing the underlying physiological information and leading to better predictions. It is also important to distinguish between detection and forecasting, since the latter is a much more demanding task. Bearing in mind that the symptoms of mood disorders are the consequences of cortico-vagal control or, better, the lack of it (Rothenberg, 2007; Willner et al., 2013; Van der Kolk, 2014), non-linear measures, as sensitive indicators of the intrinsic dynamics of the system, are the optimal choice. Kim et al. (2013) showed, based on network analysis of EEG, that there is an underlying disruption of functional connectivity in bipolar depression.
Here, we propose two classes of methodological improvements that can result in a more feasible solution for forecasting manic episodes.
The first one is to add to the method the recording of ECG from patients with BDD, using portable monitoring devices with medical-grade signal quality. There are plenty of solutions, such as recording from the fingers or from the wrists; however, to perform a sufficiently accurate analysis, recording from the chest is required. The signal should be analyzed by some of the above-mentioned non-linear methods: irregularity statistics (entropy-based) and some form of fractal analysis. This kind of signal characterization would eventually lead to much better prediction. The aim is to connect the values of certain measures/variables to certain diagnostic entities and their phases.
We are proposing the recording of portable ECG, and not EEG (which has been used in many EEG-based depression detection studies in the literature; among others, Alimardani and Boostani, 2018), aware of the signal acquisition problems that can jeopardize the whole project. Telemedicine [with the internet of things (IoT)] is gradually entering homes; outpatients are already using mobile applications, and the collection of data is easier than before. It has already been shown that non-linear measures of ECG make it possible to differentiate between comorbid disorders (Kemp et al., 2010, 2011; Kemp, 2011), to delineate melancholic from non-melancholic depression (Kemp et al., 2014), or to detect suicidal ideation (Khandoker et al., 2017), which is particularly important in BDD, where the risk of suicide is high (20-fold risk in comparison with controls, Baldessarini et al., 2020). Those are all immensely important for the clinician to make effective treatment decisions. In addition, since sleep is disrupted in BDD, it would make sense to measure ECG during sleep (Migliorini et al., 2012). Pincus (2003) predicted that some form of sleep recording of ECG would be the most suitable for this task (see also Saad et al., 2019).
An example from our publication (Llamocca et al., 2021) illustrates how variables connected to sleep (sleep duration being the most significant one) change in relation to the state defined for that day (either self-reported or pronounced by a clinician, since both are included in the dataset). Figure 2 shows how real data from patient P14 differ with respect to the interpolated data. We can conclude that P14 usually sleeps about 8 h in the euthymic state, but this period varies when P14 is about to enter a crisis or is already suffering from one. Figure 2 shows the periods for states in which patient P14 was, as well as the evolution of self-report variable D sleep duration (duration of their sleep).
The second part of our proposition for further improving the approach to prediction concerns ML models. We were using various forms of supervised learning to learn from the data. Authors who take a more theoretical approach to computational psychiatry (Whelan and Garavan, 2014) advocate avoiding 'unwarranted optimism' by collecting more data and lowering the number of variables per person (Kohavi, 1995; Tibshirani, 1996; Ng, 1997). Relying on Bayesian approaches is recommended. Special care should be given to the dimensionality problem, which our group addressed entirely (Llamocca et al., 2021). In addition, support vector machines (SVM) might be one of the most popular models, but other methods could be used, such as embedded regularization (Whelan et al., 2013). Knowing the heterogeneity problem, we suggest introducing some unsupervised learning methods, such as subgroup discovery (which is a binary classifier and works on labeled data) and association rule discovery (which is an unsupervised ML model), or predictive and descriptive clustering (distance-based models) (Flach, 2012). The problem with clustering is twofold: first, there is a trivial solution (which corresponds to overfitting in linear models; let us call it clustering overfitting), which can be avoided if we penalize large K or fix the number of clusters K in advance; second, the problem cannot be solved for large datasets (but a typical dataset is not large). Soft clustering generalizes the notion of partition, in the same way that a probability estimator generalizes a classifier (Flach, 2012). With the above-mentioned suggestion, the algorithm can learn from the (properly characterized) data. We can conclude what the subgroups are and what the relations between present instances are, so we can try to interpret them in the light of an information theory approach to physiological processes.
Busk et al. (2020) collected the data in a similar manner, with different items in the questionnaire. They tested the feasibility of forecasting daily subjective mood scores based on daily self-assessment from 84 patients with bipolar disorder via smartphone in a randomized clinical trial. Combining historical data with currently collected data improved forecasting; they used a hierarchical Bayesian approach, a multi-task learning method, in which data from different subjects served as additional cases to learn from. Ordinal regression (or ordinal classification) is a method of predicting a discrete variable that has a relative ordering of the possible outcomes. They started with a 1-day forecast under several scenarios (two time-series cross-validation experiments) and then applied the best model to evaluate a 7-day forecast. As the forecast horizon increased, forecast errors also increased and the forecast regression shifted toward the mean of the data distribution; the best model used a 4-day history of self-assessment. Interestingly, the authors organized the dataset in a way similar to that usually used for entropy-based analysis of physiological data, discussed above; perhaps the historicity of the data is the key to successful forecasting. Besides, a shift in the ML models used for much-needed realistic forecasting includes the much-preferred unsupervised learning or functional data analysis (Wang et al., 2016). Table 1 offers some recommended techniques with our justification.
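The evaluation scheme described above (a fixed history window, a forecast horizon, and time-series cross-validation) can be sketched with a deliberately naive forecaster that predicts the mean of the last few days. This baseline is our assumption for illustration and is not the hierarchical Bayesian model of Busk et al. (2020); the function name and the toy mood scores are hypothetical.

```python
def rolling_origin_mae(scores, history, horizon):
    """Mean absolute error of a history-mean forecast under rolling-origin evaluation.

    At each origin, only the `history` most recent scores before the origin are used,
    and the target lies `horizon` days ahead, so no future data leaks into a forecast.
    """
    errors = []
    for origin in range(history, len(scores) - horizon + 1):
        window = scores[origin - history:origin]
        forecast = sum(window) / history
        target = scores[origin + horizon - 1]
        errors.append(abs(target - forecast))
    return sum(errors) / len(errors)


# Hypothetical daily self-reported mood scores with a steady drift (not real data).
drifting_mood = [1, 2, 3, 4, 5, 6]

mae_1_day = rolling_origin_mae(drifting_mood, history=2, horizon=1)
mae_2_day = rolling_origin_mae(drifting_mood, history=2, horizon=2)
print("1-day horizon MAE:", mae_1_day)
print("2-day horizon MAE:", mae_2_day)
```

On this drifting toy series the error grows with the horizon, which mirrors, in the simplest possible setting, the reported increase of forecast errors with longer horizons.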
We hope that an improved research methodology, based on the above comparison and analysis, will eventually lead to much better theragnostics and improve the quality of life of patients.

AUTHOR CONTRIBUTIONS
VL developed the idea for research. PL, VL, and MČ performed the research, wrote the manuscript, and reviewed the manuscript. PL and VL collected and analyzed the data. PL generated figures. All authors contributed to the article and approved the submitted version.

FUNDING
This work was partially supported by the grant PID2020-113192GB-I00 (Mathematical Visualization: Foundations, Algorithms, and Applications) from the Spanish MICINN.