Application of machine learning models in predicting insomnia severity: an integrative approach with constitution of traditional Chinese medicine

Objective This study sought to explore the utility of machine learning models in predicting insomnia severity based on Traditional Chinese Medicine (TCM) constitution classifications, with an aim to discuss the potential applications of such models in the treatment and prevention of insomnia. Methods We analyzed a dataset of 165 insomnia patients from the Shanghai Minhang District Integrated Traditional Chinese and Western Medicine Hospital. TCM constitution was assessed using a standardized Constitution in Chinese Medicine (CCM) scale. Sleep quality, or insomnia severity, was evaluated using the Spiegel Sleep Questionnaire (SSQ). Machine learning models, including Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbors (KNN), were utilized. These models were optimized using a Grid Search algorithm and were trained and tested on stratified patient data, with the TCM constitution classifications serving as primary predictors. Results The RFC outperformed the other models, achieving a weighted average accuracy, precision, recall, and F1-score of 0.91, 0.94, 0.92, and 0.92, respectively. It also classified the severity of insomnia effectively, with high area under the receiver operating characteristic curve (AUC-ROC) values. Feature importance analysis identified the Damp-heat constitution as the most influential predictor, followed by the Yang-deficiency, Qi-depression, Qi-deficiency, and Blood-stasis constitutions. Conclusion The results demonstrate the utility of machine learning, specifically RFC, coupled with TCM constitution classifications in predicting insomnia severity. Notably, constitution classifications such as Damp-heat and Yang-deficiency emerged as crucial determinants, emphasizing their potential in guiding targeted insomnia treatments. This approach may enable the development of more personalized and efficient interventions, thereby enhancing patient outcomes.


Introduction
Insomnia, a common sleep disorder, disrupts the ability to fall asleep, maintain sleep, or achieve restorative sleep, consequently interfering with daytime functioning (1). It is a significant public health issue, affecting approximately 10-30% of the global population and causing further health complications (2). This concern is amplified in elderly and psychiatric demographics, where its prevalence is markedly higher (3). Given the complex etiology of insomnia, which often encompasses an intricate interplay of biological, psychological, and environmental factors, crafting effective, individualized treatment strategies remains a considerable challenge for both primary care providers and sleep medicine specialists (4)(5)(6).
Traditional Chinese Medicine (TCM), with its unique health and disease perspective that emphasizes harmony among body, mind, and environment, has been suggested as a complementary approach to the conventional biomedical model for managing insomnia (7,8). Within this framework, individual inherent traits, or "constitution," are central. These constitutions encompass physical characteristics, susceptibility to diseases, and reactions to environmental changes and are assessed using the standardized Constitution in Chinese Medicine (CCM) scale. This scale, with validated reliability, is used in various health contexts, forming a basis for understanding individual differences in health and disease from a TCM perspective (9)(10)(11)(12). Building on this foundational concept of constitution in TCM, insomnia is interpreted as more than just a symptom; it is a manifestation of the imbalance within the body's fundamental elements such as Yin and Yang. This perspective aligns with the holistic nature of TCM, which perceives sleep disturbances as interconnected with other physiological and psychological imbalances. Historically, personalized therapeutic strategies based on individual constitution and presenting symptoms were formulated by TCM practitioners. These strategies aimed at restoring balance and harmony within the body, addressing the root causes of insomnia rather than merely alleviating the symptoms. Therefore, the ancient practices and holistic approach of TCM provide a comprehensive viewpoint to explore the underlying intricacies of insomnia and its relationship with various constitutions (13,14).
Advancing from this holistic viewpoint and the significant role of constitution in TCM, recent empirical studies suggest the potential applicability of the CCM in managing insomnia. Specific TCM constitution types, such as Yin-deficiency and Qi-deficiency, have been recognized as more prevalent in individuals grappling with insomnia (15). Further, a study by He et al. (16) reported that acupuncture treatment based on an individual's CCM score led to notable improvements in sleep quality and a reduction in insomnia symptoms. In the face of the widespread prevalence and complex nature of insomnia, these findings indicate a promising avenue for incorporating the TCM constitution-based approach into a comprehensive, individualized management plan for this sleep disorder (14).
This study underscores the significant influence of TCM constitution on the prevention and prognosis of insomnia, suggesting that the CCM scale score could act as a predictive tool for determining the severity of insomnia. In TCM, an individual's constitution is held to shape overall physical and mental health, with a marked effect on susceptibility to and severity of illnesses, including insomnia (17). By exploring the predictive capabilities of the CCM scale in relation to insomnia severity, we aim both to forge links between TCM and modern sleep medicine and to lay the foundation for more comprehensive, personalized therapeutic strategies that draw on TCM constitution in predicting and managing insomnia.
Simultaneously, the advent of machine learning algorithms in healthcare presents a transformative opportunity to unearth complex relationships between variables (18). Particularly, methods like Random Forest (19), Support Vector Machine (20), and K-nearest neighbors (21) have been extensively utilized in the field of predictive medicine and have shown robust results in a variety of clinical prediction tasks (22,23). Incorporating these algorithms to analyze the CCM scale scores might offer novel insights into the multifaceted association between TCM constitution types and insomnia severity.
While the ancient wisdom of TCM has a longstanding history in addressing insomnia, encapsulating holistic and individualized approaches, there remains a striking scarcity of modern studies with rigorous data support directly linking TCM constitution types with insomnia severity. This discernible gap in evidence-based research underscores the need for more in-depth investigations that explore the predictive role of TCM constitution in insomnia, integrating machine learning methodologies. Such exploration can deepen our understanding of insomnia through a TCM lens and contribute valuable insights towards the pathogenesis, prediction, and treatment of insomnia (24).
To this end, our study aims to examine the correlation between the CCM scale score and the severity of insomnia, utilizing machine learning algorithms for prediction. We anticipate that our findings will establish a theoretical and empirical groundwork for the application of TCM constitution in insomnia prediction and management. This, in turn, would foster a more tailored and efficacious approach to insomnia treatment, potentially improving patient outcomes and quality of life.

Data sources
This investigation was conducted with the approval of the Ethics Committee of the Shanghai Minhang District Integrated Traditional Chinese and Western Medicine Hospital (Ethics Reference No. 2021-007), and each included patient provided written informed consent. The data utilized in this study were collected from 165 patients diagnosed with insomnia who received treatment in the Department of Preventive Medicine of the aforementioned hospital between November 2021 and December 2022. The sample included 110 females with an average age of 46.92 ± 12.38 years, and 55 males with an average age of 46.05 ± 13.01 years.

Inclusion criteria
The inclusion criteria were established in accordance with the diagnostic criteria for primary insomnia in the 3rd Edition of the Chinese Classification and Diagnostic Criteria of Mental Disorders (CCMD-3), and the diagnostic criteria for insomnia in the "Diagnostic Criteria and Therapeutic Effect of TCM Diseases and Syndromes." The

Exclusion criteria
Subjects were excluded from the study if they: (1) did not meet the aforementioned inclusion criteria; (2) were pregnant or breastfeeding women; (3) had used antipsychotics or antidepressants within a week before their consultation; (4) had serious organ dysfunction or severe diseases in other systems; (5) had serious mental disorders; (6) were patients with malignant tumors; (7) had drug dependency.

Observational indicators

TCM constitution evaluation
TCM constitution was assessed using the CCM scale. This scale is composed of nine TCM constitution classifications, specifically Balanced constitution, Qi-deficiency, Yang-deficiency, Yin-deficiency, Phlegm-dampness, Damp-heat, Blood-stasis, Qi-depression, and Special constitution, each comprising 6-8 items. The characteristics of these constitution classifications are listed in Table 1.
Each item provides five potential responses, ranging from "none" to "always," scored from 1 to 5, respectively. The original score is obtained by summing the scores of all items in a classification. Subsequently, the converted score is calculated using the formula:
$$\text{converted score} = \frac{(\text{original score} - \text{number of items}) \times 100}{\text{number of items} \times 4}$$
A converted score of ≥60, provided that the converted scores of the other eight biased constitutions are all <30, is deemed a definitive ("yes") constitution classification. A score < 40 is considered indicative ("basically yes") of a classification, while all other cases are determined as negative ("no") for the specific classification.
For this study, rather than establishing a constitution determination, we opted to employ the converted scores across all categories, given that these scores reflect the comprehensive constitutional characteristics of the patients. Hence, the converted scores of the nine classifications were input as independent variables (X) into the machine learning model for predicting insomnia severity.
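The score conversion described in this section can be illustrated with a short sketch. Subtracting the number of items removes the minimum possible raw sum (every item scored 1), and dividing by four times the number of items rescales the result to the range 0 to 100. The function name and the example responses are hypothetical and are not taken from the study data.

```python
def converted_score(item_scores):
    """Rescale the raw 1-5 item responses of one constitution to a 0-100 score."""
    n_items = len(item_scores)
    if n_items == 0:
        raise ValueError("at least one item is required")
    if any(score < 1 or score > 5 for score in item_scores):
        raise ValueError("each item must be scored from 1 to 5")
    original_score = sum(item_scores)
    # (original score - number of items) * 100 / (number of items * 4)
    return (original_score - n_items) * 100 / (n_items * 4)


# Hypothetical responses to an 8-item subscale (illustrative values only).
example_responses = [3, 2, 4, 3, 2, 3, 4, 3]
print(converted_score(example_responses))  # 50.0
```

Applying the function to each of the nine classifications would yield the nine-element feature vector used as X.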

Sleep quality evaluation
Sleep quality in this study was evaluated using the Spiegel Sleep Questionnaire (SSQ), an established and validated self-reported instrument (Cronbach's α = 0.868) routinely employed in clinical research to assess sleep-wake patterns. Its coverage of the sleep-wake cycle contributes to its sensitivity and reliability in tracking changes in sleep patterns over time and in evaluating the efficacy of sleep-related interventions. It is particularly valuable in the study and management of sleep disorders, including insomnia (25, 26).
The SSQ explores six dimensions of sleep and wakefulness: initial and terminal insomnia, perceived quality of sleep, refreshment upon awakening, daytime alertness, and total sleep time. Each dimension is assessed on a 5-point Likert scale, with higher scores corresponding to more severe sleep disturbance. Thus, a lower cumulative score signifies better sleep quality and less daytime sleepiness. For the purposes of this investigation, insomnia severity was divided into three classes according to the SSQ scores of the included cases: mild (12 ≤ score < 18), moderate (18 ≤ score < 24), and severe (score ≥ 24). These three severity classes serve as the dependent variable (Y) in the training of our machine learning model.
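The mapping from a cumulative SSQ score to a severity class can be sketched as below. The sketch assumes that the three cut-offs (12, 18, and 24) are the lower bounds of consecutive bands, so that a score of 20, for example, counts as moderate rather than mild. The function name is hypothetical, and scores below 12 are returned as unclassified because the text does not assign them a class.

```python
def ssq_severity(score):
    """Map a cumulative SSQ score to an insomnia severity class."""
    if score >= 24:
        return "severe"
    if score >= 18:
        return "moderate"
    if score >= 12:
        return "mild"
    # Scores below the mild cut-off are not assigned a class in the text.
    return None


for s in (12, 17, 18, 23, 24, 30):
    print(s, ssq_severity(s))
```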

Data analysis
The predictive power of TCM constitution classifications on the severity of insomnia was explored using three machine learning models: Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbors Classifier (KNN). The data set was partitioned into a training set (80% of the total data) and a test set (20% of the total data), enabling model training and subsequent performance evaluation.
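A stratified 80/20 partition of this kind can be sketched in plain Python as follows. The split is done separately within each severity class so that both subsets keep roughly the same class proportions. The function name, the random seed, and the class counts in the example are assumptions made for illustration; the study does not report its splitting code or the per-class counts used here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling test cases per class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_indices, test_indices = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Keep at least one test case per class so every class is evaluated.
        n_test = max(1, round(len(indices) * test_fraction))
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Hypothetical class counts summing to 165 patients (not the study's counts).
labels = ["mild"] * 60 + ["moderate"] * 70 + ["severe"] * 35
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 132 33
```

In scikit-learn, the same effect is usually obtained with `train_test_split(X, y, test_size=0.2, stratify=y)`.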

Data preprocessing
Data preprocessing involved normalization to maintain uniformity in feature scales, imputation of missing values through domain-specific insights and statistical methods, and the handling of outliers to reduce skewness and model bias, ensuring the reliability and validity of the dataset.

Feature selection
The feature selection was driven by the TCM constitution assessment results acquired through the CCM scale. The converted scores derived from the nine TCM constitution classifications served as pivotal features in our models.

Model optimization and hyperparameter tuning
A Grid Search algorithm was used to tune the hyperparameters of each model, evaluating a range of parameter values to identify the combination that maximized model performance. For the RFC, the parameters under consideration included 'n_estimators' (the number of trees in the forest), varied among [10, 50, 100, 200, 500]; 'max_depth' (the maximum depth of each tree), taking values from [None, 10, 20, 30, 50]; and 'min_samples_split' (the minimum number of samples required to split an internal node), taking values from [2, 5, 10].
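The RFC search described above could be set up roughly as follows. This is a sketch that assumes scikit-learn, whose `RandomForestClassifier` uses the same parameter names as the text. The use of 5-fold cross-validation, accuracy as the selection metric, and a fixed random seed are assumptions, since the text does not state them, and `tune_rfc` is a hypothetical helper name.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Grid taken from the text: 5 * 5 * 3 = 75 candidate combinations.
RFC_PARAM_GRID = {
    "n_estimators": [10, 50, 100, 200, 500],
    "max_depth": [None, 10, 20, 30, 50],
    "min_samples_split": [2, 5, 10],
}


def tune_rfc(X_train, y_train, param_grid=RFC_PARAM_GRID, cv=5, seed=0):
    """Grid-search a random forest on the training data only.

    cv, scoring and seed are illustrative assumptions, not study settings.
    """
    search = GridSearchCV(
        estimator=RandomForestClassifier(random_state=seed),
        param_grid=param_grid,
        scoring="accuracy",
        cv=cv,
    )
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_
```

Calling `tune_rfc(X_train, y_train)` on the nine converted scores and the severity labels returns the best parameter combination and its mean cross-validated accuracy. Fitting the search on the training set alone keeps the test set untouched for the final evaluation.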

Model performance evaluation
Model performance was evaluated using several key indicators, including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC).
Accuracy (A) is the proportion of true results (both true positives and true negatives) among the total number of cases examined, and is calculated as:
$$A = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives, and FN is the number of false negatives.
Precision (P) quantifies the proportion of positive class predictions that actually belong to the positive class, and is defined as:
$$P = \frac{TP}{TP + FP}$$
Recall (R) quantifies the ability of a model to find all the relevant cases within a dataset, and is defined as:
$$R = \frac{TP}{TP + FN}$$
The F1-score (F1) is the harmonic mean of precision and recall, calculated as:
$$F1 = \frac{2 \times P \times R}{P + R}$$
AUC-ROC is a performance measurement for classification problems at various threshold settings, representing the degree of separability and indicating how well the model can distinguish between classes. In this work, AUC-ROC was used to evaluate the predictive accuracy of the machine learning models in distinguishing between different severity levels of insomnia. Each model's performance was evaluated on the test set to ensure that the evaluation is unbiased and reflects the model's ability to generalize to unseen data.
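For a three-class problem, these metrics are computed per class (treating that class as positive and the other two as negative) and then combined. The sketch below assumes that "weighted average" means weighting each class by its number of true cases, which is the convention scikit-learn uses; the text does not spell this out. The function name and the six example labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over classes."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    result = {
        "accuracy": sum(t == p for t, p in pairs) / n,
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
    }
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / n  # share of true cases belonging to this class
        result["precision"] += weight * precision
        result["recall"] += weight * recall
        result["f1"] += weight * f1
    return result


# Hypothetical labels for six test cases (illustrative only).
y_true = ["mild", "mild", "moderate", "moderate", "moderate", "severe"]
y_pred = ["mild", "moderate", "moderate", "moderate", "moderate", "severe"]
print(weighted_metrics(y_true, y_pred))
```

With support weighting, the weighted recall always equals the overall accuracy, which is a useful consistency check when reading reported results.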
In addition to building the predictive models, correlation analyses were performed to assess the relevance of the nine TCM constitution classifications to insomnia severity. Those constitutions demonstrating a higher correlation were considered as key features, enriching the predictive models and enhancing their practical applicability in insomnia management. The whole research workflow is shown in Figure 1.

Results
The basic information, TCM constitution converted scores, and SSQ scores of the insomnia patients are listed in Table 2. Three machine learning models (RFC, SVC, and KNN) were utilized to discern patterns in TCM constitution converted scores, with the aim of predicting insomnia severity. The comparative performance of the classifiers is presented in Figure 2. Among all the models evaluated in this study, the RFC performed best, yielding an accuracy score of 0.916. This performance was achieved with hyperparameters set at no maximum depth ('max_depth' = None), a minimum sample split of 5, and 100 estimators. This points to the promising utility of RFC for classifying the severity of insomnia based on Traditional Chinese Medicine (TCM) constitution scores. In contrast, the SVC and KNN classifiers, despite being tuned to their optimal parameters, yielded lower accuracies of 0.75 and 0.66, respectively.
Upon further evaluation, the RFC displayed superior weighted averages across precision, recall, and F1-score, suggesting robust performance across all categories of insomnia severity. The SVC and KNN classifiers also performed reasonably, as indicated by their weighted averages; however, they were outperformed by the RFC. More specifically, RFC presented a weighted average precision of 0.94, a recall of 0.92, and an F1-score of 0.92, indicating strong predictive performance and a low false-positive rate. Furthermore, the predictive performance of the models was also evaluated using AUC-ROC. For differentiating mild insomnia from other classes, all three models performed very well, with RFC and SVC achieving a perfect AUC of 1, and KNN closely following with an AUC of 0.95 (Figure 3A). When distinguishing moderate insomnia from the other severities, RFC still performed well with an AUC of 0.93, whereas SVC and KNN yielded lower AUCs of 0.67 and 0.70, respectively (Figure 3B). In the classification of severe insomnia against the other classes, RFC achieved an AUC of 1, while SVC and KNN achieved AUCs of 0.89 and 0.86, respectively (Figure 3C). This indicates that, in this context, RFC tends to provide a more accurate and reliable classification of insomnia severity.
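The class-versus-rest AUC values reported here can be read as the probability that a randomly chosen patient from the target class receives a higher predicted score for that class than a randomly chosen patient from the other classes. A minimal sketch of that pairwise definition follows; the function name and the example scores are hypothetical, and tied scores are counted as half a win.

```python
def auc_one_vs_rest(y_true, scores, positive_class):
    """AUC for one class against all others, from pairwise score comparisons."""
    positives = [s for t, s in zip(y_true, scores) if t == positive_class]
    negatives = [s for t, s in zip(y_true, scores) if t != positive_class]
    if not positives or not negatives:
        raise ValueError("both the target class and the rest must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of the "mild" class for five patients.
y_true = ["mild", "mild", "moderate", "severe", "moderate"]
mild_scores = [0.9, 0.6, 0.7, 0.1, 0.2]
print(auc_one_vs_rest(y_true, mild_scores, "mild"))  # 5 of 6 pairs ranked correctly
```

An AUC of 1 therefore means every patient in the target class was scored above every patient outside it, which is easier to achieve on a small test set than on a large one.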
The RFC model's feature importance analysis ranked the TCM constitution classifications by their contribution to predicting insomnia severity. The Damp-heat constitution showed the highest feature importance, at 0.1514. This was closely followed by the Yang-deficiency, Qi-depression, Qi-deficiency, and Blood-stasis constitutions, with importance values of 0.1374, 0.1346, 0.1249, and 0.1082, respectively, all surpassing the benchmark of 0.10. The feature importance of these TCM constitution classifications underscores their predictive role in, and association with, insomnia severity (Figure 4).
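The reported importances can be put in context with a small calculation. The sketch below assumes the importances are normalized to sum to 1 across the nine features, as impurity-based importances in scikit-learn are; the text does not state which importance measure was used. Only the five values given above are taken from the study, and the variable names are hypothetical.

```python
# Feature importances reported in the text for the five leading constitutions.
reported_importance = {
    "Damp-heat": 0.1514,
    "Yang-deficiency": 0.1374,
    "Qi-depression": 0.1346,
    "Qi-deficiency": 0.1249,
    "Blood-stasis": 0.1082,
}

# Rank from most to least important.
ranked = sorted(reported_importance.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.4f}")

# Share of total importance carried by these five features.
top_share = sum(reported_importance.values())

# Under the sum-to-1 assumption, the remaining four constitutions share the rest.
remaining_mean = (1.0 - top_share) / 4
print(f"top five share: {top_share:.4f}, mean of remaining four: {remaining_mean:.4f}")
```

Under that assumption, nine equally informative features would each score about 0.11, so the 0.10 benchmark sits close to the uniform baseline and the ranking is best read as a relative ordering.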

Discussion
The use of machine learning models in the field of medical research and particularly in the realm of TCM represents a significant advance in contemporary medicine (27). In the current study, we used three established machine learning models: Random Forest, Support Vector Machine, and K-Nearest Neighbors, to predict the severity of insomnia based on TCM constitution scores. Among these, RFC emerged as the most predictive model, demonstrating superior accuracy compared to SVC and KNN. Random forest, a powerful ensemble machine learning model, has been widely used in various medical fields, including in the diagnosis and prognosis of diseases like cancer (28), cardiovascular disease (29), and diabetes (30). Its high performance can be attributed to its ability to handle high-dimensional data, capture non-linear relationships, and accommodate potential interactions among features (31). These attributes are particularly important when dealing with complex medical data, where a multitude of factors interplay to determine health outcomes (32). Additionally, random forest has the added advantage of providing feature importance, offering insight into which predictors most significantly impact the predicted outcome (33). Thus, random forest presents a potent tool for predicting insomnia severity in the context of TCM constitutions.
The importance of TCM constitution classifications in predicting insomnia severity should not be understated. Our results showed that certain TCM constitution classifications such as 'Qi-deficiency', 'Yang-deficiency', 'Damp-heat', 'Blood-stasis', and 'Qi-depression' exhibited substantial feature importance, each greater than 0.10. This suggests that these specific TCM classifications might play a key role in contributing to the severity of insomnia.
Interestingly, modern medical research supports this observation. For instance, a study published in 2015 established a link between Qi-deficiency, which relates to energy levels and fatigue in TCM, and increased severity of insomnia (34). Similarly, 'Yang-deficiency', which in TCM is associated with cold sensations and poor circulation, has been shown to affect sleep quality, particularly in the elderly population (35). 'Damp-heat', another TCM constitution classification, refers to a state of imbalance in the body often associated with inflammation (36). This imbalance has been linked to sleep disturbances in a study published in 2021 (37). Blood-stasis, representing a stagnation or slowing down of circulation in TCM, has been associated with sleep apnea in a recent study (38), which could lead to disrupted sleep and increased insomnia severity. Lastly, 'Qi-depression', a state of emotional stagnation in TCM, has been associated with psychiatric conditions such as depression and anxiety, which are well-known contributors to insomnia (39,40).
The application of this prediction model, particularly in clinical treatment and prevention of insomnia, could be wide-reaching. As we have demonstrated, the model can effectively predict insomnia severity based on TCM constitution classifications. These insights could guide clinicians in tailoring individual treatment strategies for patients suffering from insomnia, taking into consideration the identified important TCM constitution classifications. Moreover, it could assist in patient stratification, helping healthcare professionals to identify individuals at higher risk of severe insomnia and therefore needing more immediate or intensive interventions (41). For example, for patients showing signs of 'Qi-deficiency' and 'Yang-deficiency', treatment strategies could focus on addressing the corresponding imbalances, using modalities such as herbal remedies, acupuncture, or lifestyle adjustments known to help correct these specific deficiencies. Similarly, for those showing 'Damp-heat', 'Blood-stasis', and 'Qi-depression' classifications, targeted therapies could be implemented to address these conditions, which, in turn, could ameliorate the severity of insomnia (42).
Importantly, the use of such a model can also guide preventative measures (43,44). By identifying the at-risk population, preventive interventions can be implemented early, before the onset of severe insomnia. Such proactive management could potentially reduce the burden of insomnia on both the individual and the healthcare system.
Furthermore, while our model has been applied to insomnia in the context of TCM, the same methodology can be applied to other health conditions where TCM constitution classifications play a role. This opens the door to a range of potential applications, further enhancing the utility of TCM constitution classifications in modern healthcare.
Nevertheless, further validation of the model in different populations and clinical settings is necessary to ascertain its generalizability. As we continue to integrate traditional and modern medical knowledge, models such as the one presented in this study will be instrumental in enabling a more nuanced understanding of health and disease, ultimately benefiting patient care.

Conclusion
In conclusion, this study illuminates the potential of employing machine learning models, particularly the Random Forest, alongside TCM constitution classifications to enhance the management of insomnia. The substantial predictive capacity of TCM constitution types such as Damp-heat and Yang-deficiency suggests a pathway towards more personalized, and therefore potentially more effective, treatment approaches. These predictive models could serve as valuable tools in both the clinical decision-making process and the formulation of targeted preventative measures. While the results are encouraging, further validation in diverse patient populations remains essential to ensure their robust applicability.

FIGURE 1 Overview of the data acquisition, modeling, and model performance evaluation.

FIGURE 2 Performance comparison of three machine learning models, the Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbors (KNN), in predicting insomnia severity levels based on the Constitution in Chinese Medicine (CCM) scale score. Panels (A-D) show the results for accuracy, precision, recall, and F1-score, respectively.

FIGURE 3 Receiver operating characteristic (ROC) curves for three machine learning models. Panels (A-C) show the classification of class 1 (mild insomnia), class 2 (moderate insomnia), and class 3 (severe insomnia), respectively, against the other classes, with AUC values indicating the models' discriminatory power.
Sleep disorder is the primary symptom, with other symptoms secondary to insomnia.The main symptoms include difficulty in falling asleep, shallow sleep, excessive dreaming, early waking, and difficulty falling back asleep after waking.Secondary symptoms include palpitations, forgetfulness, dizziness, fatigue, a sallow complexion, among others.All of the main symptoms and at least one of the secondary symptoms should be present; (3) The sleep disorder occurs at least three times a week and lasts for more than a month; (4) Insomnia causes significant distress or some symptoms of mental disorder, leads to decreased efficiency in activities or hampers social functioning; (5) Insomnia is not due to any physical disease or mental disorder.

TABLE 1
The characteristics of nine TCM constitution classifications.

TABLE 2
Basic information and evaluation results of TCM constitution and sleep quality of insomnia patients.