Dissecting clinical and biological heterogeneity in clinical states of bipolar disorder: a 10-year retrospective study from China

Objectives: To dissect clinical and biological heterogeneity in clinical states of bipolar disorder (BD), and to investigate whether neuropsychological symptomatology, comorbidity, vital signs, and blood laboratory indicators are predictors of distinct BD states.
Methods: A retrospective BD cohort was established with data extracted from a Chinese hospital’s electronic medical records (EMR) between 2009 and 2018. Subjects were inpatients with a main discharge diagnosis of BD and were assessed for clinical state at hospitalization. We categorized all subjects into manic state, depressive state, and mixed state. Four machine learning classifiers were utilized to classify the subjects. A Shapley additive explanations (SHAP) algorithm was applied to the classifiers to aid in quantifying and visualizing the contributions of each feature that drive patient-specific classifications.
Results: A sample of 3,085 records was included (38.54% as manic, 56.69% as depressive, and 4.77% as mixed state). Mixed state showed more severe suicidal ideation and psychomotor abnormalities, while depressive state showed more common anxiety, sleep, and somatic-related symptoms and more comorbid conditions. Higher levels of body temperature, pulse, and systolic and diastolic blood pressures were present during manic episodes. XGBoost achieved the best AUC of 88.54% in manic/depressive states classification; logistic regression and random forest achieved the best AUCs of 75.5% and 75% in manic/mixed states and depressive/mixed states classifications, respectively. Myocardial enzymes and the non-enzymatic antioxidants uric acid and bilirubin contributed significantly to distinguishing BD clinical states.
Conclusion: The observed novel biological associations with BD clinical states confirm that biological heterogeneity contributes to clinical heterogeneity of BD.


Introduction
Bipolar disorder (BD) is a severe mood disorder affecting more than 1% of the global population and is associated with a high socio-economic burden (1). BD is characterized by alternating episodes of mania/hypomania (elevation of mood and increased energy and activity) and depression (lowering of mood and decreased energy and activity) (1). A subset of BD patients may exhibit either a mixture or a rapid alternation of manic and depressive symptoms, which is defined as a mixed state (2). Mixed states require individuals to meet the diagnostic criteria for both depression and mania. The Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) has supplanted mixed states with the mixed features specifier, which requires the presence of at least 3 non-overlapping depressive (manic or hypomanic) symptoms during a hypomanic or manic (major depressive) episode (3). Akiskal et al. (4) argue that depressive symptomatology is also common among those diagnosed with a BD manic episode; however, the symptoms are insufficient to meet the criteria for BD mixed episodes. BD is a complex disease with heterogeneous clinical manifestations, and the traditional categories of the BD spectrum (manic, depressive, and mixed states) typically differ in disease course, treatment response, and prognosis (5).
The diagnosis of BD is made by a comprehensive clinical assessment; however, there is no biomarker (such as a genetic test) that can inform the diagnosis, prognosis, or treatment outcome of BD (3,6). In clinical practice, major challenges persist in recognizing BD, and the mixed state is prone to misdiagnosis as another clinical state because of widely varying rates of individual symptoms and overlapping symptoms (7). The clinical implication of detecting mixed states is that antidepressants should not be prescribed to adults with mixed states of BD (8-10). Difficulties in accurate differential diagnosis impede the effective treatment of patients, which may worsen the prognosis (11) and increase the risk of switching between different states (12,13). There is also substantial evidence of the negative effects of cumulative episodes on cognitive function, somatic and psychiatric comorbidity (14), and the suicide rate (15). Therefore, to improve the early diagnosis of BD, there is an urgent need to better dissect clinical and biological heterogeneity across BD clinical states. While much attention has been given to the differential diagnosis of BD from other psychiatric disorders [such as major depressive disorder (MDD), attention-deficit hyperactivity disorder (ADHD), borderline personality disorder, substance use disorders, and schizophrenia (SCZ)] (3), few studies have investigated whether a shared biological risk explains some of the overlapping psychopathological symptoms in different BD clinical states (16). Part of the reason is the lack of objective diagnostic markers and targeted criteria to comprehensively assess different BD clinical states; in addition, systematically obtained data on biological characteristics and on the spectrum and severity of symptomatology are limited (17).
Previous analyses of BD clinical states focused on comparing two samples (manic and mixed states) across demographic differences, symptom presentations, treatment patterns (18), and personality features (16). Sleep and circadian heart rate rhythms have been shown to be sensitive to BD clinical states (19). Bipolar mood states (euthymic state, depressive state, and mixed state) can also be related to electrodermal tonic activity (20). Singh et al. (21) assessed the spectrum and severity of bipolar symptoms that differentiated BD mixed state from BD-depression or BD-mania/hypomania, employing the Bipolar Inventory of Symptoms Scale (BISS). In addition, laboratory non-enzymatic antioxidants (including uric acid, bilirubin, and albumin) have been reported as markers of the level of oxidative stress, and their serum concentrations may be associated with the onset of bipolar disorder (22). A recent meta-analysis reported that uric acid levels were higher in BD manic episodes than in BD depressive episodes (23). Some studies have found that a panel of urine metabolites (24,25) and other blood biomarkers (26) could prove a promising path in the search for biomarkers in BD, but biomarkers useful in distinguishing different BD clinical states have not yet been established. Although the above studies provide important insights into several aspects of BD, whether mixed, manic, or depressive, few have compared the clinical, physiological, and biological characteristics of these three diagnoses simultaneously. It remains unclear whether the link between clinical states and overlapping heterogeneity can be demonstrated by neuropsychological symptomatology, vital signs, comorbidity, and blood laboratory indicators. To our knowledge, no study has comprehensively compared the clinical states of BD and attempted to develop objective evaluation measures to establish a data-driven diagnostic decision support model. The purpose of this study is to help fill this gap.
Current studies in BD have been hampered by small sample sizes, but electronic medical records (EMR) provide an exciting opportunity for large-scale clinical studies at low cost and with high classification accuracy (27). With the increasing availability of EMR, high-fidelity heterogeneous patient data have been captured during hospitalization care, yet the utilization of these data to improve patient diagnosis and quality of care remains poor. In particular, unstructured data such as narrative notes in EMR often record substantial clinical details about patients' current condition and symptoms; these notes may be easily accessible but have not previously been used in machine learning (ML) models. Most commonly, clinicians organize information gathering around their patients' self-report of current or most recent mood state and then present it in the chief complaint. Therefore, natural language processing (NLP) technology could be a promising method to parse narrative notes in EMR data, focusing on identifying and extracting patients' symptoms and current mood states. Recent work has shown that ML models are well suited to differentiating BD from MDD using MRI data and blood biomarker data (28-30). We found no application of interpretable ML methods and NLP techniques to EMR data for the identification and classification of different clinical states of BD. To address this unmet need, we leverage recent advances in medical informatics and present new ML methods to aid in dissecting clinical and biological heterogeneity in clinical states of BD and in providing interpretable key markers in ML models that differentiate patient-specific states.
The aims of the present study were to (1) ascertain whether BD patients showed different patterns of vital signs, comorbidities, bipolar symptoms, and blood laboratory indicators, depending on their clinical state, (2) determine which markers might be helpful to differentiate BD clinical states, (3) develop ML models to accurately distinguish BD clinical states before treatment decision making, and (4) quantify and visualize the contribution of each marker in the model that drives patient-specific classifications.
2 Subjects and methods

Data source
All records were extracted from the EMR system of West China Hospital (WCH), Sichuan University. WCH is one of the largest single-site hospitals in the world and a leading medical center of West China, treating complicated and severe cases. The hospital has 4,300 beds and more than 10,000 medical staff. The psychiatric specialty of WCH provides medical services for a large number of patients with mental illness, with more than 300,000 outpatient visits and more than 6,000 discharged patients per year. Because WCH is one of the most renowned medical centers in China, its clinical practice reflects the current situation of patients with mental illness in China. In 2009, an EMR system integrated with the Health Information System (HIS) and the Laboratory Information System (LIS) was adopted in all departments throughout the hospital; that year was therefore set as the starting time of our data extraction.

Study subjects
This study included all inpatients with a diagnosis of BD from all departments of WCH from January 2009 to December 2018 (10 years). A total of 6,058 records of inpatients with a discharge diagnosis of BD were extracted from the EMR system, identified by International Classification of Diseases, 10th revision (ICD-10) codes. We used the presence of an F31 ICD-10 code in either the main or a supplementary position to signify a BD-related admission. Of these, 4,761 records with a principal diagnosis of BD were kept. From this cohort, records were excluded if they (1) belonged to patients of non-Chinese nationality; (2) had no specified subtype diagnosis; (3) were missing laboratory test information; or (4) were stored in duplicate (records with the same inpatient code and case code). The data were checked for missing values, and records with any missing value were excluded from the analysis. The remaining 1,189 records with BD manic clinical state, 1,749 records with BD depressive clinical state, and 147 records with BD mixed clinical state were included in the analyses (Figure 1), yielding a total study cohort of 3,085 records.

Definition of outcome
Compared with clinical diagnostic interviews, EMR-based diagnostic data can be used to identify patients with bipolar disorder and control subjects with high specificity and predictive value (27).
The outcome measure in this study was defined as the main discharge diagnosis of each subject, identified by ICD-10 codes. The discharge diagnosis is the gold standard for the characterization of the diagnostic categories, which is directly evaluated by trained psychiatrists using structured or semi-structured diagnostic interviews at admission and repeatedly confirmed (or even revised) during the hospitalization.

Features
The observation window of this study spanned from each patient's admission to discharge. We extracted several categories of features from the original medical records: sociodemographic information (age at admission, gender, marital status, job, ethnicity, source of payment, province of hometown); vital signs from the basic physical examination at admission (pulse, breathing, nutrition, temperature, systolic and diastolic pressure); prior illness history (allergy, blood transfusion, drug use, surgery); diagnoses; whether the patient had a diagnosis of a medical comorbidity (other mental disorders, endocrine diseases, nervous diseases, digestive diseases, circulatory diseases, respiratory diseases, and cancer); the number of comorbidities in each disease system; laboratory tests (routine blood, urine, stool, and biochemical examinations); and other information. A patient might have had more than one laboratory test during hospitalization. For all subjects, we used the first recorded laboratory test results of the inpatient stay as the laboratory data for analysis. For each laboratory test type, only the most common result types (listed by result name, such as platelet count, red blood cell count, absolute of lymphocyte, etc.) were used in the models. For each laboratory result type, the numerical value or the categorical level (within, above, or below the normal range) was included as a feature. No laboratory data after the first recorded date were included in the analyses.
Unstructured data (the chief complaint) were processed by removing all digits and punctuation. The text was then split into words using the jieba package in R. To filter out meaningless words, we added a list of stop words and medical dictionaries, which helped to remove noise and irrelevant terms. We also prevented certain medical nouns from being split apart [details have been presented in our previous work (31)]. To identify the most informative words, we calculated term frequency scores, which helped us identify the words most representative of patients' main symptoms and current mood states. We referred to these words as features, including "lowering of mood," "elevation of mood," "mood instability," "bad sleep," "provoke," "worry," "talkativeness," "suicide ideation," "pain," "appetite disturbances," "auditory hallucination," and "relapse and worsen of symptoms." The value of each symptom feature depended on whether the patient's chief complaint included the corresponding key word: a value of 1 was assigned if the chief complaint included the word, and a value of 0 if it did not. This approach encoded the patients' chief complaints into numerical features for further analysis and modeling.
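The 0/1 encoding of chief complaints described above can be illustrated with a minimal Python sketch. It is not the study's pipeline, which segmented Chinese text with jieba in R; the keyword list is a small subset of the English glosses quoted above, and the function names `clean_text` and `encode_complaint` and the example complaint are hypothetical.

```python
import re

# Illustrative subset of the symptom key words quoted in the text.
# The real pipeline operated on segmented Chinese words, not English phrases.
SYMPTOM_KEYWORDS = [
    "lowering of mood",
    "elevation of mood",
    "bad sleep",
    "talkativeness",
    "suicide ideation",
]


def clean_text(text):
    """Remove digits and punctuation, collapse whitespace, and lowercase."""
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def encode_complaint(text, keywords=SYMPTOM_KEYWORDS):
    """Map each key word to 1 if it occurs in the cleaned complaint, else 0."""
    cleaned = clean_text(text)
    return {kw: int(kw in cleaned) for kw in keywords}


# Invented chief complaint, used only to exercise the encoder.
example = "Lowering of mood and bad sleep for 2 months, suicide ideation for 1 week."
features = encode_complaint(example)
print(features)
```

Simple substring matching is used here for brevity; a word-segmentation step with a protected medical dictionary, as the text describes, would be needed for real Chinese narrative notes.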
Overall, 198 features were entered into the original data pool with no missing values (details in the Supplementary Information). Three groups (manic, mixed, and depressive BD) were compared regarding all included features, by using the chi-square test for categorical features and t-test for continuous features in the

Machine learning classification
This study was undertaken to determine the potential for multisystem composite features as predictors to discriminate between manic, depressive, and mixed clinical states of bipolar disorder patients in a data-driven approach. Only the significant features at early admission (sociodemographics, vital signs, prior illness history, comorbidities, chief complaints, and laboratory tests) were used in the ML model. All analyses were implemented using RStudio (Version 3.6.1 for Windows). In our study, four ML classifiers were selected:

Random forests
The random forests (RF) algorithm builds several decision trees that learn and make predictions independently, and outputs a single combined prediction that is typically as good as or better than the predictions of the individual trees (32). RF can be used with continuous and/or categorical input variables and allows class weighting to adjust for unequal sampling schemes. RF is relatively resistant to overfitting, so it can be used for problems in which the number of input variables is much larger than the number of observations. It also provides measures of feature importance and proximities that can be used to interpret the data.

Support vector machine
A support vector machine (SVM) is a supervised learning algorithm commonly used for classification and regression problems, especially in medical applications. SVM belongs to a category of ML algorithms called kernel methods, which involve transforming features using a kernel function (33). Kernel functions map the data to a different, often higher-dimensional space, with the expectation that the classes will be easier to separate after this transformation. This transformation can potentially simplify complex non-linear decision boundaries into linear ones in the higher-dimensional feature space.
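The idea that a mapping to a higher-dimensional space can turn a non-linear boundary into a linear one can be shown with a toy Python sketch. It is purely illustrative: the text does not state which kernel the study used, and the feature map, kernel, threshold, and data below are all invented for the example.

```python
import math


def feature_map(x):
    """Hypothetical explicit map from a 1-D input to 2-D space: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, y):
    """Kernel equal to the inner product of the mapped points: xy + (xy)^2."""
    return x * y + (x * y) ** 2


def linear_rule(z, threshold=1.0):
    """Linear rule in the mapped space: class 1 if the second coordinate exceeds the threshold."""
    return int(z[1] > threshold)


# Toy data (invented): class 1 lies on both sides of class 0, so no single
# cutoff on x separates the classes in the original 1-D space.
points = [-2.0, -1.5, -0.5, 0.0, 0.4, 1.6, 2.2]
labels = [1, 1, 0, 0, 0, 1, 1]

predictions = [linear_rule(feature_map(x)) for x in points]
print(predictions == labels)

# The kernel gives the mapped-space inner product without building the
# mapped vectors explicitly.
zx, zy = feature_map(1.6), feature_map(-0.5)
print(math.isclose(kernel(1.6, -0.5), zx[0] * zy[0] + zx[1] * zy[1]))
```

An actual SVM would learn the separating hyperplane from the data rather than use a fixed threshold; the sketch only shows why the mapped space makes a linear separator possible.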

XGBoost (eXtreme Gradient Boosting)
XGBoost (XGB) is an optimized distributed gradient boosting library known for its high efficiency, flexibility, and portability (32). XGBoost's parallel tree boosting algorithm effectively solves a wide range of data science problems with speed and precision. It was chosen as the preferred method for building diagnostic algorithms due to its ability to handle missing values, detect nonlinear relationships and interactions between variables, remain robust in the presence of correlated features, and provide interpretable results.

Logistic regression
Logistic regression (LR) can be either binomial or multinomial. Like other forms of regression analysis, LR utilizes one or more predictor variables, which can be continuous or categorical. The expected value of the response variable is fitted to the predictors, and the regression function is a sigmoid function that transforms a real number into a value between 0 and 1. LR employs the maximum likelihood estimation method to estimate the model coefficients (34).
[Figure: Flowchart of the study subjects.]
We used the holdout method and randomly divided the clinical cohort into 80% training and 20% testing data, maintaining the proportion of patients in each clinical state. The training dataset was used to train the four algorithms and optimize their parameters for better classifier construction. The classifiers were then evaluated using the testing dataset, which was not utilized for model selection or parameter tuning. Fine-tuning involved adjusting the parameter combinations within the trainControl function. The parameters resulting in the best classification performance for each algorithm were chosen using 10-fold cross-validation on the training data. The learned parameters were then used to construct a model on the entire training set and to make predictions on the testing data. All the classifiers utilized in this study were fine-tuned. Several classifiers were selected to avoid bias toward the use of a particular classifier.
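A stratified 80/20 holdout split of the kind described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the authors' R code: the function name `stratified_split`, the seed, and the rounding rule are hypothetical, and only the class counts are taken from the cohort description.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Rounding to the nearest integer is an assumption of this sketch.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts from the cohort description; the label vector itself is synthetic.
labels = ["manic"] * 1189 + ["depressive"] * 1749 + ["mixed"] * 147
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

Splitting within each class keeps the rare mixed state represented in both partitions, which a plain random split of the whole cohort would not guarantee.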
Because of the unbalanced sample size in each group (see Figure 1), particularly the BD mixed episode group, we balanced the training set by randomly over-sampling the positive class (BD mixed episode) to achieve a balanced ratio between the positive and negative (BD manic episode or BD depressive episode) classes. The testing set distribution was left unmodified so that it reflects the real class imbalance encountered during prediction, and the reported performance reflects those raw distributions. Seven classification runs were performed with each classifier: two multiclass classifications (non-resampling and resampling) between BD manic, depressive, and mixed episodes; three non-resampling binary classifications (manic/depressive, manic/mixed, and depressive/mixed); and two resampling binary classifications (over-sampling manic/mixed and depressive/mixed).
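Random over-sampling of the minority class in the training set can be sketched as follows. The function name `random_oversample`, the seed, and the class sizes (roughly 80% of the 1,189 manic and 147 mixed records) are illustrative assumptions rather than details reported by the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority rows until the minority class
    is as large as the largest other class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(n for label, n in counts.items() if label != minority)
    minority_idx = [i for i, label in enumerate(labels) if label == minority]
    # Sampling with replacement; yields nothing if the class is already large enough.
    extra = [rng.choice(minority_idx) for _ in range(target - counts[minority])]
    new_rows = rows + [rows[i] for i in extra]
    new_labels = labels + [minority] * len(extra)
    return new_rows, new_labels


# Synthetic training set for the manic/mixed binary task (sizes are illustrative).
rows = [f"manic_{i}" for i in range(951)] + [f"mixed_{i}" for i in range(118)]
labels = ["manic"] * 951 + ["mixed"] * 118
balanced_rows, balanced_labels = random_oversample(rows, labels, minority="mixed")
print(Counter(balanced_labels))
```

Only the training partition is resampled; applying the same step to the testing partition would distort the reported performance, which is why the text leaves the test distribution untouched.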

Explainable classification
An important development toward enhancing the practical medical decision support provided by ML is the ability to offer straightforward explanations for predictions generated by arbitrarily intricate models, thus mitigating the typical trade-off between accuracy and interpretability. Explainable ML methods identify the specific characteristics that lead to the classification of each patient, which is crucial for determining targeted diagnostic markers and clinical prediction rules. In our work, we applied a Shapley additive explanations (SHAP) algorithm to our classification models to obtain explanations of the features that drive patient-specific classifications. SHAP is a model-agnostic representation of feature importance in which the impact of each feature on a particular prediction is characterized using Shapley values, a concept derived from cooperative game theory (35). A Shapley value signifies, given the current set of feature values, how much a single feature, in the context of its interaction with other features, contributes to the difference between the actual prediction and the mean prediction (36). The SHAP value for a feature reflects its compound effect when interacting with the other features. For comparison, we also showcased how specific features contribute in different classification scenarios, in particular in the three binary classifications of clinical states (manic/depressive, manic/mixed, and depressive/mixed) in BD patients.
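The Shapley value underlying SHAP can be computed exactly for a very small model, which makes the definition concrete. The sketch below is a simplification under stated assumptions: "absent" features are replaced by a single fixed baseline rather than averaged over a background dataset as SHAP implementations typically do, and the toy model and the function name `shapley_values` are invented for illustration.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(model, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition take their baseline value (an assumption
    of this sketch).
    """
    n = len(instance)

    def value(subset):
        x = [instance[j] if j in subset else baseline[j] for j in range(n)]
        return model(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x):
    """Invented model with one additive term and one interaction term."""
    return 2 * x[0] + x[1] * x[2]


instance, baseline = [1, 1, 1], [0, 0, 0]
phis = shapley_values(toy_model, instance, baseline)
print(phis)
# The contributions sum to the gap between this prediction and the baseline prediction.
print(isclose(sum(phis), toy_model(instance) - toy_model(baseline)))
```

In this toy case the interaction term is split equally between the two interacting features, which mirrors the statement that a SHAP value reflects a feature's compound effect with the others. Exact enumeration grows exponentially with the number of features, so practical tools rely on model-specific or sampling-based approximations.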

Performance metrics of the ML models
To evaluate the ability to discriminate BD mixed, manic, and depressive episodes, we computed the area under the receiver operating characteristic (ROC) curve (AUC). The AUC summarizes the proportion of true cases (sensitivity) and the proportion of false cases (specificity) that the algorithm correctly identifies across different probability cutoffs. For multiclass classifications, we employed a confusion matrix as the performance metric. All performance metrics were derived from the holdout testing dataset. The code is available upon request.
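The relationship between the AUC and cutoff-specific sensitivity and specificity can be made concrete with a short Python sketch. It uses the rank-based (Mann-Whitney) formulation of the AUC; the labels, scores, cutoff, and function names are invented for illustration and are not results from the study.

```python
from math import isclose


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Invented predictions for seven test records (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, cutoff=0.5)
print(auc, sens, spec)
```

Because the AUC depends only on how the scores rank positive against negative cases, it is unaffected by the choice of cutoff, whereas sensitivity and specificity change with it.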

Sociodemographic features
We included 3,085 records extracted from the EMR database, with 1,189 records representing BD manic episode (38.54%), 147 records representing BD mixed episode (4.77%), and 1,749 records representing BD depressive episode (56.69%). We compared these three groups across all clinical, biological, and sociodemographic features. Table 1 provides a summary of the basic sociodemographic characteristics of the full cohort. The mean (SD) age of the study population was 36.86 (17.05) years, and 63% of patients were female. Although the three groups did not differ significantly in ethnicity or source of patient, they did differ in gender, age, age group, job, marital status, and type of payment. Notably, patients with BD mixed episode were more likely to be female (70.1% vs. 66.3% BD depressive episode and 57.2% BD manic episode; p < 0.001), to belong to the age group of 0-17 (16.3% vs. 9.0% BD depressive episode and 14.6% BD manic episode; p < 0.001), and to be single (53.7% vs. 32.8% BD depressive episode and 50.7% BD manic episode; p < 0.001).

Symptoms
Patients experiencing depressive episodes were more likely to exhibit symptoms such as lowering of mood, mood instability, poor sleep, worry, pain, and relapse of these symptoms. Patients in the manic episode group were more likely to have symptoms like elevation of mood, irritability, and talkativeness. The mixed episode group was characterized by symptoms like suicide ideation, appetite disturbances, and an exacerbation of the aforementioned symptoms. It is noteworthy that the proportion of symptoms in patients with mixed episodes falls somewhere between that of patients with manic and depressive episodes. Auditory hallucination was the only symptom on which the three groups did not differ significantly. This finding supports our initial hypothesis that unstructured data extracted from the chief complaint is a crucial tool for assessing patients' symptoms and current mood states in order to discriminate between the manic and depressive clinical states of BD patients. While mixed states are notoriously difficult to identify due to their composite symptoms of depression and mania, these data underscore the clinical importance of assessing patients with bipolar disorder for current symptoms of both poles of the illness regardless of their self-reported current mood state (Table 1).
Compared with patients experiencing manic and depressive episodes, those in mixed episodes were more likely to exhibit lower mean temperatures, slower pulse rates, and lower systolic blood pressures. Patients in manic episodes, on the other hand, displayed the highest mean values for temperature, pulse, and systolic and diastolic blood pressures compared to the other two groups. In terms of prior history, patients in depressive episodes were more likely to have a history of allergies and surgery, and they also had slightly higher probabilities of a history of drug use and blood transfusion. Table 1 also outlines the cross-sectional rate of psychiatric and other comorbidity conditions in different clinical states of BD. Notably, the depression group had a significantly higher comorbidity rate than the other two groups, including higher rates of psychiatric comorbidity conditions, endocrine comorbidity conditions, digestive comorbidity conditions, circulatory comorbidity conditions, and slightly higher rates of nervous comorbidity conditions. Endocrine diseases were the most prevalent comorbid conditions in BD patients, followed by circulatory diseases.

Laboratory findings
Table 2 presents the results of laboratory examination dataset analysis in the study population. The dataset includes routine blood, biochemical, urine, and stool examination results obtained before hospitalization.
The results of the routine blood examination (Table 2) show that patients in manic episodes exhibit higher mean values for white blood cell count (WCC) and a higher proportion of patients with abnormally high WCC levels. In contrast, the mixed episode group displays a higher proportion of patients with abnormally low WCC levels. Additionally, the manic episode group has a higher proportion of patients with abnormally high percentage of monocyte (POM) levels, while the depressive episode group shows a higher proportion of patients with abnormally low POM levels. Patients in manic episodes also exhibit higher mean values for absolute of monocyte (AOM), red blood cell count (RBCC), average red blood cell (ARBC) HGB concentration, hemoglobin, platelet count (PC), and percentage and absolute of neutrophil (PON, AON). Patients in mixed episodes display higher mean values for POM and percentage of lymphocyte (POL), while those in depressive episodes show higher mean values for percentage of basophil (POB).
The routine biochemical examination results (Table 2) indicate that patients in manic episodes exhibit higher mean values for aspartate aminotransferase (ASA), creatine kinase (CK), lactate dehydrogenase (LD), direct bilirubin (DB), total protein (TP), albumin, creatinine, glucose, alkaline phosphatase (AP), uric acid (UA), hydroxybutyrate dehydrogenase (HD), calcium, and anion gap (AG). Statistical comparison of the routine urine and stool examination results showed no significant difference between the three groups for any indicator.
3.2 ML models to distinguish between BD manic, depressive, and mixed clinical states

Classifiers
To distinguish between manic, depressive, and mixed clinical states in BD patients upon admission, the performance of four machine learning algorithms (LR, SVM, RF, XGBoost) was evaluated in both multiclass classification (Table 3) and binary classification (Table 4).
In the multiclass classification, the best performance in the holdout testing dataset was achieved using the non-resampling training dataset and the XGBoost classifier. This model achieved an overall accuracy of 79.90% and an AUC of 79.5%. The balanced training dataset had the same number of subjects as the original unbalanced dataset, but with a ratio of 1:1:1 for the three clinical states. Although the accuracy of the re-sampling model was slightly reduced, mixed clinical states could be detected more effectively in the LR model (with a 44.8% detection rate and an AUC of 73.4%).
In the binary classification models, the classifiers performed well in distinguishing between manic and depressive clinical states, with all AUCs exceeding 85% (see Figure 2). The XGBoost classifier emerged as the top performer, achieving an overall AUC of 88.54%, a sensitivity of 84.87%, and a specificity of 73.78% in the holdout testing dataset. For distinguishing between BD mixed and depressive clinical states, the XGBoost classifier again emerged as the top performer, achieving an overall AUC of 74.42%, a sensitivity of 83.33%, and a specificity of 53.43%. However, in distinguishing between BD manic and mixed clinical states, performance was more variable because of the unbalanced nature of the dataset. The best performance was obtained with the LR classifier, which achieved an overall AUC of 76.18%, a sensitivity of 86.21%, and a specificity of 54.43%. The classification performance was closely associated with the method of data handling. As in the multiclass classification, we also evaluated the performance of re-sampling binary classification (Figures 3, 4). While the performance of the XGBoost classifier did not improve with oversampling, the other three classifiers all showed improvement. Specifically, the SVM classifier's AUC increased from 66.17% to 73.37% for distinguishing between BD manic and mixed clinical states, and from 60.82% to 73.45% for distinguishing between BD depressive and mixed clinical states.

Important features
The contributions of each of the 94 features in the three binary classification models are shown as feature importance results for the three XGBoost models in Figures 5-7. The models included continuous, categorical, and binary features. Continuous and categorical features vary from low to high values, whereas binary features are either present or absent. In Figure 5, each dot represents the impact of a feature on the prediction of BD manic or depressive episodes for one patient in the training set. More specifically, dots to the right (a SHAP value >0) mean that the patient's feature value pushed the prediction toward class "1" (BD-depression), whereas dots to the left (a SHAP value ≤0) mean that the feature value pushed the prediction toward class "0" (BD-mania). The color of each dot represents the feature value, with more purple dots indicating higher values and yellower dots indicating lower values. The numerical values next to each feature on the vertical axis represent the mean absolute value of the SHAP values, indicating the relative importance of each feature's contribution to the predicted value. Larger values indicate a wider distribution of SHAP values. The impact of each feature on the BD depressive or mixed state prediction and on the BD manic or mixed state prediction is shown similarly in Figures 6 and 7, respectively; in both figures, dots to the right indicate prediction of BD mixed state. Finally, Supplementary Figures S1-S3 present the relative importance of all features on predictions for the holdout test dataset at the individual patient level.
It should be noted that the presence of talkativeness and elevation of mood drives predictions toward a manic state, while the absence of these symptoms pushes predictions toward a depressive state. This is consistent with traditional clinical diagnostic criteria, which consider elevation of mood and talkativeness as key symptoms distinguishing mania from depression. In addition to symptom markers, age at admission contributed significantly to predictions, with older age predicting BD-depression and younger age predicting BD-mania, reflecting the age distribution of BD episodes. Notably, comorbidity conditions and job status also had a significant impact. Biological laboratory test indicators and vital signs were also identified as important features, ranking among the top features overall, including myocardial enzyme markers (creatine kinase, hydroxybutyrate dehydrogenase, lactate dehydrogenase), the non-enzymatic antioxidant uric acid, liver biochemistry markers (albumin), vital signs (pulse), serum metabolism markers (cholesterol, creatinine, low/high density lipoprotein, glucose), markers of inflammation (percentage of monocyte), serum inorganic phosphorous, and red blood cell markers (hemoglobin), etc., suggesting that the overall health of other tissues or organs closely corresponds to the pathophysiological mechanisms of BD.
In contrast, symptom markers are not able to predominantly contribute to the classification of BD mixed and depressive (manic) clinical states due to overlapping neuropsychological symptomatology; instead, biological markers contribute the most in such scenarios. Particularly, in the classification of BD mixed and depressive episodes (Figure 6), creatine kinase, average red blood cell HGB concentration, albumin and aspartate aminotransferase were identified as the most significant biological markers. Additionally, the only symptom marker among the top 5 contributing features was a lowering of mood (depression-related symptoms are less typical in mixed episode than in depressive episode). In the classification of BD mixed and manic clinical states (Figure 7), hydroxybutyrate dehydrogenase, cholesterol and high density lipoprotein were identified as the most contributing biological markers. Furthermore, elevation of mood and talkativeness were two symptom markers among the top 5 contributing features (mania-related symptoms are less typical in mixed episode than in manic episode).
The bold values presented in Table 2 indicate the features of patients, as well as the p-value results obtained from hypothesis tests.

Feature interactions
We further provide a visual example of how the top 24 features in each model interact (Figures 8-10). The figures also show the varying trend of the SHAP value of each feature. In each figure, the X-axis represents the feature values, and the Y-axis represents the SHAP value of the specific feature on the X-axis. The color represents the value of the feature that interacts with the specific feature on the X-axis; the more purple the dot, the higher the interacted feature value, while the yellower the dot, the lower the interacted feature value. In Figure 8, when considering the continuous features like creatine kinase, hydroxybutyrate dehydrogenase, and uric acid, the SHAP value (relative risk of BD-depression prediction) first decreases with increasing values of these three features and then stabilizes gradually. Additionally, age, albumin, pulse, and lactate dehydrogenase exhibit clear threshold effects. For patients under 35 years old, BD-mania prediction is the main risk, which first increases with age and then decreases regardless of whether there is an elevation of mood symptom. For patients over 35 years old, BD-depression prediction is the main risk, and patients without the elevation of mood symptom have a higher relative risk than patients with this symptom. The risk increases smoothly with age regardless of whether there is the elevation of mood symptom.
[...] finding suggested that laboratory biological markers can indeed reflect the biological differences of different BD clinical states, while controlling the impact of individual confounders as much as possible. Moreover, additional validation in ML models for predicting the longitudinal evolution of patients' clinical states was conducted. The results showed a test AUC of 0.559 for predicting the longitudinal conversion from BD depressive episodes to BD manic episodes, and a test AUC of 0.807 for predicting the longitudinal conversion from BD manic episodes to BD depressive episodes (Supplementary Information 1.2; Supplementary Figure S4). These findings suggested that neuropsychological symptomatology, comorbidities, vital signs, and blood laboratory measures could predict different BD clinical states in both cross-sectional and longitudinal study settings. However, additional key information should be included in the models to enhance the prediction of conversion from BD depressive episodes to BD manic episodes. Due to the small number of patients in BD mixed clinical state, we did not include an analysis of longitudinal evolution for this subset of patients.
The bold values presented in Table 3 indicate the machine learning methods and their performance metrics.
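As a reading aid for the AUC values reported above, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example labels and scores are illustrative assumptions and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels; scores: predicted scores for class 1.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75

# A model that cannot separate the classes at all scores 0.5.
print(roc_auc([1, 1, 0, 0], [0.3, 0.3, 0.3, 0.3]))  # 0.5
```

Under this reading, an AUC of 0.559 is only slightly better than the 0.5 expected from uninformative scores, whereas 0.807 means that roughly four out of five positive/negative pairs are ranked correctly.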

Discussion
The present study investigated the possibility of distinguishing between BD manic, depressive, and mixed clinical states using multiple ML classifiers based on structured and unstructured EMR data. All patients were interviewed and diagnosed by a team of psychiatrists and psychologists in the studied hospital. We designed four machine learning algorithms (LR, SVM, RF, Xgboost) and, because of the imbalanced distribution of the clinical states (especially the mixed state), evaluated their performance in non-resampling and resampling multiclass/binary classifications.
In non-resampling multiclass classification, the Xgboost classifier emerged as the best performer, achieving an overall accuracy of 79.90% and an AUC of 79.5% in the holdout testing dataset. However, it correctly identified only one of 29 subjects diagnosed with mixed state. In the resampling setting, the LR model emerged as the most effective classifier for detecting mixed states, with a detection rate of 44.8%. When considering binary classification models for distinguishing between manic and depressive states, most classifiers performed well, with all AUCs exceeding 85%. The Xgboost model performed the best, achieving an overall AUC of 88.54%. In non-resampling scenarios, Xgboost also excelled at distinguishing between mixed and depressive states, with an AUC of 77.42%, while LR emerged as the top performer for distinguishing between mixed and manic states, with an AUC of 76.18%. These classification models were trained using information that can be easily collected from patients prior to hospitalization. With sufficient data on a reasonable number of patients, these algorithms could serve as an additional tool in mental health services to direct BD diagnosis and predict the course of illness in a data-driven manner. However, it is important to note that these models were not designed to replace proper clinical evaluation. To the best of our knowledge, this is the first comprehensive dissection of clinical and biological heterogeneity in BD clinical states, considering overlapping neuropsychological symptomatology, vital signs, comorbidity, and blood laboratory indicators. We used these features to predict BD clinical states, and our results were highly comparable with previous studies. Overall, we found that the different BD clinical states have distinct profiles of specific episode-related effects.
ROC curves of four machine learning models for distinguishing BD manic and depressive clinical states. ROC, receiver operating characteristic; BD, bipolar disorder; SVM, support vector machine; LR, logistic regression; Xgboost, eXtreme gradient boosting; RF, random forest.
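The effect of class imbalance on the detection of the mixed state can be illustrated with a small sketch. The resampling scheme used in the study is not specified in this section, so the random oversampling shown here is only one common option and should be read as an assumption; the helper names `random_oversample` and `recall` and the toy class counts (chosen to roughly mirror the reported class proportions) are likewise hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall(y_true, y_pred, positive):
    """Detection rate: share of true `positive` cases that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return tp / actual if actual else 0.0


# Hypothetical training labels with roughly the reported class proportions.
labels = ["manic"] * 39 + ["depressive"] * 57 + ["mixed"] * 5
rows = [[i] for i in range(len(labels))]  # placeholder feature vectors
_, balanced = random_oversample(rows, labels)
print(Counter(balanced))  # every class now has 57 rows

# Detection rates for 29 mixed-state test subjects:
# 1 detected is about 3.4%; 13 detected is about 44.8%.
truth = ["mixed"] * 29
print(round(recall(truth, ["mixed"] * 1 + ["depressive"] * 28, "mixed"), 3))
print(round(recall(truth, ["mixed"] * 13 + ["depressive"] * 16, "mixed"), 3))
```

If the reported detection rate of 44.8% refers to the same 29 mixed-state test subjects, it corresponds to 13 correctly identified subjects, compared with 1 of 29 (about 3.4%) without resampling. Resampling should be applied to the training data only, so that the holdout test set keeps its original class distribution.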
In this study, it was found that BD is more prevalent in women than in men, especially the BD mixed state. Among individuals diagnosed with BD mania, a gender difference was observed with more males than females in this state, whereas more females than males were found among individuals diagnosed with BD depression or mixed state. Men tend to exhibit hyperactivity and grandiosity and to engage in risky behavior, while women tend to report more racing thoughts and distractibility (18). Levels of mood-related depression symptoms (i.e., lowering of mood, mood instability) were similar in depressed and mixed states, and both were significantly more depressed than manic state. Moreover, manic and mixed states differed in terms of mania severity, as demonstrated by mood-related mania symptoms (i.e., elevation of mood, talkativeness, and irritability), and both were significantly more manic than the depressed state. The finding is also reported by Singh et al. (21), in a small sample cross-sectional study. Anxiety-related symptom (i.e., worry) and sleep-related symptom (i.e., poor sleep) were similar in mixed and manic states, and symptoms of anxiety and bad sleep in both states were significantly less common than those in depressed state. Singh et al. (21) reported that anxiety symptoms were particularly severe in mixed episodes. Differences in suicidality and psychomotor abnormalities (i.e., appetite disturbances) were observed with the mixed state showing more severe symptoms than depressed and manic states, while depressed subjects presented with more somatic symptoms (i.e., pain) than those with the mixed and manic states. No statistically significant differences in psychotic symptoms were found among the three BD clinical states. Reported rates of comorbidity ranged from 0.4% to 14.7%. The depressed state presented with more comorbid conditions than the manic and mixed states, including the psychiatric, endocrine, digestive, and circulatory systems. The number of psychiatric comorbidities was similar in depressed and mixed states, and both states have significantly more psychiatric comorbid conditions than manic state.
In current clinical practice settings, patients' mood is commonly assessed through clinician-administered rating scales and questionnaires; physiological and biological parameters are not used for this purpose. However, recent findings suggest that these parameters may be sensitive to clinical states and could serve as predictors of mood state changes. Particularly, higher levels of body temperature, pulse, and systolic and diastolic blood pressures were observed during mania in BD, which may be associated with increased energy, lability, and irritability. When comparing the patterns with respect to the BD mania/depression classification and the BD mania/mixed state classification, we found that talkativeness, elevation of mood, hydroxybutyrate dehydrogenase, uric acid, pulse, lactate dehydrogenase, cholesterol and glucose contributed most in both models. In BD manic episodes, mean value of cholesterol was lower than in other two states, while mean values of the remaining markers were higher than in other two states, indicating that clinical and biological heterogeneity in BD manic episodes can be further understood through these markers. This finding, that uric acid levels are higher in BD manic episodes compared to BD depressive episodes, has also been reported in reference (23).
When comparing the patterns with respect to the BD depression/mixed state classification and the BD mania/mixed state classification, we found that average red blood cell HGB concentration, aspartate aminotransferase, pulse, platelet count, serum inorganic phosphorus, percentage of monocyte, high density lipoprotein, glucose and diastolic blood pressure contributed most to both models. In BD mixed episodes, the mean values of average red blood cell HGB concentration, aspartate aminotransferase, pulse, platelet count, and glucose were lower than in other two states, while mean values of serum inorganic phosphorus, percentage of monocyte, high density lipoprotein, and diastolic blood pressure were higher than in other two states. By monitoring these markers, physicians can better identify patients in BD mixed episodes. When comparing the patterns with respect to the BD depression/mixed state classification and the BD mania/depression classification, we found that creatine kinase, lowering of mood, albumin, cholesterol, age, hemoglobin, and percentage of monocyte contributed most in both models. In BD depressive episodes, mean values of creatine kinase, lowering of mood, cholesterol, age, and hemoglobin were higher than in other two states, while mean values of albumin and percentage of monocyte were lower than in other two states (see Figure 11). These results suggest that physiological and biological parameters may serve as potential biomarkers for differentiating BD manic, depressive, and mixed clinical states. To the best of our knowledge, this is the first systematic evidence of these patterns of difference which clinicians and investigators alike may be able to utilize to aid both in better diagnosing BD mood states and, by extension, illness course prediction and establishing and implementing treatment plans. Notably, the myocardial enzyme spectrum is a biomarker for diagnosing cardiovascular diseases and also contributed most to our classification models. Elevated concentrations of myocardial enzymes in both BD and cardiovascular diseases may play an important role in identifying the etiology and pathogenesis of BD. Evidence suggests that BD is associated with a higher risk of the co-occurrence with cardiovascular diseases, which could be due to genetic (37) or biological alterations, including immune-inflammatory pathways and oxidative stress pathways that are closely related (38). It is acknowledged that excessive oxidative stress among BD patients exists (22), with an imbalance between oxidant and antioxidant species (such as the antioxidant enzymes catalase and the non-enzymatic antioxidants uric acid and bilirubin). Our results also suggest that uric acid and direct bilirubin contributed significantly to distinguishing between BD clinical states both in the cross-sectional and longitudinal study settings. Although many works point to alterations in antioxidant enzymes among patients with BD, no study has reported the relationship between the myocardial enzymes and BD episodes according to the polarity of individuals with BD. Because there is no single, reliable biomarker, future prospective studies are needed to validate the novel possibility of using myocardial enzymes for the diagnosis of BD.
Additionally, we found that features interact in complex ways, and the ambiguity of these features further emphasizes the need for machine learning-based diagnostic decision support. Although the models evaluate the contributions of individual features differently, it is a combination of several features that provides specific classification performance. We observed that the features interact in a complex non-linear manner, such as the compensated interaction effect between elevation of mood and lowering of mood symptoms, and the threshold effect between continuous and binary variables (i.e., age with elevation of mood, albumin with talkativeness, pulse with talkativeness and lactate dehydrogenase with talkativeness) when distinguishing between manic and depressive states. The most important features varied across different classification scenarios (manic/mixed states, depressive/mixed states, and manic/depressive states), and were consistent among different models trained in the same scenario. This reinforces the application of ML models to reveal hidden layers of information in clinical data collected from patients.

Limitations
There are several limitations of this study. Firstly, the entire study was conducted retrospectively using a single and imbalanced sample collected from a large hospital's EMR data. There is a high probability of selection bias, which could hamper the generalizability of the ML algorithms trained in this dataset, despite all appropriate procedures being followed to prevent overfitting. External validation based on a larger sample size and multi-site samples could have reduced the uncertainty of some of our models, particularly regarding the performance of models resampling the positive and negative classes. Secondly, it should be emphasized that discharge diagnoses were used as the outcome labels of our classification models, and patients' information collected prior to admission formed the basis of the predictors for the classification models. In this research setting, the utility and representativeness are diminished by limiting the sample to admitted patients, so we are only considering severe bipolar disorder. Although the chief complaint data was analyzed by NLP technology to extract patients' symptoms and current mood states, clinical severity of episodes/symptoms was not considered due to the limitations of the EMR data itself. Thirdly, current nosology systems in psychiatry have limitations and were not designed to predict risk of future mood episodes.
Clinically, we believe that the limitations suggest that our model approach would be insufficient for direct use as a clinical diagnosis tool without further investigation. However, we believe that because our model was constructed directly from EMR data, integrating it into an EMR-based systemwide clinical decision support program would be more practical than if the model were created using data that needed to be collected outside the EMR.

Conclusion
Using a longitudinal design that incorporated within-subject comparisons between clinical states, we investigated whether neuropsychological symptomatology, comorbidity, vital signs, and blood laboratory indicators can predict distinct BD states, specifically BD-mania, BD-depression, and BD-mixed state. We employed explainable machine learning techniques to analyze the data. Our findings contribute to a better understanding of the clinical, physiological, and biological heterogeneity among BD clinical states. We found evidence that specific combinations of features could serve as potential diagnostic markers for each clinical state of BD. Finally, larger studies are needed to map biological risk factors more precisely to clinical states. The identification of underlying heterogeneity across distinct BD clinical states is a well-recognized challenge. Nevertheless, such efforts are critical in helping classify psychiatric disorders more accurately and may contribute to psychiatric clinical classification systems with a more biologically informed nosological system.

FIGURE 3
ROC curves of four machine learning models for distinguishing BD mixed and depressive clinical states. ROC, receiver operating characteristic; BD, bipolar disorder; SVM, support vector machine; LR, logistic regression; Xgboost, eXtreme gradient boosting; RF, random forest.

FIGURE 4
ROC curves of four machine learning models for distinguishing BD manic and mixed clinical states. ROC, receiver operating characteristic; BD, bipolar disorder; SVM, support vector machine; LR, logistic regression; Xgboost, eXtreme gradient boosting; RF, random forest.

FIGURE 5
The impact of the input features on predictions of Xgboost in binary classification between BD manic and depressive clinical states.

FIGURE 6
The impact of the input features on predictions of Xgboost in binary classification between BD mixed and depressive clinical states.

FIGURE 7
The impact of the input features on predictions of Xgboost in binary classification between BD manic and mixed clinical states.

FIGURE 8
Interaction effects of 24 top important features of Xgboost in binary classification between BD manic and depressive clinical states.

FIGURE 9
Interaction effects of some important features of Xgboost in binary classification between BD mixed and depressive clinical states.

FIGURE 10
Interaction effects of some important features of Xgboost in binary classification between BD mixed and manic clinical states.

Table 1 summarizes the characteristics of vital signs based on basic body checks prior to hospitalization. Compared to patients

TABLE 1
Sociodemographic and clinical characteristics of the study population. The bold values presented in Table 1 indicate the features of patients, as well as the p-value results obtained from hypothesis tests.

TABLE 2
Routine laboratory test results of the study population.

TABLE 3
Multi-class classification performance of four algorithms.

Zhu et al. 10.3389/fpsyt.2023.1128862. Frontiers in Psychiatry, frontiersin.org