Biochemical and Endocrine Parameters for the Discrimination and Calibration of Bipolar Disorder or Major Depressive Disorder

Objectives: Conventional biochemical indexes may have predictive value for the clinical differentiation of bipolar disorder (BD) from major depressive disorder (MDD). Methods: This study included 2,470 (BD/MDD = 1,333/1,137) hospitalized patients in Shanghai as training sets and 2,143 (BD/MDD = 955/1,188) in Hangzhou as test sets. A total of 35 clinical biochemical indexes were tested, including blood cells, immuno-inflammatory factors, liver enzymes, glycemic and lipid parameters, and thyroid and gonadal hormones. Stepwise multivariable logistic regression was performed to build a predictive model to distinguish BD from MDD. Results: Most of these biochemical indexes showed significant differences between the BD and MDD groups, such as white blood cell (WBC) count in the hematopoietic system, uric acid (UA) among immuno-inflammatory factors, direct bilirubin (DBIL) in liver function, lactic dehydrogenase (LDH) among enzymes, and fasting blood glucose (FBG) and low-density lipoprotein (LDL) in glucolipid metabolism (p-values < 0.05). With these predictors, the area under the curve (AUC) of the predictive model distinguishing BD from MDD was 0.772 among men and 0.793 among women, with the largest AUC of 0.848 in the luteal phase of women. The χ2 values of internal and external validation for male and female datasets were 2.651/10.264 and 10.873/6.822 (p-values > 0.05), respectively. The AUCs of the test sets were 0.696 for males and 0.707 for females. Conclusion: Discrimination and calibration were satisfactory, with fair-to-good diagnostic accuracy and external calibration capability in the final prediction models. Female patients may be more readily differentiated by conventional biochemical indexes than male patients. Trial Registration: ICTRP NCT03949218. Registered on 20 November 2018. Retrospectively registered. https://www.clinicaltrials.gov/ct2/show/NCT03949218?id=NCT03949218&rank=1.


INTRODUCTION
The National Depressive and Manic-Depressive Association investigated the diagnosis and treatment of bipolar disorder (BD) in 2000, and the results indicated that 69% of cases were misdiagnosed (1). On average, a patient with BD consulted four psychiatrists before receiving an accurate diagnosis, and over one-third took 10 years to be correctly diagnosed (1). The proportion of BD misdiagnosed as major depressive disorder (MDD) in clinical practice was reported as 20.8% nationwide in China (2). The long-term diagnostic consistency rate of MDD cases across a 10-year study was only 45.5% (3), compared with 71.9% in a 7-year cohort study of BD cases (4). The misdiagnosis rate has not substantially decreased over the past nearly 20 years.
Mitochondrial malfunction, inflammatory cytokines, and activated microglia were pointed out as potential biomarkers of BD from the perspective of oxidative stress and neurogenic inflammation (5)(6)(7). The oxidative stress of mood disorders may lead to inflammatory changes in multisystem function, which may be detected by laboratory examination (8,9). Downstream parameters of inflammatory response are likely to be more available for clinical use in routine blood testing using peripheral blood (10).
Following this reasoning, we queried our database and found that 14 inflammatory markers were elevated to different degrees in patients with mood disorders (11), such as uric acid (UA), direct bilirubin (DBIL), and lactic dehydrogenase (LDH). Data-driven analytics based on inflammation have not yet been validated for distinguishing BD from MDD. Thus, we speculated that biochemical and endocrine parameters deserve further development (12).
To help medical professionals by providing an informative diagnostic tool that improves the uniformity of clinical judgment, we created new parametric prediction models for discriminating between BD and MDD based on conventional biochemical indexes and hormones. Because the ranges of biochemical parameters differ between sexes, sex is a strong source of heterogeneity that needs to be fully considered (13). Therefore, a male model and a female model were built at stage 1, and three submodels for different phases of the menstrual cycle of female patients were built at stage 2, to make the analysis precise. In this study, the effectiveness of our models is discussed.

Trial Design
This was a retrospective, cross-sectional, real-world study. It was registered as a clinical trial (No. NCT03949218) at the International Clinical Trials Registry Platform, and ethical approval (2019-15R) was obtained from the Shanghai Mental Health Center (SMHC). The use of the material is restricted to scientific research purposes only, in accordance with the Declaration of Helsinki, and the requirement for informed consent was formally waived by the approving committee.
All data were gathered by the Information Department of SMHC and Hangzhou Seventh People's Hospital (HSPH). Information engineers developed the Hospital Information System (HIS) for searching of clinical big data based on the code of ICD-10 criteria, laboratory database, medical examination database, and drug database (Supplementary Material 1).
The patients' personal identifiable information was redacted to protect patient privacy and identity before being provided for analysis.

Participants
The population-based sample used for this report included 4,647 hospitalized mood disorder patients from January 2009 to December 2018 in SMHC and 3,029 from January 2017 to December 2019 in HSPH. The mental examination was conducted by a three-level ward round, including at least one chief physician. We extracted the ICD-10 diagnostic code from the discharge abstract of each patient. The biochemical data and electronic medical records were available from the HIS. Predictors considered were age, age at onset, disease duration, sex, and clinical biochemical data at admission regarding the hematopoietic system, immuno-inflammatory indexes, liver function, glucolipid metabolism, thyroid function, and sex hormones (11). The 35 indexes and their normal ranges are given in Supplementary Material 2.
We collected fasting venous blood between 6:00 and 8:00 a.m. using a set of standard operating procedures at the nurses' workstation. The inpatients had neither used tobacco nor consumed alcohol for at least 12 h before blood specimen collection. An electrochemical luminescence immunoassay (ECLIA) was performed using the Roche Cobas e601 automatic electrochemiluminescence immunoassay system (14), provided by SMHC and HSPH. After the blood test, clinical medication and dosage adjustments were arranged by the doctor-in-charge starting at 8 a.m. each day.
Inclusion criteria were as follows: 1. diagnosis of BD, MDD, or their subtypes according to the ICD-10; 2. available biochemical data in the HIS; and 3. hospitalized patients with no substance-related or addictive disorders. Exclusion criteria were as follows: 1. comorbidity with other mental disorders; 2. pregnancy or postpartum lactation; 3. severe physical illness; and 4. indefinite menstrual history for female patients.
For this study, we continued to screen the database and excluded the following patients to focus on patients of reproductive age: 1. age < 16, male testosterone < 8.64 nmol/L, or pre-menarche females; 2. age > 55, female FSH > 25.8 IU/L, or menopausal females; 3. missing hormone data; and 4. hospital readmissions. After these exclusions, a total of 2,470 patients were selected (BD = 1,333, MDD = 1,137) as training datasets. The median observation period was 5 years, and the BD-converters took 27.3 ± 22.4 months from their first hospitalization to have their diagnosis converted to BD. We identified 64 BD-converters among patients initially diagnosed with MDD, and they were counted in the BD group. Among the 3,029 patients in HSPH, we excluded 886 patients [(age < 16 or male testosterone < 8.64 nmol/L or pre-menarche female, n = 406) and (age > 55 or menopausal female, n = 480)]. Finally, 2,143 patients remained (BD = 955, MDD = 1,188) as testing datasets. To ensure the accuracy and reliability of the grouping by sex steroids, we randomly selected 1% of the female patients to check their menstrual histories and confirmed that the results were consistent with what would be expected given the patients' menstrual phase. We divided our research into two stages (refer to Figure 1 for a flow diagram of sample selection) (11).
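The reproductive-age screening described above can be sketched as a simple rule. This is a minimal illustration, not the authors' extraction code: the function name, the argument names, and the "M"/"F" coding are hypothetical, and pre-menarche or menopausal status taken from clinical records is not modeled here.

```python
# Thresholds quoted in the text; everything else is an illustrative assumption.
MIN_AGE, MAX_AGE = 16, 55
MIN_MALE_TESTOSTERONE = 8.64  # nmol/L
MAX_FEMALE_FSH = 25.8         # IU/L


def is_eligible(age, sex, testosterone=None, fsh=None, readmission=False):
    """Return True if a patient passes the age, hormone, and readmission screen."""
    if readmission or not (MIN_AGE <= age <= MAX_AGE):
        return False
    if sex == "M":
        # Missing hormone data leads to exclusion, as stated in the text.
        return testosterone is not None and testosterone >= MIN_MALE_TESTOSTERONE
    return fsh is not None and fsh <= MAX_FEMALE_FSH


print(is_eligible(30, "M", testosterone=15.0))  # adult male, testosterone above threshold
print(is_eligible(40, "F", fsh=30.0))           # FSH above the 25.8 IU/L threshold
```

The counts reported above are internally consistent with such a screen: 406 + 480 = 886 exclusions and 3,029 − 886 = 2,143 remaining patients in HSPH.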

Outcome
Many indexes need to be discussed separately for different normal ranges due to gender differences. Therefore, the two diseases were classified into gender subgroups if necessary.
Outcomes of interest were 35 indexes in five domains: hematopoietic system (WBC, neutrophil, RBC, hemoglobin, and

Predictors
In the correlation analysis, we screened for variables that could introduce collinearity so that they would not distort the final results. Before building the regression model, parameters with a variance inflation factor (VIF) above 5 were excluded for potential multicollinearity. On this basis, we removed neutrophil, CRP, hemoglobin, TBIL, GOT, TCH, albumin, globulin, TT4, and TT3. No multicollinearity was found among the 25 remaining independent variables in the gender subgroups (Supplementary Material 3). Then, the remaining data were extracted for each patient. In the case of repeated tests, the first blood examination report of each patient during hospitalization was used.
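As an illustration of the VIF > 5 rule, the sketch below computes VIFs as the diagonal of the inverse correlation matrix, which is a standard identity for predictors in a linear model. It is written in plain Python with hypothetical function names and toy data; the study itself used SPSS, so this is not the authors' procedure.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy data (not study data): x1 and x2 are strongly correlated, x3 is not.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
vifs = variance_inflation_factors([x1, x2, x3])
flagged = [name for name, v in zip(["x1", "x2", "x3"], vifs) if v > 5]
print(flagged)  # variables exceeding the VIF threshold of 5
```

In this toy example, the two correlated columns exceed the threshold while the third does not; which member of a collinear pair to drop remains a subject-matter decision.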

Missing Data
To ensure reliability of the data, we excluded patients (n = 350) with extensive missing data (≥30% of values missing). In this study, we continued to screen the database and excluded patients with missing hormone data. The required hormone data were TSH, FT3, FT4, FSH, testosterone in men, and progestin in women. Patients were excluded from model development if they lacked information on any of the prediction parameters.
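The missing-data rules above can be expressed as a short screening routine. This is a hedged sketch: the record layout (dictionaries with `None` for missing values), the function names, and the toy field lists are assumptions made for illustration only.

```python
def missing_fraction(record, fields):
    """Fraction of the listed fields that are missing (None) in one patient record."""
    return sum(record.get(f) is None for f in fields) / len(fields)


def screen_patients(records, fields, required, max_missing=0.30):
    """Keep records with < max_missing missing values and all required fields present."""
    kept = []
    for record in records:
        if missing_fraction(record, fields) >= max_missing:
            continue  # too much missing data (>= 30% by default)
        if any(record.get(f) is None for f in required):
            continue  # a required hormone value is missing
        kept.append(record)
    return kept


# Toy records (not study data); field names are illustrative.
fields = ["WBC", "UA", "TSH", "FT3"]
required = ["TSH"]
records = [
    {"id": 1, "WBC": 6.1, "UA": 300, "TSH": 2.0, "FT3": 4.5},
    {"id": 2, "WBC": None, "UA": None, "TSH": 2.0, "FT3": 4.5},  # 50% missing
    {"id": 3, "WBC": 6.0, "UA": 310, "TSH": None, "FT3": 4.4},   # required value missing
]
kept_ids = [r["id"] for r in screen_patients(records, fields, required)]
print(kept_ids)
```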

Statistical Analysis
Statistical analyses were performed using SPSS version 22.0. Comparisons of the candidate biochemical indexes between BD and MDD were performed via Student's t-test (normally distributed data) or rank sum test (skewed data), as appropriate. All tests were two-tailed, and statistical significance was defined as a p-value < 0.05. Effect sizes were expressed as Cohen's d or odds ratio (OR). To identify biochemical indexes that distinguish BD from MDD, forward conditional selection in multivariable logistic regression was performed on the training sets, with p-value criteria of 0.05 for entry and 0.10 for removal of variables. Hierarchical multiple regression was performed for the test sets using the same parameters. The diagnostic categorical variable was coded as 1 = BD and 2 = MDD. We removed the variables that could cause potential multicollinearity before constructing the forward conditional model. The candidate predictive factors were chosen based on previous high-quality studies, reviews, and meta-analyses (15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33). Then, a receiver operating characteristic (ROC) curve was plotted from the predicted probabilities to assess the predictive validity of the logistic regression model. The calibration of the prediction models was assessed via the Hosmer-Lemeshow goodness-of-fit test. The AUCs and standard errors of the ROC curves were entered into MedCalc version 19.6.1 to compare independent samples with a z test.
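To make the ROC step concrete, the following sketch computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and finds the cutoff that maximizes the Youden index (sensitivity + specificity − 1). It is an illustration in plain Python with hypothetical function names and toy data; here the positive class is coded 1 and the negative class 0, which is an assumption and differs from the 1 = BD, 2 = MDD coding used in SPSS.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, scores):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= cutoff.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy predicted probabilities (not study data).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p))
print(youden_cutoff(y, p))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large datasets.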

Regression Analysis of Distinguishing BD From MDD Patients by Biochemical Indexes
We further examined the association of BD and MDD with potential risk factors, including biochemical indexes and thyroid hormones. We analyzed the subgroup of male patients first and then female patients. A total of 954 male (536 vs. 418) and 1,060 female (540 vs. 520) patients were included in the first-step analysis; 217 and 239 cases missing at least one index were excluded, respectively. The final regression models are shown in Table 2; the other variables did not enter the models. Figures 2A,B show fair diagnostic accuracy of the models by gender: the areas under the ROC curve (AUC) were 0.772 (95% CI: 0.743-0.802) in males and 0.793 (95% CI: 0.767-0.820) in females, and the best cutoff values (Youden index) were 0.406 and 0.426, as shown in Table 3. The balanced accuracies of the two models were 70.3% in males and 71.3% in females. There was no difference in the AUC between genders (z = 1.058, p = 0.290).
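The comparison of two independent AUCs can be approximated with a z test, z = |AUC1 − AUC2| / sqrt(SE1² + SE2²). The sketch below is an illustration rather than the MedCalc procedure itself: the function names are hypothetical, and the standard errors are back-calculated from the rounded 95% confidence intervals reported above (an assumption), so the result only approximates the published z value.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def compare_independent_aucs(auc1, se1, auc2, se2):
    """Two-sided z test for the difference between two independent AUCs."""
    z = abs(auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(z / math.sqrt(2))  # two-sided p-value under the standard normal
    return z, p


# Training-set AUCs and confidence intervals quoted in the text.
male = (0.772, se_from_ci(0.743, 0.802))
female = (0.793, se_from_ci(0.767, 0.820))
z, p = compare_independent_aucs(*male, *female)
print(f"male vs. female: z = {z:.3f}, p = {p:.3f}")
```

With these inputs the statistic comes out close to the reported z = 1.058 and remains non-significant; small discrepancies are expected from rounding of the interval limits.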

Regression Analysis of Distinguishing Female BD From MDD Patients by Biochemical Indexes Subgrouped by Different Menstrual Cycles (Follicular, Periovulatory, and Luteal Phase)
In addition, we grouped female patients by their three physiological periods, as shown in Figure 1 (stage 2). There were 613 (293 vs. 320), 154 (85 vs. 69), and 293 (162 vs. 131) female patients who were included in the analysis, respectively, and the rest of the cases (139, 42, and 58) were excluded, respectively, for missing at least one of the indexes. Three regression models were constructed, as shown in Table 4.
The ROC curves in Figure 2B, b1-3 display the good diagnostic accuracy of these models: the AUCs were 0.777-0.848 and the Youden indexes were 0.413-0.571. For internal validation on the training sets, the chi-square values (χ2) of the Hosmer-Lemeshow test were 2.651 for males, 10.264 for females, and 5.875-11.840 for females in different menstrual phases. There were no differences in the AUC between the follicular phase and the periovulatory phase (z = 0.273, p = 0.785) or between the periovulatory phase and the luteal phase (z = 1.422, p = 0.155). However, the difference between the follicular phase and the luteal phase was statistically significant (z = 2.498, p = 0.013).
The χ2 values of internal and external validation for male and female datasets were 2.651/10.264 and 10.873/6.822 (p-values > 0.05), respectively. The AUCs of the test sets were 0.696 (95% CI: 0.659-0.732) for males and 0.707 (95% CI: 0.678-0.735) for females. The differences in the AUC between training sets and test sets of the gender subgroups were statistically significant (males, z = 3.244, p = 0.001; females, z = 3.839, p < 0.001). There were no statistical differences between the expected and the observed values for the five models (p-values > 0.05). The predictive models showed good calibration capability, as shown in Table 4.
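For readers unfamiliar with the Hosmer-Lemeshow test, the sketch below shows how the statistic is formed from observed and expected events in risk-ordered groups and how a p-value follows from the chi-square distribution. It is an illustration only: the function names are hypothetical, ten equal-size groups with 8 degrees of freedom are assumed (the text does not state the grouping), and the closed-form tail probability used here is valid only for even degrees of freedom.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (chi2, df) using equal-size groups ordered by predicted probability.

    Assumes outcomes coded 0/1 and predicted probabilities strictly between 0 and 1.
    """
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(prob for prob, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Contributions from events and from non-events in this group.
        chi2 += (observed - expected) ** 2 / expected
        chi2 += (observed - expected) ** 2 / (size - expected)
    return chi2, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# p-values for the chi-square values quoted in the text, assuming df = 8.
for stat in (2.651, 10.264, 10.873, 6.822):
    print(stat, round(chi2_sf_even_df(stat, 8), 3))
```

Under the assumed 8 degrees of freedom, all four reported statistics lie below the 5% critical value, which is consistent with the statement that observed and expected values did not differ.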
Finally, we summarized the significant positive or negative correlation coefficients (p-values < 0.05) in those regression models for qualitative analysis. These predictors yielded completely consistent performance if more than one predictor was marked among the five groups in the top 10 rows, as shown in Supplementary Material 4. The above regression equation indicated that the DBIL, LDL, GGT, ALP, GPT, prealbumin, HDL, and triglyceride levels were correlated positively with the MDD diagnosis, while the LDH, UA, TSH, FT3, IDBIL, WBC, FBG, and FT4 levels were correlated negatively (p-values < 0.05).

DISCUSSION
Many potential differentiators between the mood disorders were noticed in stage 1. After that, we started a more in-depth correlation analysis. The levels of four sex steroids (FSH, LH, estradiol, and progesterone), which fluctuate with menstrual cycles, were not compared between females. In consideration of the fluctuation of sex steroids in women's menstrual cycles, we directly put them into stage 2 for further investigation.
A total of five ROC curves showed good diagnostic accuracy of these models, with AUCs of 0.772-0.848. The prediction model for women in the luteal phase had the highest accuracy, while that for the follicular phase had the lowest among women. A relevant study found that severe premenstrual (luteal phase) syndrome is frequently confused with BD (28), and our data may help to resolve this problem if the prediction is made a few days before menstruation. The AUC of the male model could be regarded as the baseline, which is similar to that of the female model in the follicular phase (0.772 vs. 0.777) (34). Therefore, the menstrual cycle may play an important role in distinguishing between the diseases in female patients (13).
Oxidative stress plays potential roles in follicular development, resumption of oocyte meiosis, and steroidogenesis (35)(36)(37)(38)(39). Chronic psychosocial stress may affect ovarian function. Oxidative stress also plays a role in medication-induced reproductive toxicity. Usual menstrual cycle lengths are about 27-29 days, of which the follicular and luteal phases each take about half (40). However, after grouping by progesterone level and removing the debatable cases with an uncertain menstrual phase, which we called the pre- or post-ovulatory phase (n = 112 + 84), we found that the proportion of female patients in the luteal phase was higher in BD than in MDD (29.2 vs. 24.7%). The predictive impact of progesterone levels suggests that ovarian follicle development may be suppressed in mood disorders. Comparing the two, follicular maturation in BD is closer to normal than in MDD (about 10% of normal females have no LH surge in the real world) (41). Alternatively, patients with MDD were about 3 years older than patients with BD, which may have affected the menstrual cycle. Although menopausal women are more likely to have immature follicles, the hormone screening in this study excluded most patients with ovarian function failure (FSH > 25.8 IU/L). We also noticed that even if all disputable cases were counted in the luteal phase, the proportion would still have been far from 50%. The progesterone level suggests that ovarian follicle development may be suppressed under psychological and mood disturbances (42,43). Mood disorders, in both the bipolar and unipolar spectrum, may be associated with decreased fertility rates. We predict that patients with MDD may have even poorer potential fecundability. As external interference with the maturation and release of ovarian follicles is irreversible after an LH surge, this difference may explain why the luteal model had the best diagnostic accuracy.
Another possible reason is that these selected patients might not have taken medications before hospitalization, thereby avoiding the effect of medication on reproductive function (43). By contrast, long-term valproate therapy is associated with the development of menstrual disturbances and alterations in the reproductive endocrine system (44), while there has been no evidence of reproductive toxicity in female patients on antidepressant agents (38). As it is not ethical to study the side effects of antidepressants on the reproductive endocrine system in humans, they can only be observed in scattered reports. Our result is likely to be credible.
The AUC of the periovulation model (transitional model) fell between those of the two models above. Indeed, periovulation is not a very precise term, as the ovulation phase is defined as the 5 days before and 4 days after ovulation and requires B-ultrasonography for prospective determination of ovulation (45). Our settings prioritized avoiding overlap of progesterone levels (follicular phase 0.181-2.84 nmol/L vs. luteal phase 5.82-75.9 nmol/L). The ovulation day varies considerably for any given menstrual cycle length (41). To ensure the accuracy of clinical data, we narrowed the reference range of the transitional model to the interval from the maximum progesterone level in the follicular phase to the minimum in the luteal phase, which meant that the cases included in the periovulation phase (2.84-5.82 nmol/L) were more accurately classified than with the expanded reference range (0.385-38.1 nmol/L). These settings are closer to real conditions.
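The progesterone-based grouping described above can be sketched as a simple threshold rule. The cutoffs are those quoted in the text; the function name, the handling of values exactly at a boundary (assigned here to the follicular or luteal phase), and the treatment of out-of-range values as unclassified are assumptions made for illustration.

```python
FOLLICULAR_RANGE = (0.181, 2.84)  # nmol/L, from the text
LUTEAL_RANGE = (5.82, 75.9)       # nmol/L, from the text


def menstrual_phase(progesterone_nmol_l):
    """Classify the menstrual phase from a single progesterone value.

    Returns "follicular", "periovulatory", "luteal", or None when the value
    falls outside the quoted reference ranges.
    """
    value = progesterone_nmol_l
    if value < FOLLICULAR_RANGE[0] or value > LUTEAL_RANGE[1]:
        return None
    if value <= FOLLICULAR_RANGE[1]:
        return "follicular"
    if value < LUTEAL_RANGE[0]:
        return "periovulatory"  # strictly between 2.84 and 5.82 nmol/L
    return "luteal"


for level in (1.2, 4.0, 20.5):
    print(level, menstrual_phase(level))
```

Using non-overlapping intervals means each value maps to at most one phase, which mirrors the stated aim of avoiding overlap between the follicular and luteal ranges.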
The difference in the indexes of the hematopoietic system, immuno-inflammatory factors, liver enzymes, glucolipid metabolism, and thyroid and gonadal hormones between BD and MDD will be further discussed. Many previous studies have investigated these biomarker discriminations of BD vs. MDD.
In the hematopoietic system, we found that all five indexes differed between BD and MDD, although the differences in RBC and hemoglobin were found in female patients only. Current evidence shows that increased inflammatory cells (both neutrophils and platelets) play an important role in the pathophysiology of BD mania and the euthymic state (22,46). A significantly higher percentage of patients with BD have an abnormal (too low or too high) number of platelets compared to unipolar depression, while MDD does not differ in platelet level. The platelets of patients who have MDD with psychotic features are higher than those of patients with other types of depression (47). The differences in RBC and hemoglobin in female patients may result from the rate of follicle maturation related to menstrual bleeding (48).
Four immuno-inflammatory factors are potential indicators [CRP (18), ESR (31), UA (26), and prealbumin (19)], as the concentration of each is increased in mood disorders. We call them immuno-inflammatory factors because they do not belong to any single organ or system of the body.
All of those indicators were statistically significant (p-values < 0.05), except the ESR in male patients. Various lines of evidence show that CRP (16) and UA (15) levels are increased in BD compared to MDD, while ESR levels are elevated in MDD (30). However, there was no evidence of the usefulness of ESR as an indicator of BD, which limits its predictive usefulness. Although female patients showed more susceptibility to oxidative stress than male patients, these gender-based differences did not seem to provide a biochemical basis for the epidemiological differences (49). Purinergic signaling is involved in the physiology of neurotransmission and neuromodulation (50). Patients with MDD might have reduced levels of the antioxidant UA (51), while increased levels of UA indicate accelerated purinergic transformation and/or decreased adenosinergic transmission in BD (52). The purinergic system could prove a promising path in the search for biomarkers in BD.
We grouped liver function and glucolipid metabolism together, as hepatic function is closely related to the cell's metabolism of sugar, fat, and protein under the condition of oxidative stress (20). DBIL, IDBIL, GPT, GOT, and LDH, the commonly used liver function indexes, displayed a series of group differences. Liver oxidative injuries were found in MDD (32) but not in BD. Our findings corroborate more severe liver-brain interactions in BD, given that hepatotoxicity has been shown to be unrelated to adverse hepatic events during maintenance treatment with mood stabilizers (53). In earlier biochemistry, bilirubin was regarded as a useless metabolite of heme, and its abnormal rise was only used as a laboratory reference for the diagnosis of jaundice. After its potential antioxidant activity was discovered, a more comprehensive understanding of the antioxidative effects of bilirubin emerged (54). It is currently believed that bilirubin is a natural endogenous antioxidant with strong antioxidant activity that mediates oxidative stress (55). We posit that BD may involve more severe oxidative stress injury than MDD. Antioxidation consumes endogenous bilirubin, leading to a decrease in DBIL (water-soluble bilirubin) and a compensatory increase in IDBIL (lipid-soluble bilirubin) through cellular homeostasis in patients with BD. Antioxidant levels may explain the different changes in bilirubin levels between groups. Metabolic components are significantly associated with BD and MDD in a current depressive episode (25), and impaired glucose metabolism presents a higher ratio of manic/hypomanic than depressive episodes (23). For both sexes, possibly due to the impact of mood disorders on energy metabolism, the FBG level was higher while the triglyceride level was lower in BD than in MDD. Building on earlier studies showing liver-brain interactions in illness vs. health (56), we further identified some distinct metabolic characteristics within mood disorders.
We then measured the HPT axis overall and sex steroids subgrouped by sex, especially in women. We found that TSH, FT4, TT3, and FT3 secretion in the HPT axis differed between groups, with TSH secretion differing only in the male subgroup. We analyzed prolactin and testosterone by sex because they are seldom influenced by LH/FSH cycles, and both still showed statistical significance in the subgroups. Decreased testosterone secretion was more common in MDD than in BD. The investigation of the hypothalamic-pituitary-gonadal (HPG) axis showed that LH and estradiol secretion differed in the male subgroups. Dysfunction of HPT and HPA axes may lead to the different pathophysiology of mood disorders (17,24,27).
Prealbumin, also known as transthyretin, is a thyroid hormone-binding protein synthesized by the choroid plexus and secreted into cerebrospinal fluid. We therefore discuss prealbumin and the HPT axis together. Low levels of prealbumin were identified in BD (21) and MDD (29) in previous studies. Low cerebrospinal fluid prealbumin levels have been replicated in depression and bridge the gap between thyroid axis dysfunction and suicidal behavior in MDD (β = −0.58, p < 0.05) (17,57,58). We found that the prealbumin level in BD was even lower than in MDD, indicating a more severe HPT dysfunction in BD. By contrast, low thyroid hormone secretion is significantly more frequent in MDD than in BD. Our study replicated the HPT dysfunction, with FT3 and FT4 in particular serving as markers for prediction. We found that FT3 and FT4 in males and FT4 in females in the follicular phase may contribute to distinguishing the illnesses.

CONCLUSION
We unified the criteria for fixed entry and removal of the regression model, and therefore, the risk factors of each prediction model were not the same. Even so, we could see that the correlations of qualitative analysis of the most robust variables were absolutely consistent across groups.
We applied the models built on the training sets to the independent external test sets for external validation. The results indicate that our models generalize fairly well. The prediction models showed good internal and external calibration in the goodness-of-fit test. Moreover, the diagnostic performance of the models in the test sets was broadly comparable to that in the training sets, although somewhat lower, indicating that the models have good applicability (such models generally have AUCs in the range of 0.70-0.80, and our models fall within or close to this range). The difference between training and test sets may result from regional disparity in diagnosis.

LIMITATIONS
There are some limitations. First, we recorded a patient's blood test when the first definitive diagnosis was given, before a standardized treatment procedure was started. However, we could not guarantee that every patient was medication-naïve, even at the first hospitalization. Since the data spanned 10 years with no restriction on drug use, almost every mood stabilizer or antidepressant agent could be on the list of relevant confounding factors. Additionally, medication treatment was reported to have adverse endocrine, hepatic, and metabolic events (39,53). Although the short-term impacts of medication, tobacco use (59), and alcohol consumption (60) can be eliminated by a set of operating procedures, the long-term impacts of these confounders on peripheral biomarkers were not confirmed in this study. Moreover, we only obtained the diagnosis codes, without a comprehensive quantitative assessment by clinical scales or a breakdown of the specific clinical symptoms described in a psychiatric interview, even though the three-level ward round was rigorously conducted. The analysis of big data is still ongoing, and we aim to extract more meaningful clinical data in the future.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from SMHC. Restrictions apply to the availability of these data, which were used under license for this study. Further enquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Shanghai Mental Health Center (SMHC) under ethical approval 2019-15R. The requirement for informed consent was formally waived by the approving committee.

AUTHOR CONTRIBUTIONS
YZ and ZN designed the study and collected and analyzed the data. YZ and HJ wrote the first draft of the manuscript. HJ, HL, XW, LY, and JC contributed to data collection and statistical analysis. ZW and JC discussed and commented on the manuscript. YF reviewed and edited the manuscript. All authors read and approved the manuscript.