Efficient Prediction of Vitamin B Deficiencies via Machine-Learning Using Routine Blood Test Results in Patients With Intense Psychiatric Episode

Background Vitamin B deficiency is common worldwide and may lead to psychiatric symptoms; however, vitamin B deficiency epidemiology in patients with intense psychiatric episode has rarely been examined. Moreover, vitamin deficiency testing is costly and time-consuming, which has hampered effectively ruling out vitamin deficiency-induced intense psychiatric symptoms. In this study, we aimed to clarify the epidemiology of these deficiencies and efficiently predict them using machine-learning models from patient characteristics and routine blood test results that can be obtained within one hour. Methods We reviewed 497 consecutive patients, who are deemed to be at imminent risk of seriously harming themselves or others, over a period of 2 years in a single psychiatric tertiary-care center. Machine-learning models (k-nearest neighbors, logistic regression, support vector machine, and random forest) were trained to predict each deficiency from age, sex, and 29 routine blood test results gathered in the period from September 2015 to December 2016. The models were validated using a dataset collected from January 2017 through August 2017. Results We found that 112 (22.5%), 80 (16.1%), and 72 (14.5%) patients had vitamin B1, vitamin B12, and folate (vitamin B9) deficiency, respectively. Further, the machine-learning models were well generalized to predict deficiency in the future unseen data, especially using random forest; areas under the receiver operating characteristic curves for the validation dataset (i.e., the dataset not used for training the models) were 0.716, 0.599, and 0.796, respectively. The Gini importance of these vitamins provided further evidence of a relationship between these vitamins and the complete blood count, while also indicating a hitherto rarely considered, potential association between these vitamins and alkaline phosphatase (ALP) or thyroid stimulating hormone (TSH). Discussion This study demonstrates that machine-learning can efficiently predict some vitamin deficiencies in patients with active psychiatric symptoms, based on the largest cohort to date with intense psychiatric episode. The prediction method may expedite risk stratification and clinical decision-making regarding whether replacement therapy should be prescribed. Further research includes validating its external generalizability in other clinical situations and clarify whether interventions based on this method could improve patient care and cost-effectiveness.


INTRODUCTION
Vitamin B deficiency is common worldwide and may lead to psychiatric symptoms (1)(2)(3)(4). For example, meta-analyses have shown that patients with schizophrenia or first-episode psychosis have lower folate (vitamin B 9 ) levels than their healthy counterparts (4,5). Moreover, vitamin therapy can effectively alleviate symptoms in a subgroup of patients with schizophrenia (3,(6)(7)(8). However, the epidemiology of vitamin B deficiency in patients with active mental symptoms requiring immediate hospitalization has rarely been examined.
In a psychiatric emergency, psychiatrists should promptly distinguish treatable patients with altered mental status due to a physical disease from patients with an authentic mental disorder (International Statistical Classification of Diseases and Related Health Problems-10, ICD-10 code: F2-9). However, vitamin deficiency testing is very costly (around 60 dollars for each measurement of vitamin B 1 (vitB 1 ), vitamin B 12 (vitB 12 ), or folate in the U.S.; 15-25 dollars for each test in Japan) and usually requires at least two days. Therefore, an efficient, costeffective method of predicting vitamin B deficiency is needed.
Although several studies have applied machine-learning to the prediction of diagnosis or treatment outcomes (9)(10)(11), no study using machine-learning has focused on vitamin B deficiencies. We herein explore whether vitB 1 , vitB 12 , and folate deficiencies can be predicted using a machine-learning classifier from patient characteristics and routine blood test results obtained within one hour based on a large cohort of patients requiring urgent psychiatric hospitalization.

Medical Chart Review
We reviewed consecutive patients admitted to the Department of Neuropsychiatry at Tokyo Metropolitan Tama Medical Center, one of the biggest psychiatric tertiary-care centers in Japan, between September 2015 and August 2017 under the urgent involuntary hospitalization law, which requires the immediate psychiatric hospitalization of patients at imminent risk of seriously harming themselves or others. The necessity of hospitalization was judged by designated mental health specialists. There were no exclusion criteria. The patient characteristics, ICD-10 codes, and laboratory data were gathered retrospectively.

Classifiers and Statistics
We compared four types of standard machine-learning classifiers: k-nearest neighbors, logistic regression, support vector machine, and random forest. Each type of classifier was trained to predict the deficiency of each substance from age, sex, and 29 routine blood variables (described with values in the Results section). For developing the models, any missing values were replaced using the mean. The classifiers were trained using the dataset populated in the period from September 2015 to December 2016 (the "Training set"). First, except for logistic regression, we optimized the hyperparameters of the classifier by selecting the best combination of hyperparameters that maximized the "5-fold cross validation" accuracy, among many combinations within appropriate ranges. The cross-validation accuracy was computed as follows: in one session, the classifiers were trained using 80% of the training set and evaluated on the withheld 20% of the training set. This session was performed five times so that every data would be withheld once. The accuracies were finally averaged across sessions to yield the cross-validation accuracy. By incorporating this process, the classifiers were generalized to unseen data (Graphical method is shown in Figure 1).
Using the optimized hyperparameters, the classifiers were then validated using data collected from January 2017 through August 2017 (the "Validation set"). We report the classification performance on the validation set in the Results section unless otherwise stated.
We quantified the sensitivity, specificity, and accuracy (defined as the average of the sensitivity and the specificity on the optimal operating point) using receiver operating characteristic curves (ROCs). We also quantified the 95% confidence interval of the area under the ROCs (AUCs) and accuracy using 1000-times bootstrapping.
When investigating the Gini importance and the partial dependency (13), we retrained the classifiers using all datasets. All data analyses were performed using Python (2.7.10) with the Scikit-learn package (0. 19

Robustness Verification
We verified the robustness of the prediction performances by three independent approaches. First, we compared the following two prediction performances: random forest classifiers trained and validated using the dataset from the F2 population, and random forest classifiers trained and validated using the dataset from the non-F2 population.
Second, we compared the prediction performances of several random forest classifiers trained and validated using the dataset where different cut-off values were used to define the vitamin deficiency. We chose other two cut-off values for each vitamin based on previous reports (14)(15)(16), as well as pre-defined cut-off values (see also Medical Chart Review section).
Third, we trained and validated other random forest classifiers where the dataset was split in a different way. Here, the training set consisted of data between 31 January 2016 and August 2017 and the validation set consisted of data between September 2015 and 31 January 2016, so that the sample sizes of the training and validation sets were equal to those in the original split.

Subsampling Analysis
We also examined the relationship between the dataset size and the generalization performance (17). In this analysis, we trained the random forest classifiers using X% of the training set (X = 30, 35, 40, …, 95, and 100), and validated them using the validation set. The hyperparameters were identical to those used in the previous section. To remove sampling bias, this procedure was repeated 100 times for each value of X, where the training dataset was sampled  randomly for each repetition. This results in obtaining 100 AUC scores for each X and for each vitamin. We plotted the AUC scores (averaged across the 100 repetition) versus X for each vitamin, then the curve was fit with the following saturating function using Levenberg-Marquardt algorithm implemented as "curve_fit" function in the Scipy package (0.19.0).
where Y is the AUC score, and a and b are the parameters to fit. Note that Y !a+0.5 as X !∞ and Y = 0.5 as X = 0.

Ethical Considerations
Informed consent was obtained from participants using an optout form on the website. The study protocol was approved by the Research Ethics Committee, Tokyo Metropolitan Tama Medical Center (Approval number: 28-8). The study complied with the Declaration of Helsinki and the STROBE statement.

Prediction via Machine-Learning Using Routine Blood Test Results
Machine-learning classifiers were trained to predict the deficiency of each substance from patient characteristics and routine blood test results. The classifiers were trained using the dataset gathered in the period from September 2015 to December 2016 (the "Training set," n = 373), which was then validated from January 2017 through August 2017 (the "Validation set," n = 124). By splitting the whole dataset in this way, the ratio of the training and validation sample size was 3:1, a commonly used ratio in machine-learning analyses. AUCs for the validation set for each classifier are summarized in Table 4. Although the performance of the classifiers was similar except for the k-nearest neighbors, random forest yielded the highest AUC on average. Therefore, we focused on random forest in the following analysis.
The AUCs of the random forest classifiers were 0.716, 0.599, and 0.796, for vitB 1 , vitB 12 , and folate, respectively ( Figures 2D-F and Table 4). With some operative points on the ROC, the sensitivity, specificity, and accuracy for the validation set were calculated ( Table 4. See also Supplementary Table 4 for training  set and Supplementary Table 5 for different operating points). The 95% confidence interval (CI) of the AUC and accuracy was quantified using 1000-times bootstrapping. For random forest classifiers, the 95% CI of each value did not include 0.5, except for the AUC of vitB 12 . Figure 3 shows the Gini importance (A-C) and partial dependency plots (D-F) for the eight most important variables for each substance. The results provided further evidence of a relationship between the vitamin B levels and complete blood count while also indicating the hitherto rarely considered, potential association between these vitamins and alkaline phosphatase (ALP) or thyroid stimulating hormone (TSH).

Robustness Verification
We verified the robustness of the results by three independent means. First, we asked if the prediction performance was influenced by the ICD-10 categories. When the prediction performances were compared between the random forest classifiers trained using the dataset from the F2 population and the classifiers trained using the dataset from the other population, the AUC was not statistically different (DeLong's test), except in the case of vitB 1 (see Supplementary Table 6).
Second, we used different cut-off values to define the deficiency (14)(15)(16). Although the AUC for the validation set, shown in Supplementary Table 7, tended to be higher when strict cut-off values were used, the obtained AUCs were not statistically significant (p > 0.05, DeLong's test with Bonferroni correction).
Third, we investigated if the prediction performance was influenced by the way the dataset was split into the training and validation set. Here, we trained and evaluated random forest classifiers using a dataset split in a reversed way (see Methods section for details). The AUCs for the validation set were 0.771, 0.621, and 0.745 for vitB 1 , vitB 12 , and folate, respectively; none were statistically different from the AUC trained using the original setting (DeLong's test), further demonstrating the robustness of the performance.

Subsampling Analysis
To estimate the number needed to saturate the performance, we examined the relationship between the generalizability and the sample size (17). We randomly sampled X% of the training set, trained random forest classifiers using the dataset, and evaluated the generalization performance by AUCs using the validation set (X = 30, 35, 40, …, 95, and 100; see Methods for details). As shown in Figure 4, the relationships between AUC and the training size for vitB 1 and vitB 12 were almost saturated, whereas that for folate is not saturated. To quantitatively understand this, we fitted each curve using a saturating function formulated in equation (1)   for folate, a = 0.291 and b = 0.123. By using these parameter values and extrapolating the curve, we then computed how many additional samples are necessary to reach almost maximum performances. To reach 99% of the maximum performance [i.e., Y = (a + 0.5) × 0.99 in equation (1)], the training dataset to be collected was 92.5%, 143%, and 341% of the training size in this study for vitB 1 , vitB 12 , and folate, respectively. These quantitative analyses revealed that collecting further similar datasets up to 1,000 patients (e.g. four years × hospitals with similar scale as Tokyo Metropolitan Tama Medical Center) may increase and reproduce the generalizability for folate, while the effect of collecting further dataset is expected to be small for vitB 1 and vitB 12 .

Relevance of The Present Study
Based on the largest cohort to date of patients at imminent risk of seriously harming themselves or others, this study indicated that deficiency of certain vitamins can be predicted in an efficient manner via machine-learning using routine blood test results. The 29 routine blood variables are available at almost all hospitals/clinics and are necessary to rule out other comorbid physical problems. Given the large number of patients with vitamin B deficiencies, empirical therapy might be acceptable; however, risk stratification is preferred for personalized medicine and shared decision-making. The prediction method presented here may expedite clinical decision-making as to whether vitamins should be prescribed to a patient (Graphical Abstract).  Table 4 and Supplementary  Generalization performance of the classifiers was evaluated using AUC of the validation set for each type of classifiers. For random forest classifiers, sensitivity, specificity, and accuracy of the classification at the optimal operating points that maximized accuracy on the receiver operating characteristic curve of the validation set are also shown (see also Figures 2D-F). Accuracy was defined as the average of the sensitivity and specificity.
Square brackets indicate the 95% confidence interval. For further information, see Figure  2 and Supplementary Table 5. AUC, area under the receiver operating characteristic curve.
Remarkably, the AUC of folate deficiency was 0.796. The robustness of folate prediction was also suggested by various independent methods and statistics. Folate has a potential to maintain neuronal integrity and is one of the homocysteinereducing B-vitamins (5). Homocysteine may be linked to the etiology of schizophrenia (18), and vitamin B supplements have been reported to reduce psychiatric symptoms significantly in patients with schizophrenia (7). A recent meta-review has pointed out that the bioactivity of the supplement should be considered (e.g. methylfolate, which successfully crosses the blood-brain barrier, has been reported effective, whereas the effect of other forms of folate is equivocal) (19). As our study does not present longitudinal clinical courses, an intervention effect of folate supplementation to the cohort based on our method remains to be clarified.

Biological Mechanism Prediction
To connect with biological knowledge, we compared four models with high interpretability in this study. Using the random forest FIGURE 3 | Gini importance and partial dependence plots of vitamin B deficiencies. The Gini importance (A-C) and partial dependency plots of the probability of deficiency (D-F) are shown for the eight most important variables for vitamin B 1 , vitamin B 12 , and folate (vitamin B 9 ). Combined with these, this machine-learning classifier without hypothesis also provided further evidence of a relationship between vitamin B levels and the complete blood count while also indicating a potential association between these vitamins and alkaline phosphatase (ALP) or thyroid-stimulating hormone (TSH). Vit B 1 , vitamin B 1 ; Vit B 12 , vitamin B 12 ; Hb, hemoglobin; Hct, hematocrit; WBC, white blood cell count; CK, creatine kinase; RDW.CV, red blood cell distribution width-coefficient variation; Plt, platelet; ALT, alanine transaminase; Lym, lymphocyte fraction; Cre, creatinine; Neu, neutrocyte fraction; gGTP, g-glutamyltransferase; MCV, mean corpuscular volume; Glu, plasma glucose. classifiers, as shown in Figure 3, we identified several items related to complete blood count as top hits. Notably, our classifier was blind to any biological knowledge, including the well-established association between anemia and vitamin B deficiency, including folate (20). The results provide further evidence of a relationship between vitamin B levels and the complete blood count and support the use of machine-learning to investigate novel, underlying biological mechanisms (21).
ALP and its metabolites indicate the vitamin B 6 status (22); low vitB 12 is potentially associated with low ALP (23). More generally, ALP may have a close and complicated relationship with the overall vitamin B group. Autoimmune disorders, especially thyroid disease, are commonly associated with pernicious anaemia (24), but there has been no established hypothesis regarding the causal relationships between thyroid disease and vitamin B deficiencies. The potential association between the levels of these vitamins and ALP or TSH awaits further study, both via investigations of populations and basic research (25).

Limitations
This study is subject to several limitations. First, the findings of this single-center retrospective study may have limited external generalizability, though internal generalizability was considered to the maximum extent. Second, the patients' basic characteristics and long-term prognosis were not fully investigated due to administrative restrictions. Though there is similar involuntary treatment/admission in psychiatry worldwide, there is a gap between legislation and practice (26). Therefore, the extent to which this method can expedite clinical decision-making is unclear.
Further, we did not investigate the relationship between serological values and the need for intervention. The lack of data for vitamin B deficiency in the Japanese general population hampered the comparison between the experimental cohort and their counterparts who lacked psychiatric symptoms. Establishing appropriate reference values and an assessment method requires further investigation. Finally, we did not assess the predictive value of other nutritional impairments, including vitamin B 6 and homocysteine deficiency, which were previously shown to have a close link with psychiatric symptoms (3,5); however, our study provides fundamental data on nutritional impairment based on the largest cohort of patients with intense psychiatric episode ever assembled for this purpose and presents a potential framework for predicting nutritional impairment using machine-learning.

Conclusion
The present report is, to the best of our knowledge, the first to demonstrate that machine-learning can efficiently predict nutritional impairment. This study also provides a possible application of machine-learning to investigate novel, underlying biological mechanisms. Further research is needed to validate the external generalizability of the findings in other clinical situations and clarify whether interventions based on this method can improve patient care and cost-effectiveness.

DATA AVAILABILITY STATEMENT
The source code is available on https://github.com/ukky17/ vitaminPrediction. The datasets utilized in the current study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Research Ethics Committee, Tokyo Metropolitan Tama Medical Center. Informed consent was obtained from participants using an optout form on the website.

AUTHOR CONTRIBUTIONS
HTam has full access to all data and takes responsibility for the integrity of the data. HTam, JU, KN, and NY conceived the study. HTam, YH, and HTan collected the data. JU performed the statistical analyses. HTam and JU drafted the first version of the manuscript. All authors critically revised the manuscript for intellectual content and approved the final version.