Application of machine learning in predicting aggressive behaviors from hospitalized patients with schizophrenia

Objective To establish a predictive model of aggressive behaviors in hospitalized patients with schizophrenia by applying multiple machine learning algorithms, and to provide a reference for accurately predicting and preventing such behaviors. Methods Cluster sampling was used to select patients with schizophrenia who were hospitalized in our hospital from July 2019 to August 2021 as survey subjects. They were divided into an aggressive behavior group (611 cases) and a non-aggressive behavior group (1,426 cases) according to whether they showed obvious aggressive behaviors during hospitalization. A self-designed General Condition Questionnaire, the Insight and Treatment Attitude Questionnaire (ITAQ), the Family APGAR (Adaptation, Partnership, Growth, Affection, Resolve) Questionnaire (APGAR), the Social Support Rating Scale (SSRS) and the Family Burden Scale of Disease (FBS) were used for the survey. The Multi-Layer Perceptron, Lasso, Support Vector Machine and Random Forest algorithms were used to build predictive models for the occurrence of aggressive behaviors in hospitalized patients with schizophrenia, and their predictive performance was evaluated. A nomogram was used to build a clinical application tool. Results The area under the receiver operating characteristic curve (AUC) values of the Multi-Layer Perceptron, Lasso, Support Vector Machine, and Random Forest were 0.904 (95% CI: 0.877–0.926), 0.901 (95% CI: 0.874–0.923), 0.902 (95% CI: 0.876–0.924), and 0.955 (95% CI: 0.935–0.970), respectively. The AUC of the Random Forest differed significantly from those of the other three models (p < 0.0001), whereas the other three models did not differ significantly from one another in pairwise comparisons (p > 0.5). Conclusion Machine learning models can predict aggressive behaviors in hospitalized patients with schizophrenia fairly well; among them, Random Forest has the best predictive performance and has some value in clinical application.


Introduction
Schizophrenia is a group of severe psychiatric disorders of unknown etiology, mostly characterized by multiple impairments in perception, emotion, thinking, and behavior, as well as uncoordinated mental activity. Previous studies have shown that schizophrenia is the psychiatric disorder in which aggressive behavior or delinquency most commonly occurs (1,2). Aggressive behavior refers to verbal or physical behavior with hostile intent, such as destroying objects or attacking others (4). Threatening and aggressive behavior is common in hospitalized patients with schizophrenia: a meta-analysis showed that 15.3-53.2% of hospitalized patients with schizophrenia in China had shown aggressive behaviors during hospitalization, with a pooled incidence of 35.14% (3). The most frequent targets of patients' aggressive behaviors are psychiatric medical staff, and cross-sectional survey data from the MatchRN Psychiatry study found that almost 30% of nurses had been subjected to a serious assault in their professional lifetimes (5). The negative consequences of aggressive behaviors directly affect the safety and the physical and mental health of patients themselves as well as of others. Aggressive behavior also increases the use of mandatory medical measures such as restraint and isolation, which raises the medical and family economic burden. Therefore, accurate assessment, risk warning and effective intervention for aggressive behaviors in hospitalized patients with schizophrenia have become a focus of psychiatric clinical work. Machine learning, a branch of artificial intelligence, has been widely used in the medical field; it constructs models through self-learning on large data sets and then makes predictions on new data sets. Broadly speaking, machine learning is a computational technique that trains on, learns from, and gives solutions from input data sets (6).
In the psychiatric field, machine learning has been used to predict the response of patients with schizophrenia to repeated transcranial magnetic stimulation as well as suicide attempts of patients with schizophrenia, with good predictive performance (7,8). Gallos et al. (9) used ISOMAP and machine learning algorithms to construct embedded functional connectivity networks of anatomically separated brain regions from resting-state fMRI data of patients with schizophrenia, and also used Random Forest and Lasso both to diagnose the disease and to find biomarkers for it. Similarly, other studies from Gallos et al. (10,11) tried to diagnose schizophrenia, to find biomarkers for it, and to find ways to monitor its treatment. Yu et al. (12) used machine learning algorithms to predict violence in male patients with schizophrenia and to identify its influential factors. The aim of this study was to apply Multi-Layer Perceptron (MLP) (13), Lasso regression (14), Support Vector Machine (SVM) (15) and Random Forest (RF) (16) algorithms to predict aggressive behaviors of hospitalized patients with schizophrenia, and to explore the application value of machine learning models in predicting such behaviors.

Study object
Patients with schizophrenia who were hospitalized in the Second Affiliated Hospital of Xinxiang Medical University from July 2019 to August 2021 were selected using the cluster sampling method. Inclusion criteria: (1) meeting the diagnostic criteria for schizophrenia in the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) (17); (2) no gender restriction, age ≥14 years; (3) education level of primary school or above, normal hearing and vision, and able to understand and cooperate with the completion of the scale assessment; (4) a previous outpatient or inpatient diagnosis of schizophrenia, with antipsychotic drugs taken for 6 months or more. Exclusion criteria: (1) intellectual disability or combined organic brain disease; (2) severe physical illness or adverse drug reactions; (3) severe mental decline or excitement and agitation; (4) visual or auditory perception impairment; (5) pregnant or lactating female patients. The study was approved by the Ethics Committee of the Second Affiliated Hospital of Xinxiang Medical University. The purpose and significance of the study were explained to the study subjects and their guardians, and written informed consent was obtained from the patients and their guardians.

Survey content
(1) General condition questionnaire: the questionnaire was developed by the research team based on previous studies (18-20) and the medical staff's own experience, and was refined after a pre-survey to collect baseline information from the study subjects. The questionnaire included: (a) basic information: age, gender, marital status, education, residence, occupation, caregiver, and family income; (b) disease information: duration of disease, number of hospitalizations, family history of schizophrenia, past attack history, and management style during hospitalization; (c) pre-admission status: medication adherence and subsequent visits.
(2) Insight and Treatment Attitude Questionnaire (ITAQ) (21,22): this scale was translated by Zhang Jing-hang et al. from the 1989 version of McEvoy et al. The scale consists of 11 items covering knowledge of the disease and attitude toward treatment. Each item is rated on a 3-level scale from 0 to 2, giving a total score of 0-22, with higher total scores indicating better knowledge of the disease and a better attitude toward treatment. The ITAQ had a retest reliability of 0.93 and a consistency reliability of 0.80. The stability of the ITAQ was good, and it was significantly correlated with the Positive Symptom Scale, the Negative Symptom Scale, and the Brief Psychiatric Rating Scale, which showed that the ITAQ could accurately reflect patients' conditions and had good validity in assessing patients' insight.
(3) The Family APGAR Questionnaire (23,24): this scale was developed by Smilkstein in 1978 to evaluate subjects' subjective satisfaction with family function. The scale contains five factors: Adaptation, Partnership, Growth, Affection, and Resolve. Each item is rated on a 3-level Likert scale, with scores of 0 to 2 corresponding to "rarely," "sometimes," and "often." The total score is 0-10, and the higher the score, the better the family function; there are 3 levels: 7-10 for good family function, 4-6 for moderate family function, and 0-3 for poor family function. The scale has a retest reliability of 0.80-0.83.
(4) Social Support Rating Scale (SSRS) (25,26): this scale was developed in 1986 by Xiao Shuiyuan et al. with reference to relevant international data. The evaluation index includes 3 dimensions (subjective support, objective support and support utilization), with a total of 10 items. The scale can be summarized into 4 statistical indicators, with objective and subjective support factor scores ranging from 2 to 22 and 8 to 32, respectively; support utilization factor scores ranging from 3 to 12; and total scores ranging from 13 to 66, with higher scores indicating more social support. The α coefficient of the total scale was 0.69, and those of the subjective support, objective support and support utilization subscales were 0.849, 0.825 and 0.833, respectively, indicating high internal consistency reliability of each subscale.
(5) Family Burden Scale of Disease (FBS) (27): a semi-structured scale developed by Pai and Kapur in 1981 for families of patients with mental disease, used to evaluate the burden of the disease on the family and its members. The scale includes 6 dimensions (family daily activities, financial burden, family recreational activities, physical health of family members, family relationships, and psychological health of family members), with a total of 24 items. Each item is rated on a 3-level scale from 0 to 2, with severe burden rated 2, moderate burden rated 1, and no burden rated 0. The higher the score, the heavier the burden on the family. The Cronbach's α coefficient of the scale was 0.87-0.99.
(6) Aggressive behaviors: the Modified Overt Aggression Scale (MOAS) (28) was applied to evaluate patients' aggressive behaviors before they were discharged from the hospital, and a weighted total score of 4 or more was used as the inclusion criterion for the "group with significant aggressive behavior." The MOAS covers four categories of aggressive behaviors, each rated on a 5-level scale from 0 to 4, with a different weight assigned to each category. Inter-rater reliability was tested, and the consistency among multiple raters was good; the intraclass correlation coefficient (ICC) of the scale was 0.84 (p < 0.01).
"Whether obvious aggressive behaviors occur" was taken as the dependent variable and the "factors influencing aggressive behaviors" as the independent variables; the types and codes of all variables are shown in Table 1.
Frontiers in Psychiatry 03 frontiersin.org
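The MOAS grouping rule described in item (6) can be sketched in a few lines of Python. This is only an illustration: the text says that each of the four categories is rated from 0 to 4 and that the categories are weighted, but it does not list the weights. The values used here (1 for verbal aggression, 2 for aggression against property, 3 for auto-aggression, 4 for physical aggression) are the weights commonly cited for the MOAS and should be read as an assumption, as should the function and variable names.

```python
# Category weights commonly cited for the MOAS. They are an ASSUMPTION here:
# the text only states that weighted scores were set for the four categories.
MOAS_WEIGHTS = {
    "verbal": 1,
    "property": 2,
    "auto": 3,
    "physical": 4,
}

# A weighted total of 4 or more places a patient in the aggressive group (from the text).
AGGRESSION_CUTOFF = 4


def moas_weighted_total(ratings):
    """Return the weighted MOAS total; `ratings` maps category -> rating in 0..4."""
    total = 0
    for category, weight in MOAS_WEIGHTS.items():
        rating = ratings[category]
        if not 0 <= rating <= 4:
            raise ValueError(f"{category} rating must be between 0 and 4")
        total += weight * rating
    return total


def is_aggressive(ratings):
    """Apply the study's cutoff to the weighted total."""
    return moas_weighted_total(ratings) >= AGGRESSION_CUTOFF


# Hypothetical patient: moderate verbal aggression, mild aggression against property.
example = {"verbal": 2, "property": 1, "auto": 0, "physical": 0}
print(moas_weighted_total(example), is_aggressive(example))
```

Under these assumed weights, a patient with verbal aggression alone needs a rating of 4 to reach the cutoff, whereas a physical aggression rating of 1 is already sufficient.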

Survey method
All surveys were completed by 6 psychiatric attending physicians and 6 psychiatric nurse practitioners from our hospital, who had been trained for this study to ensure consistent data criteria.
(1) The survey of baseline information was completed by psychiatric nurses within 3 days after the patients were enrolled. Out-of-hospital medication adherence was defined as the degree to which a patient's medication-taking matched the prescription. Good adherence means that the patient took medication over the long term in full compliance with the doctor's prescription; moderate adherence means that the patient could not take the medicine exactly as prescribed, including taking less or more, not taking it on time, or sometimes missing doses; poor adherence means that the patient frequently did not take medication as prescribed, often did not take medication, or stopped taking medication. Subsequent visits were evaluated by asking the patient's family about visits in the last 6 months; the specific evaluation method is shown in Table 1. (2) The ITAQ, APGAR, SSRS and FBS surveys were conducted by the psychiatrists within 3 days after the patient's enrollment. The survey used unified guidelines. The assessor could read each question to the patients and guardians, and anything they did not understand was explained in detail. (3) After the questionnaires were collected, invalid questionnaires with missing items or inconsistencies were excluded. (4) The MOAS was used by the psychiatric nurses before discharge to assess whether the patients had shown any aggressive behaviors during the hospitalization.

Machine learning methods
In this study, several machine learning algorithms were selected for modeling, and each model has its own characteristics. The Multi-Layer Perceptron consists of an input layer, an output layer, and a "hidden" layer between them. The connections between neurons in adjacent layers carry weights, which are continuously adjusted during training to minimize the prediction error. Because of its large number of parameters, the algorithm is prone to overfitting and exploding gradients (29). The Lasso adds an L1 regularization term to the general linear model to shrink coefficients with small absolute values to zero, so that variable selection and parameter estimation are performed at the same time; this overcomes the limitations of stepwise variable selection while retaining the favorable properties of subset selection and ridge regression (30). The Random Forest is an ensemble algorithm consisting of multiple decision trees (31), with very high accuracy, decent interpretability of the results, and the ability to evaluate the importance of each feature in the classification. The Gini importance, which measures the importance of each feature by the mean impurity decrease across all trees in the forest, was used to rank the relative importance of features. Support Vector Machine algorithms are theoretically based on nonlinear mapping: features are mapped to a high-dimensional space by kernel functions, and a "hyperplane" that can be used for classification is then found in that space (8). Support Vector Machines can avoid the "curse of dimensionality" to a certain extent and show good robustness and generalization, but they are difficult to train on large-scale samples and are sensitive to parameters and kernel functions.
In summary, each of the four algorithms has its own characteristics and advantages, and the Random Forest achieved the best fit in this study.
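The way an L1 penalty sets small coefficients exactly to zero, while an L2 (ridge) penalty only shrinks them, can be seen in the one-coefficient case. The sketch below is not the fitting procedure used in the study; it assumes the simplified objectives written in the docstrings, and the coefficient values and the penalty strength `lam` are hypothetical.

```python
def soft_threshold(b, lam):
    """Minimizer of 0.5 * (beta - b)**2 + lam * abs(beta), i.e. an L1 penalty."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, lam):
    """Minimizer of 0.5 * (beta - b)**2 + 0.5 * lam * beta**2, i.e. an L2 penalty."""
    return b / (1.0 + lam)  # shrinks toward zero but never reaches it


# Hypothetical unpenalized coefficient estimates and penalty strength.
unpenalized = [2.0, -0.8, 0.3, -0.1]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in unpenalized]
ridge = [ridge_shrink(b, lam) for b in unpenalized]
print("lasso:", lasso)
print("ridge:", ridge)
```

In this toy case the two smallest coefficients are removed by the L1 penalty, which is the variable selection behavior described above; the L2 penalty keeps all four.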

Model building and validation
In this study, the Python libraries "numpy," "pandas" and "scikit-learn" were used for data processing as well as machine learning model building and validation. The samples were randomly divided into a training set (70%) and a testing set (30%). The training set was used for model training and hyperparameter optimization. Bayesian optimization (32) was used in the training phase to obtain the best hyperparameters for each model, while 4-fold cross-validation was used to ensure the stability of the selected hyperparameters. The other hyperparameters adopted the default settings in the "sklearn" package of Python. After the optimal hyperparameters were obtained, 10 repetitions of 4-fold cross-validation were conducted on the training set for inner validation; the model was then retrained on the entire training set, and its performance was finally evaluated on the testing set. The evaluation indicators included accuracy, sensitivity, specificity, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The optimal hyperparameters for each machine learning model are shown in Table 1, and the ROC curves of the models were statistically compared using the DeLong test (33). The flowchart that sketches the methodology of this study is shown in Figure 1.
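The evaluation indicators named above can be written out explicitly. The following is a minimal sketch using only the Python standard library, not the authors' evaluation code; the labels, predicted scores and the 0.5 decision threshold are hypothetical. The AUC is computed from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 = aggressive, 0 = not."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, round(auc_value, 4))
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why it is the quantity compared between models. scikit-learn provides equivalent functions such as `roc_auc_score` and `confusion_matrix`.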

Statistical analysis
Data entry was double-checked with EpiData 3.0 by different researchers. SPSS 23.0 was used for statistical description and analysis. Count data were described by the number of cases and percentages, and the χ² test was used for comparison; measurement data conforming to a normal distribution were expressed as mean ± standard deviation (x̄ ± s) and compared by t-test; ranked data were tested by the rank sum test. A p value < 0.05 was considered statistically significant.

Basic information
A total of 2,184 patients with schizophrenia were enrolled in this study. 2,184 questionnaires were distributed and 2,064 valid questionnaires were collected, a valid response rate of 94.51%. During the hospitalization period, 27 cases withdrew from the study for reasons such as voluntary discharge from hospital or the development of combined somatic diseases. A total of 2,037 patients were included in the analysis, of which 611 cases (611/2,037, 30.00%) showed obvious aggressive behaviors and 1,426 cases (1,426/2,037, 70.00%) did not. The basic profiles of the patients in both groups are shown in Table 2.

Performance of machine learning models
All 19 variables were included in the four machine learning algorithms for model fitting, and the included variables are shown in Table 3. The Random Forest was found to have the best prediction in the testing set, with an AUC that differed significantly from those of the other three models (p < 0.0001), whereas the other three models did not differ significantly from one another in pairwise comparisons (p > 0.5). The AUC predicted by RF was 0.955; the AUC predicted by SVM was 0.902; the AUC predicted by MLP was 0.904; and the AUC predicted by Lasso was 0.901, as shown in Table 4. Figure 2 shows the ROC curves and AUC values of the 4 machine learning models on the testing set. The inner validation results are shown in Table 5. To explore the effect of data imbalance on model fitting, we combined the 611 aggressive patients with the same number of randomly selected non-aggressive patients to create a new dataset (balanced dataset), and repeated the above modeling and evaluation process on the new dataset. The results showed that the performance of the models fitted to the balanced and unbalanced datasets was comparable, with the models fitted on the balanced dataset slightly inferior to those fitted on the unbalanced dataset (Table 5).
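The balanced dataset described above amounts to random undersampling of the majority class. A minimal standard-library sketch follows; the function name, the fixed seed and the 0/1 label coding are assumptions made for illustration, and only the group sizes (611 and 1,426) come from the text.

```python
import random


def undersample_majority(labels, seed=0):
    """Return sorted row indices of a balanced subset.

    All minority-class rows are kept, and an equal number of majority-class
    rows is drawn at random without replacement.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    sampled = rng.sample(majority, len(minority))
    return sorted(minority + sampled)


# Group sizes from the study: 611 aggressive (1) and 1,426 non-aggressive (0) patients.
labels = [1] * 611 + [0] * 1426
balanced = undersample_majority(labels)
print(len(balanced), sum(labels[i] for i in balanced))
```

Undersampling discards more than half of the non-aggressive cases, so a slightly lower performance on the balanced dataset, as reported above, is a plausible consequence of the smaller training sample.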

Rank of feature importance
Based on the Random Forest model, this study ranked the features by their importance for prediction. The results are shown in Figure 3. Further, we added the features to the model in order of feature importance (Figure 4), and the results showed that model performance reached a fairly stable plateau after the inclusion of the top 8 features (AUC = 0.9397). In addition, the model performed best when all features were included (AUC = 0.9491). The top 8 features were APGAR, ITAQ, Duration, History of Attacks, SSRS, Medication Adherence, Age, and FBS.

FIGURE 1
The flowchart of the methodology.
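The Gini importance behind the Random Forest feature ranking is built from impurity decreases at individual tree splits. The sketch below computes the decrease for a single split on a single feature, which is only the building block: a real forest averages such decreases, weighted by node size, over every split and every tree. The feature values, thresholds and labels are hypothetical and are not taken from the study data.

```python
def gini_impurity(labels):
    """Gini impurity of a set of binary labels: 1 - p**2 - (1 - p)**2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def impurity_decrease(values, labels, threshold):
    """Decrease in Gini impurity from splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children


# Hypothetical toy data: 1 = aggressive, 0 = non-aggressive.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
apgar = [2, 3, 1, 8, 7, 9, 6, 4]        # low family function among aggressive cases
age = [30, 45, 50, 28, 60, 41, 35, 52]  # little relation to the label

decreases = {
    "APGAR": impurity_decrease(apgar, labels, threshold=3.5),
    "Age": impurity_decrease(age, labels, threshold=40),
}
ranking = sorted(decreases, key=decreases.get, reverse=True)
print(decreases, ranking)
```

A feature that separates the two groups cleanly removes all of the parent node's impurity, while a weakly related feature removes almost none, so the former ranks higher. In scikit-learn the fitted forest exposes the aggregated values as `feature_importances_`.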

Construction of a nomogram prediction tool
To facilitate clinical application, a nomogram was drawn based on the top eight significant variables obtained from the Random Forest model, as shown in Figure 5. The individual score of each risk factor is read from the scale at the top of the nomogram, the scores of all risk factors are added to obtain the total score, and the total score gives the probability of aggressive behaviors during hospitalization for each patient.
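The point-summing procedure of a nomogram can be illustrated with a small sketch. It assumes that the nomogram is backed by a logistic regression model, which is the usual construction but is not stated in the text, and every coefficient, intercept and value range below is hypothetical. Only three predictors are shown to keep the example short.

```python
import math

# HYPOTHETICAL logistic-regression intercept and coefficients, for illustration only.
# Each predictor maps to (coefficient, lowest value, highest value).
INTERCEPT = 1.0
PREDICTORS = {
    "APGAR": (-0.40, 0, 10),
    "ITAQ": (-0.15, 0, 22),
    "history_of_attacks": (1.20, 0, 1),
}


def reference_value(coef, low, high):
    """Predictor value that earns 0 points, i.e. the lowest-risk end of its range."""
    return low if coef >= 0 else high


def points_per_logit():
    """Scale so that the predictor with the widest effect spans 0 to 100 points."""
    widest = max(abs(coef) * (high - low) for coef, low, high in PREDICTORS.values())
    return 100.0 / widest


def nomogram_points(patient):
    """Points contributed by each predictor for one patient."""
    scale = points_per_logit()
    points = {}
    for name, (coef, low, high) in PREDICTORS.items():
        ref = reference_value(coef, low, high)
        points[name] = coef * (patient[name] - ref) * scale
    return points


def probability_from_points(total_points):
    """Convert total points back to a predicted probability via the logistic function."""
    baseline = INTERCEPT + sum(
        coef * reference_value(coef, low, high)
        for coef, low, high in PREDICTORS.values()
    )
    logit = baseline + total_points / points_per_logit()
    return 1.0 / (1.0 + math.exp(-logit))


patient = {"APGAR": 3, "ITAQ": 10, "history_of_attacks": 1}  # hypothetical patient
points = nomogram_points(patient)
total = sum(points.values())
probability = probability_from_points(total)
print(points, round(total, 1), round(probability, 3))
```

The total points are a rescaled version of the model's linear predictor, so reading the probability from the total-points axis gives the same result as evaluating the model directly.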

Discussion
Aggressive behaviors in patients with schizophrenia can be very harmful to their family members, the patients themselves, and health care professionals. Currently, the prediction of aggressive behaviors of patients with schizophrenia in China mainly relies on the MOAS and many other existing structured clinical risk assessment tools, which are very time-consuming (28). Current research on aggressive behaviors of patients with schizophrenia has mainly focused on analyzing its influencing factors using logistic regression. Wang et al. in Toronto reported the outcomes of a binary logistic regression model alongside six machine learning algorithms used to predict violence status in 275 patients with schizophrenia, using various demographic, clinical, and sociocultural predictor variables. The study showed that the random forest model performed marginally better than the other algorithms (34). Yu et al. in Hefei used eight machine learning algorithms to predict violent behavior in 397 male patients with schizophrenia. The Neural Net had better prediction ability than the other algorithms (12).

FIGURE 4
AUC values of random forest model versus number of top-ranked features.

FIGURE 5
Nomogram based on the top 8 significant features (variables) filtered by Random Forest.
patients in Zurich with a conclusion of Boosted Classification Trees as best suited (37). Guo et al. in Shenzhen examined 74 male participants with a hybrid machine learning model (LASSO regression and SVM), with an AUC of 0.95 (38). The detailed performance of these studies is shown in Table 6. These studies suggest that, with further work, classification algorithms may be able to supplement diagnostic decisions to improve the treatment and well-being of patients with schizophrenia. In this study, multiple machine learning classification algorithms were used to predict aggressive behaviors of patients with schizophrenia based on multidimensional indicators, including demographic, clinical and social factors. The Random Forest algorithm showed value in prediction. This study also ranked feature importance to increase the interpretability of the model and plotted a nomogram as a tool for clinical application, which highlights the clinical value of this study. The top 8 features in the Random Forest for predicting aggressive behaviors in hospitalized patients with schizophrenia were APGAR, ITAQ, Duration, History of Attacks, SSRS, Medication Adherence, Age, and FBS. Previous studies have shown that family care of patients with schizophrenia is positively correlated with their mental health (39); some cognitive impairments appear to have their place in the genesis, progression and maintenance of violent acts of individuals with schizophrenia (40).
Cognitive impairment is more severe in patients with schizophrenia with a longer course of illness (41, 42); a history of previous aggressive behaviors or of being a victim of violent attacks can be a risk factor for increased aggressive behaviors (43); and poor social support (such as poor communication with family members or difficulties in getting along), poor adherence to treatment and medication, and a heavy family burden that leaves patients with no family care can increase the risk of aggressive or even violent behaviors (44, 45). Combined with the results of this study, this suggests that not only should patients with schizophrenia be treated in a standardized manner in the clinic, but their family members should also be included in the treatment activities. Family education and training can be carried out to enhance family members' acceptance of patients, to eliminate negative attitudes such as discrimination against patients within the family, to help patients recover faster in a good family atmosphere, and to improve their insight and attitude toward treatment.

FIGURE 3
Rank of feature importance obtained by random forest.

A nomogram is a statistical model for individualized predictive analysis of clinical events, which can provide better individualized risk assessment in an intuitive and visual way (46,47). In this study, a visual nomogram model was established by integrating the 8 important variables screened out by the Random Forest. The calibration curve and ROC curve analysis showed that this model had good calibration and discrimination. Health care professionals can use this model to predict the probability of aggressive behaviors in hospitalized patients with schizophrenia from the sum of the scores of each risk factor, in order to identify high-risk groups, provide early warning of risks, develop effective treatment and intervention measures, prevent or mitigate the adverse consequences of aggressive behaviors, improve safety management in psychiatric departments, and create a healthy environment for patients and health care professionals.
This study constructed a predictive model of aggressive behaviors by investigating the baseline data, disease conditions, and scale assessments of 2,037 patients with schizophrenia, which is superior to traditional scale-based predictive methods. The model can accurately reflect the factors that influence the aggressive behaviors of patients with schizophrenia, providing reliable data support for the clinical prediction and prevention of aggressive behaviors in psychiatry, which can reduce losses. This study has the following deficiencies. First, because the study aimed to build a prediction model, data collection had to be completed at the initial stage of patient admission, so the impact of the patient's disease stage could not be studied. Second, this study only analyzed general demographic information and disease-related factors of patients with schizophrenia, and did not collect predisposing factors or biological indicators for the occurrence of aggressive behaviors. More variables need to be explored. Subsequent studies can address these deficiencies by systematically studying the biological, psychological, and social factors of patients with schizophrenia, with the aim of predicting aggressive behaviors more accurately and ensuring safety in psychiatric clinical practice.
In summary, machine learning algorithms can be used for risk prediction of aggressive behaviors in patients with schizophrenia. This study has some value in clinical application and is conducive to the clinical development of differentiated management and precise nursing for aggressive behaviors in different types of patients with schizophrenia.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the Second Affiliated Hospital of Xinxiang Medical University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

Author contributions
CW contributed to conception and design of the study. MG, FY, and ZG organized the database. YH performed the statistical analysis. NC and MG wrote the first draft of the manuscript. JM, YZ, ZD, and KN wrote sections of the manuscript. JM edited the manuscript. CW contributed to project administration. All authors contributed to the article and approved the submitted version.

Funding
[LHGJ20220637]", and "Construction of "Internet +" continuous nursing service model and its application in stable stage patients with severe mental disorders in community [RKX202202038].