Machine Learning for prediction of violent behaviors in schizophrenia spectrum disorders: a systematic review

Background Schizophrenia spectrum disorders (SSD) can be associated with an increased risk of violent behavior (VB), which can harm patients, others, and property. Prediction of VB could help reduce the SSD burden on patients and healthcare systems. Some recent studies have used machine learning (ML) algorithms to identify SSD patients at risk of VB. In this article, we aimed to review studies that used ML to predict VB in SSD patients and discuss the most successful ML methods and predictors of VB. Methods We performed a systematic search in PubMed, Web of Science, Embase, and PsycINFO on September 30, 2023, to identify studies on the application of ML in predicting VB in SSD patients. Results We included 18 studies with data from 11,733 patients diagnosed with SSD. Different ML models demonstrated mixed performance, with an area under the receiver operating characteristic curve of 0.56-0.95 and an accuracy of 50.27-90.67% in predicting violence among SSD patients. Our comparative analysis demonstrated superior performance for the gradient boosting model compared to other ML models in predicting VB among SSD patients. Various sociodemographic, clinical, metabolic, and neuroimaging features were associated with VB, with age and olanzapine equivalent dose at the time of discharge being the most frequently identified factors. Conclusion ML models demonstrated varied VB prediction performance in SSD patients, with gradient boosting outperforming the other models. Further research is warranted for clinical applications of ML methods in this field.


Introduction
Schizophrenia disorders are characterized by delusions, hallucinations, disordered thinking, disorganized behavior, and blunted or inappropriate affects (1,2). The disorders profoundly impact an individual's quality of life and can also pose a risk to others, especially when they lead to violent behaviors (VB) (3). People with schizophrenia are frequently stigmatized as having a higher potential for violence, resulting in discrimination (4). Moreover, recent research has shown that schizophrenia spectrum disorders (SSD), including schizophrenia, schizoaffective disorder, and other delusional disorders, have been linked with an increased risk of VB in various studies conducted worldwide (5-8).
The definition of VB is diverse, but it generally encompasses any manifestation of verbal or physical aggression directed at objects, others, or oneself (9,10). The impact of VB is widespread, affecting not only the patients themselves, who may lose property, relationships, and well-being, but also their caregivers, such as family, friends, or healthcare workers, who can be traumatized by the experience (11,12). Additionally, VB can increase the burden on the healthcare system for patients with SSD (13). A recent systematic review and meta-analysis reported a prevalence of 17.19-23.83% for different types of VB other than homicide among SSD patients (5). Another systematic review and meta-analysis, which pooled data from 15 countries, reported an odds ratio of 4.5 for interpersonal VB among SSD individuals compared to a general population group without these disorders (7).
Given the significant impact that VB can have on patients and those in their environment, it is critical to accurately predict the risk of VB to help prevent these behaviors. To date, many studies have investigated the risk factors for VB in SSD patients, including sociodemographic factors, disease characteristics, and patients' previous medical history (14-16). However, most of these studies could not predict the risk of VB accurately, due to the complex and multifactorial nature of violence occurrence (17).
Machine learning (ML) is a subset of artificial intelligence that uses algorithms to learn from data, identify patterns, and make predictions (18,19). By analyzing large amounts of data, ML algorithms can identify complex relationships and hidden links behind phenomena that are not obvious to human observers (20). The key aspect of ML is its capability to build predictive models, demonstrated by its ability to anticipate clinical outcomes such as suicidal ideation, impulsivity, and VB (19,21,22). This attribute renders ML a promising instrument for unraveling the intricate interplay between schizophrenia and VB, thereby aiding healthcare providers in the early identification of individuals susceptible to VB (23,24). This, in turn, holds the potential to optimize resource allocation, diminish lay times, and fortify the safety of both staff and patients (25). Ultimately, the trajectory of ML in healthcare portends the evolution of medical prediction tools, envisaging their integration into routine clinical practice to proactively avert instances of VB and alleviate the burden of schizophrenia within this context (26).
This systematic review aims to investigate the potential of ML in predicting VB in patients with SSD, which we believe will offer a better understanding of the potential of ML in this clinical context and will be of interest to researchers and healthcare providers seeking to use ML to identify patients at risk of VB. Our main objectives are: 1) to discuss the most robust algorithms used for the prediction of VB; 2) to assess the general accuracy that has been achieved in predicting VB using ML; and 3) to review the effective factors that have enhanced ML's ability to predict VB.

Data synthesis
To bypass the limitations of meta-analyzing heterogeneous datasets, one author (MT) implemented a novel comparative approach, ranking each ML model's performance within individual studies and then averaging ranks across studies to identify the best overall performing ML model.

Risk of bias assessment
To assess the risk of bias (ROB), we employed the Prediction Model Risk of Bias Assessment Tool (PROBAST) (33). It is a tool for assessing ROB and the applicability of diagnostic and prognostic prediction model studies. PROBAST evaluates four domains (participants, predictors, outcome, and analysis) through 20 signaling questions. The signaling questions of the PROBAST checklist and its guidance notes for rating ROB and applicability are provided in full in the PROBAST checklist section of the Supplementary Material. These questions facilitate structured judgment of ROB in studies of predictive models. We used the explanation and elaboration document, which describes the rationale for including each domain and signaling question and guides researchers in using them to assess ROB and applicability concerns. Also, to assess the ROB in the studies that employed more than one ML model, we selected the ML model with the best performance (best AUROC or accuracy).

Study selection
The search strategy employed in this systematic review yielded 3941 articles. Following the removal of duplicates, 2142 articles remained for further assessment. After assessing the abstracts, 250 articles were deemed suitable for full-text screening. A total of 18 articles satisfied the eligibility criteria and were included in the final analysis (Figure 1). Table 1 shows the characteristics and extracted data of the included articles.

Study characteristics

General features
The 18 included studies were conducted in Switzerland (n=8), China (n=8), and Canada (n=2). A total of 11,733 patients diagnosed with SSD were systematically reviewed in the present study, with diagnostic criteria including Diagnostic and Statistical Manual of Mental Disorders (DSM)-III, IV, and V, and International Classification of Diseases (ICD)-9 and 10. Of the patients, 7,330 (62.47%) were male, and 4,403 (37.53%) were female. Three studies included exclusively male participants (38, 43, 44). Except for one study that recruited outpatients (34), all other studies recruited participants from inpatient settings. Among these studies, four employed ML models to predict VB during the current admission (35, 41, 47, 51). Additionally, nine studies categorized patients based on the occurrence of VB prior to their current admission (38-40, 43, 44, 46, 48-50), while another four classified patients into violent and non-violent groups by retrospectively reviewing their medical records since their disease onset (36, 37, 42, 45). Moreover, eight studies were part of a larger project investigating the relationship between SSD and offending and used the same dataset of offender patients as their sample population (39, 41, 42, 45, 46, 48-50).
Moreover, two other studies examined the role of biochemical markers in indicating VB. Chen et al. (2015) examined the relationship between violence trajectories, baseline clinical features, and lipid levels to develop a model to predict more violent trajectories (35), while Chen et al. (2020) tried to identify the metabolic characteristics of violent schizophrenia patients, including amino acid, lipid, and carbohydrate metabolism, by performing untargeted metabolomics and analyzing their plasma metabolites (36).

Output measures
The definition of VB varied significantly across studies due to the use of different criteria, scales, or aims. While some studies defined verbal aggression as VB, others only included physical aggression, and some differentiated offenses based on their severity. Four studies utilized the Modified Overt Aggression Scale (MOAS) (60) criteria, but with different thresholds (37,38,44,47) (41). On the other hand, six studies used a shared database to distinguish between violent and non-violent offenses (39, 42, 46, 48-50). A seventh study attempted to predict the risk of homicide among other offenses (45).

Machine learning

Overview of algorithms
None of the 18 studies utilized unsupervised learning (clustering), which is consistent with the nature of the subject, since the classes and the target of classification are given (64). Instead, all of them used supervised learning (classification or regression), with three studies (43, 44, 47) incorporating deep learning through the neural network (NNET) or multi-layer perceptron (MLP) model. Among the top classification methods of supervised learning, support vector machine (SVM) was utilized in fifteen studies, decision trees (including random forests (RF)) in fifteen, and k-nearest neighbor (KNN) in eleven. For the top regression methods of supervised learning, logistic regression (LR) (including stepwise LR) was utilized by twelve studies, while least absolute shrinkage and selection operator (LASSO) was used by five. While thirteen studies compared the performance of different ML models in violence prediction, the others focused on developing a single prediction model (34-36, 38, 40). See the Supplementary Material for detailed information regarding model development and validation across the reviewed studies.

Model development
In most of the studies, some details were unclear about model development, with few providing information about hyperparameter tuning, an essential part of model development. Hyperparameters are parameters set before the training process begins and affect how the model learns from and generalizes the data (65). Tuning hyperparameters can significantly impact model performance and determine the complexity/flexibility of the model (65). Among the eighteen studies, four provided some explanation about the hyperparameter tuning (34, 35, 38, 47), two used default settings without optimization (41, 45), and the other twelve studies did not mention anything about hyperparameter optimization.
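To make the role of a hyperparameter concrete, the following minimal sketch tunes a single one, the number of neighbors k in a toy k-nearest-neighbor classifier, by k-fold cross-validation. It is illustrative only: the toy data, the candidate grid, and the function names (`knn_predict`, `cv_accuracy`, `tune_k`) are assumptions made for this example and do not reproduce the procedure of any reviewed study.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of scalar x by majority vote of its k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cv_accuracy(data, k, n_folds=5):
    """Mean held-out accuracy of k-NN; fold i holds out every n_folds-th item starting at i."""
    scores = []
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds

def tune_k(data, grid=(1, 3, 5, 7), n_folds=5):
    """Return the k in grid with the best cross-validated accuracy (ties favor smaller k)."""
    return max(grid, key=lambda k: (cv_accuracy(data, k, n_folds), -k))

# Hypothetical toy data: (feature value, label); label 1 tends to have larger values.
toy = [(0.1, 0), (0.4, 0), (0.5, 0), (0.9, 0), (1.1, 1), (1.0, 0), (1.3, 1),
       (1.6, 1), (1.8, 1), (2.0, 1), (0.7, 0), (1.5, 1), (0.3, 0), (1.9, 1), (0.8, 1)]
best_k = tune_k(toy)
```

The point of the sketch is that k is fixed before training and chosen only from held-out performance; reporting the grid and the selection rule is the kind of detail that most reviewed studies omitted.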
One study did not develop a prediction model but sought to find the best predictors of violence in SSD by using SVM and LR separately (36). The authors then identified the overlapping best predictors among metabolic biomarkers. By using two different models separately, they aimed to minimize overfitting, as it is unlikely for two different algorithms to overfit in the same way. Overfitting is a common bias in which a model fits the training data too closely, producing good predictions for data points in the training set but generalizing poorly to new samples (65).
The remaining studies developed and assessed models for violence prediction in SSD. They employed feature selection or cross-validation to overcome overfitting bias and achieve more accurate model development. Seven studies employed data-driven feature selection by ML before model training to control overfitting: one utilized LASSO (38), three used RF (39, 41, 45), one applied boosted tree (42), one utilized both LASSO and LR (43), and one selected features after calculation of variable importance for each employed model separately (51). Sixteen studies used cross-validation, with two using 10-fold cross-validation (43, 44), one using 7-fold (36), nine using 5-fold (37, 39, 41, 42, 45, 46, 48-50), one using 4-fold (47), and one using 3-fold (34). Two studies did not use cross-validation (35, 40). Furthermore, only seven studies acknowledged the implementation of imputation methods on their respective training set data (39, 41, 45, 46, 49-51). Imputation methods are techniques for estimating missing values within datasets to enhance overall completeness and analytical suitability (66). Notably, six studies opted for a common practice wherein missing continuous values were imputed with the mean of the observed values of the respective variable, while missing categorical values were replaced with the mode of the observed values (39, 41, 45, 46, 49, 50). However, one study imputed missing continuous variables with either the observed mean or median values, while imputing missing categorical variables with the mode of the observed values (51).
The choice of ML models is often influenced by the type of data being used. According to a survey (67), deep learning models, such as NNET and MLP, are commonly employed for interpreting imagery data. Among the studies we reviewed, two specifically utilized brain imaging data to train ML models. In one study, LASSO was employed for image interpretation, while SVM was used for integrating image and clinical data and making final predictions (38). In the other study, seven models, including NNET, were compared to assess their performance (44).
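The mean/mode imputation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the column names, and the helper names (`fit_imputer`, `apply_imputer`) are hypothetical, and the fill values are learned from the training rows only, in line with the text's statement that imputation was applied to the training set data.

```python
from statistics import mean, mode

def fit_imputer(rows, continuous, categorical):
    """Learn fill values from training rows: mean for continuous, mode for categorical."""
    fill = {}
    for col in continuous:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return fill

def apply_imputer(rows, fill):
    """Return copies of rows with missing (None) values replaced by the learned fill values."""
    return [{c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
            for r in rows]

# Hypothetical training records; None marks a missing value.
train = [
    {"age": 30, "sex": "M"},
    {"age": None, "sex": "M"},
    {"age": 40, "sex": None},
    {"age": 50, "sex": "F"},
]
fill = fit_imputer(train, continuous=["age"], categorical=["sex"])
train_imputed = apply_imputer(train, fill)
```

Keeping the fitting and applying steps separate means the same learned values can later be applied to held-out data without recomputing them from that data.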

Model validation
Regarding model validation and generalization assessment, six studies reported results on the training set (34-38, 42), while the rest of the studies performed internal validation by evaluating unseen portions of their training set. However, none of the studies conducted external validation using an independent and unseen set of data. This further implies that the prediction accuracy reported in these studies was based on a retrospective estimate rather than a prospective prediction, and none of the studies tested their algorithms' accuracy on future cases.

Models results
Primary outcome measures for evaluating model performance included area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with AUROC and accuracy being the most frequently used performance metrics. Regarding each metric, the ranges, the proportion of studies reaching values ≥75%, and the best-performing study were as follows: AUROC (0.

Models comparison
Running a meta-analysis on diverse studies with varying datasets, features, and variable distributions was impossible; therefore, we adopted a particular approach to overcome the challenge of integrating and comparing the results of these studies. We specifically targeted studies that were designed to compare different models, as they offered valuable insights for our analysis. By extracting the rankings of different models, we could assess their relative performance, independent of the specific magnitude of each performance metric. This allowed us to overcome the limitations associated with diverse study designs and datasets, enabling a more meaningful comparison (Table 2).
As mentioned earlier, thirteen studies were designed to compare different models (37, 39, 41-51). However, two of these studies utilized imaging data (38,44), which differed from the data used in the other studies. Since each ML model typically performs well with specific types of data (65), combining the results of these two studies with the others was not appropriate. Therefore, we excluded these studies from the analysis to maintain standardization across the dataset, which left us with eleven studies.
The performance rank of each model across the different studies was aggregated to generate a final rank. This approach allowed us to understand the average success rate of each model. To enhance the interpretability of the results, we took two steps. First, we excluded models used in fewer than half of the eleven studies. Second, we standardized the ranks so that they fell within a range of 0 to 7. (For the studies that compared N models, all rankings were multiplied by 7/N.) By doing so, we ensured that the final ranks accurately reflected the relative performance of each model. A lower final rank indicated a better average performance across the studies.
Finally, in terms of both accuracy and AUROC, the gradient boosting (GB) model consistently achieved the best performance rank among the six models, with a substantial margin over the next-best model. However, because a meta-analysis was not possible, we could not assess whether this margin was statistically significant. Nevertheless, this result suggests that the GB model shows promising performance in predicting violence among SSD patients using clinical data.
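The rank standardization and aggregation just described can be sketched in a few lines. This is a hedged illustration, not the authors' script: the function names, the example AUROC values, and the treatment of ties (left in input order here) are assumptions; only the 7/N rescaling, the averaging across studies, and the "at least half of the studies" filter come from the text.

```python
def standardized_ranks(scores, scale=7):
    """Rank models within one study (1 = best score) and rescale ranks by scale / N."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {model: (pos + 1) * scale / n for pos, model in enumerate(ordered)}

def aggregate_ranks(studies, min_fraction=0.5, scale=7):
    """Average standardized ranks over studies, keeping models used in at least
    min_fraction of the studies. Lower values mean better average performance."""
    collected = {}
    for scores in studies:
        for model, rank in standardized_ranks(scores, scale).items():
            collected.setdefault(model, []).append(rank)
    needed = min_fraction * len(studies)
    return {m: sum(r) / len(r) for m, r in collected.items() if len(r) >= needed}

# Hypothetical AUROC values from three studies; not taken from the reviewed papers.
studies = [
    {"GB": 0.90, "RF": 0.85, "SVM": 0.80, "LR": 0.75},
    {"GB": 0.82, "SVM": 0.84},
    {"GB": 0.88, "RF": 0.80, "KNN": 0.70},
]
final = aggregate_ranks(studies)
```

In this toy input, LR and KNN each appear in only one of three studies and are dropped, while the remaining models receive an average standardized rank on the common 0 to 7 scale. Because only within-study order is used, the result does not depend on how large the AUROC differences are.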

Discriminative features
Various features were identified in the included studies as the predictor variables of VB in SSD patients. We can classify most of them into sociodemographic, clinical, metabolic, and neuroimaging groups. Most of the features were consistent in multiple studies, except for some discrepancies, which will be elaborated upon.

Sociodemographic features
Some studies identified age (34, 37, 43, 45, 47), gender (34), and educational level (38, 43) as factors that contribute to the prediction of VB. However, other studies reported that these factors do not have a significant relationship with the occurrence of such behavior (35, 37, 44).

Clinical features
Psychotic symptoms are associated with VB in SSD patients. Different studies consistently demonstrate that negative symptoms, such as flat affect and poverty of thought, decrease the risk of VB (35, 40). However, there is inconsistency in the results concerning the impact of positive symptoms on the occurrence of VB. Some studies suggest an increased risk of VB associated with positive symptoms (35, 43), while others propose a diminishing impact of specific positive symptoms, including delusion of persecution and auditory hallucination, on VB occurrence (40). Furthermore, various studies reported that the daily olanzapine-equivalent dose prescribed at the time of discharge from a previous psychiatric hospitalization can predict the occurrence of VB among SSD patients (39, 45, 46, 48, 49). However, their results were divergent, with four studies demonstrating a positive association between the olanzapine-equivalent dosage and risk of VB (39, 46, 48, 49), and one study reporting a negative association (45).
Patients' past stresses can also contribute to VB. Patients who have experienced a higher number of past stressors had an increased risk of engaging in VB (42, 51). Consistently, a history of previous outpatient psychiatric treatment was found to be associated with an increased risk of VB in patients (46, 48-50). In addition, specific stressors, including a history of coercive psychiatric treatment and separation from main caregivers in childhood or adolescence, have also been found to be related to VB (42). There is a lack of consensus on the relationship between patients' employment status and VB. While Kirchebner et al. (2022) found a significant correlation between unemployment and VB (42), Chen et al. (2015) and Wang et al. (2020) reported no statistical relevance between a patient's employment status and the likelihood of VB (35, 37).
Additionally, scores of several rating tools are significantly associated with VB. The BPRS total score, BPRS hostility score, BPRS withdrawal factors score (38), ITAQ score, family APGAR score, SSRS score, and FBS score (47) were all found to correlate with the risk of VB. Moreover, the PANSS total score at admission and discharge (39, 45), and PANSS anxiety and lack of spontaneity scores (50) are significantly related to VB. Other statistically relevant clinical features are presented in Table 1.

Neuroimaging features
Two studies explored potential neuroimaging features for predicting VB. Gou et al. (2021) identified brain features associated with regional homogeneity (ReHo), gray matter volume (GMV), and fractional anisotropy (FA) as effective predictors of VB in schizophrenia patients (38). Significant GMV alterations were observed in the striatum system (including the putamen and pallidum), median cingulate, and paracingulate gyri, as well as temporal, occipital, and anterior parts of the parietal lobe. In addition, ReHo was most predictive in the anterior cingulate, dorsolateral part of the superior frontal gyrus, temporal pole, parietal lobe, and subcortical areas of the striatum, such as the caudate and pallidum. Also, the left superior longitudinal fasciculus was found to play a crucial role in FA predictions. Overall, the study identified the cingulate gyrus, dorsolateral part of the superior frontal gyrus, temporal lobe (inferior temporal gyrus and temporal pole), supplementary motor area, and pallidum as the key regions for predicting VB in schizophrenia patients using sMRI and fMRI (38). On the other hand, Yu et al. (2022) found that whole-brain GMV, cortical thickness of right superior temporal sulcus areas, right inferior parietal cortical thickness, and left frontal pole GMV correlated with the likelihood of violent tendencies (44).

Metabolic features
Three plasma metabolites were recognized as potentially effective biomarkers for predicting VB. In the study by Chen et al. (2022), the ratio of L-asparagine to L-aspartic acid, vanillylmandelic acid, and glutaric acid were found to be associated with the likelihood of VB (36). Specifically, a decrease in the ratio of L-asparagine to L-aspartic acid and in the glutaric acid level, and an increase in the vanillylmandelic acid level, appear to be correlated with violent tendencies. Furthermore, alterations in specific metabolic pathways seemed to predispose individuals toward violence. Specifically, the glycerolipid metabolism pathway, characterized by an up-regulation of glycerol and a down-regulation of glycerol-3-phosphate, and the phenylalanine, tyrosine, and tryptophan biosynthesis pathway, marked by a down-regulation of 4-hydroxyphenylpyruvic acid, have been associated with violent tendencies (36). Moreover, it has been demonstrated that raised triglyceride levels were associated with a reduced likelihood of engaging in VB (35).

Risk of bias assessment
Based on the results of our ROB assessment using the PROBAST guidelines, all studies except for two (43, 47) had some bias due to a small sample size, different violence definitions, and the inability to satisfy the study's purpose. Although most of the studies had high ROB, the most important limitation arises from their limited sample sizes. According to the PROBAST guidelines, to achieve a low ROB in the analysis domain, the number of participants with the outcome relative to the number of input variables should be equal to or higher than 20 (33). Only four reviewed studies had low ROB in the analysis domain (41, 42, 44, 47). Another reason for high ROB arises from the divergent definitions of violence and the use of different scales across the studies. We defined VB as an attempt or action to harm a target, assault, child or sexual abuse, and violent crimes. In contrast, Hofmann et al. (2022) included verbal aggression in the definition of VB (41), and four studies evaluated the ability of ML models to classify patients with previous criminal offenses (including VB) from non-offenders (46, 48-50). Also, some studies have evaluated the power of ML models in predicting VB (e.g., homicide) among offenders with SSD (39, 42, 45). Although many studies showed high ROB in at least one domain according to the PROBAST guideline (33), most of them (11/18) showed low concerns regarding applicability in the field of violence prediction in patients with SSD. Table 3 and Figure 2 illustrate the results of the quality assessment process.
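The sample-size criterion quoted above reduces to a simple ratio, sketched below. The function names and the example numbers are hypothetical, and the sketch covers only the events-per-variable check as stated in the text, not the full PROBAST analysis-domain judgment.

```python
def events_per_variable(n_events, n_predictors):
    """Number of participants with the outcome per candidate input variable."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors

def meets_epv_threshold(n_events, n_predictors, threshold=20):
    """True if the events-per-variable ratio reaches the threshold quoted above."""
    return events_per_variable(n_events, n_predictors) >= threshold

# Hypothetical study: 150 patients with violent behavior, 10 candidate predictors.
epv = events_per_variable(150, 10)
ok = meets_epv_threshold(150, 10)
```

In this hypothetical case the ratio is 15, so the study would fall short of the threshold even with 150 outcome events; halving the number of candidate predictors would be one way to reach it.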

Key findings
Previous research has shown an acceptable power for ML models in predicting VB in populations broader than SSD patients (68,69). In this article, we reviewed the role of ML in predicting VB in patients with SSD. According to our findings, the predictive performances of the ML models varied across the reviewed papers. However, ML models performed better in studies that employed more intricate methodologies for model development and evaluation. These findings suggest that a well-designed ML model could be a potential tool for VB prediction in SSD patients and could help alert caregivers to apply prevention techniques and avert further harmful acts by patients in clinical and forensic settings. Among the reviewed ML models, GB showed the best performance in VB detection. We also reviewed the most discriminating features in violence prediction of SSD patients. Age (34, 37, 43, 45, 47) and olanzapine equivalent dose at the time of discharge (39, 45, 46, 48, 49) were the variables most frequently found to be associated with violence across the studies.

Machine learning models
While direct comparison of results among studies was challenging due to the differences in sample characteristics, some insights were obtained. First, about two-thirds of the studies (11/18) reached values above 75% for both AUROC and accuracy, indicating that ML can be a promising tool for the accurate prediction of VB among SSD patients. Second, the studies demonstrated diverse performance in predicting VB among SSD patients, with an AUROC ranging from 0.56 to 0.95 and an accuracy ranging from 50.27% to 90.67%. However, the performance ranges within each study were narrower when comparing different ML models. Considering that many studies employed similar ML models and input variables, the observed diversity in performance appears to be partly influenced by the variations in study designs. This suggests that future similar studies could enhance their results not only by focusing on ML model selection or input variable choices but also by paying attention to the details of model development to mitigate biases.
In addition, there exists considerable divergence among the reviewed studies with regard to the methodologies employed for both feature selection and cross-validation. These two components play pivotal roles in ML model development, serving to mitigate overfitting and improve overall model performance (70). Within the included studies, only seven undertook data-driven feature selection utilizing ML techniques, six of them prior to model training as a preemptive measure against overfitting (38, 39, 41-43, 45). The remaining study adopted a post hoc approach, selecting features after computing variable importance for each employed model independently (51). Additionally, sixteen studies used diverse methods for cross-validation, while two studies did not apply it (35, 40). This heterogeneity in model development practices across the reviewed studies poses a significant obstacle to synthesizing their respective findings.
Therefore, a significant challenge was comparing ML models by integrating the results of different studies, given the variations in sample characteristics, including differences in the distribution of input and output variables. To address this challenge, we devised a ranking method that enabled us to assess the overall success rate of commonly used methods. Based on our findings, the GB model exhibited notably superior average performance. However, it is essential to note that this does not necessarily imply inherent weakness in the other models. Instead, it highlights the favorable results achieved by the GB model in the specific context of the studied field.
GB is a subset of ensemble machine learning models, which also includes common models like classification trees and RF (32). This approach enables the effective handling of big data and also the handling of missing values in the predictors (32). While common ensemble techniques like RF rely on straightforward averaging of models within the ensemble, GB stands out for its step-by-step, sequential strategy for selecting the best predictor (71). This notable flexibility empowers GB to be highly adaptable to specific data-driven tasks (71). These characteristics, particularly the effective handling of a large number of predictors, may explain why GB outperformed other ML models in predicting VB in SSD patients. Nevertheless, it is noteworthy that several other ML models, including SVM, LASSO, NNET, RF, decision trees, PDA, MLP, elastic net, and LR, also achieved AUROC values exceeding 0.9 in the reviewed studies (41, 44, 47, 51). This highlights the substantial predictive potential of these alternative ML models in addition to GB.
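The sequential strategy that distinguishes GB from simple averaging can be illustrated with a minimal sketch. It is a deliberately simplified assumption-laden toy: it uses one-split trees (stumps) on a single feature and squared-error loss, whereas GB classifiers in practice usually optimize a classification loss over many predictors; the function names, toy data, number of rounds, and learning rate are all hypothetical and not drawn from the reviewed studies.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, rate, stumps = model
    return base + rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

def fit_gradient_boosting(xs, ys, n_rounds=50, rate=0.3):
    """Sequentially fit each new stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    model = (base, rate, [])
    for _ in range(n_rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        model[2].append(fit_stump(xs, residuals))
    return model

# Hypothetical toy data: one feature, binary outcome coded as 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
model = fit_gradient_boosting(xs, ys)
```

Each round corrects what the previous rounds got wrong, which is the contrast with RF, where trees are grown independently and their outputs averaged.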

Discriminative features
Notably, various studies have explored the influence of age on VB risk, yielding diverse findings. For instance, Tzeng et al. (2004) associated younger age with a higher risk of VB (34). In contrast, four other studies observed that older age correlated with an increased tendency for VB in SSD patients (37, 44, 45, 47). Yet, Chen et al. (2015) found no significant correlation between patients' age and VB risk. While the majority of previous research aligns with Chen et al. (2015) and negates the association between age and VB risk (35, 72), there are outliers such as Soyka et al. (2007), who identified older ages as linked to a higher VB risk in SSD patients (73). This variability underscores the need for further research to ascertain the precise impact of age on VB occurrence in SSD patients.
Contrary to age, the reported gender effect on VB occurrence risk was quite consistent among studies, which showed that male sex was associated with a higher risk of VB (34). These findings confirmed most of the previous studies that reported a higher prevalence of VB among male SSD patients (73, 74), as the general population (17) (38,43). This was in line with previous research that found lower educational levels to be significant predictors of VB among SSD patients (75, 76) and the general population (17). These disparities suggest that further studies on larger populations are required to determine the exact effect of the educational level of SSD patients on their VB tendency. 2016) studies, unemployment was able to differentiate SSD patients with serious VB (e.g., homicide) from patients with minor VB (e.g., property damage). On the other hand, two other studies trained ML models to differentiate SSD patients with any kind of VB (serious or minor) from patients without VB (35, 37). This suggests that unemployment does not seem to be associated with the overall risk of VB among SSD patients, but it increases the risk of serious VB among offenders.
Regarding the clinical features, two studies reported that the presence of positive symptoms (35, 43) was correlated with an increased risk of VB, which was consistent with previous research (72, 74, 77). However, another study suggested that the presence of specific positive symptoms, including delusion of persecution and auditory hallucination, decreases the risk of VB (40). This controversy indicates that different types of delusion may have varying effects on the occurrence of VB (40). Also, Sonnweber et al. (2021, 2022) found a favorable predictive power for the PANSS total score of the patients in two different studies (39, 45). This is in line with previous studies that demonstrated higher PANSS total scores in violent patients, compared to non-violent patients (78, 79). Moreover, consistent with previous research (80), Gou et al. (2021) found that the risk of VB occurrence is higher among SSD patients with higher scores in the BPRS hostility subscale. Furthermore, higher scores in BPRS total score, BPRS withdrawal factors, PCL-SV, HCR-20 (38), and SDSS (43) successfully predicted VB in SSD patients across the reviewed studies. While the BPRS and PANSS scales assess various domains of SSD, including positive and negative symptoms (55, 81), the PCL-SV scale is specifically designed to evaluate psychopathic traits in patients, which are not directly associated with SSD (82). This indicates that aside from psychotic symptoms, additional characteristics such as patients' personality profiles, including psychopathy and impulsivity, may have relevance in predicting VB among individuals with SSD. Altogether, these findings suggest that by training ML models with certified psychiatric rating tools, we can significantly improve the accuracy of predicting VB in SSD patients, which can be highly beneficial in clinical applications.
Chen et al. (2015) found negative symptoms to be correlated with a decreased risk of VB (35), which is in line with previous studies that found depressive and other negative symptoms to be associated with a lower occurrence of VB in SSD patients (73, 77). Furthermore, the effect of the age of disease onset was controversial across the reviewed studies. While Sonnweber et al. (2021, 2022) 1999) did not find any significant differences between the age of onset of violent and non-violent patients. Therefore, further research is warranted to determine the effects of disease onset on the VB occurrence in SSD patients, as it can help the early detection and treatment of patients at higher risk of VB.
While most studies evaluating the prescribed daily olanzapine-equivalent dose at the time of discharge from previous hospitalizations have reported a positive association with the risk of VB (39, 46, 48, 49), one study reported the opposite (45). The divergence in findings can be attributed to the different focus of the Sonnweber (2022) study, which specifically differentiated between homicide committers and patients who committed other types of VB (45). It is logical to assume that higher doses of antipsychotics are prescribed to patients with more enduring symptoms, who are reported in some studies to be more prone to engaging in VB (83). However, some previous studies found no significant association between the disease severity or prescribed dosage of antipsychotics and the risk of VB in patients with SSD (84, 85). This highlights the need for further research to better understand how disease severity and prescribed antipsychotic dosages relate to the occurrence of VB among SSD patients.
Previous research has shown that SSD patients' previous history of violence is significantly correlated with increased risks of VB, such as recent violence episodes (86), history of a recent assault (87), previous history of aggression (74), and a previous violent conviction (87). Consistently, Sonnweber et al. (2021) reported previous conviction history as a significant predictor of VB in SSD patients (39). Moreover, Wang et al. (2020) found that a history of more than five hospitalizations increased the likelihood of VB tendency in patients. However, Tzeng et al. (2004) reported that the lifetime number of hospitalizations was not correlated with an increased risk of VB occurrence in SSD patients. This disparity could be due to the differences in the psychiatric history assessment across the studies, as Tzeng et al. (2004) evaluated a broader variable (lifetime hospitalization), while other studies assessed the recent hospitalization history (88, 89) or a more distinguishing variable (≥ 5 lifetime hospital admissions) (37).
Finally, two studies have observed that neuroimaging variables were robust predictors of VB in SSD patients. Yu et al. (2022) found decreased whole-brain gray matter volume, right inferior parietal thickness, and left frontal pole volume to be predictors of VB. Consistently, Gou et al. (2021) reported that structural and functional MRI disruptions in the temporal and frontal lobes, cingulate gyrus, and striatum can predict VB in SSD patients. Also, a systematic review of 21 studies revealed that reduced volumes of the frontal lobe in patients with schizophrenia are associated with a higher rate of VB occurrence (90). This is not surprising, as previous research mentioned a prominent role for frontal and temporal lobe and cingulate gyrus disruptions in developing VB (91). Considering the role of the frontal cortex in controlling disinhibited behaviors (e.g., impulsiveness, aggressiveness, and violence), patients with a disrupted frontal cortex are more likely to present VB (91, 92). Although previous research established the involvement of the hippocampus and amygdala in emotional processing and in the development of VB (91), the predictive value of these regions was not assessed across the reviewed studies. In conclusion, our knowledge of ML-based prediction of VB in SSD patients using models trained on MRI data is still limited, and future research is required to clarify its potential.

Limitations and further directions
This study has some limitations. First, the sample sizes of most studies were small, considering the number of input variables, which can influence their analysis results. Second, the study samples across the reviewed articles were heterogeneous, as most of them studied clinical inpatients, while some studied forensic inpatients, and one included only outpatients' data. Also, some studies only included male patients. Third, the outcome definitions differed across studies. For example, while most of the studies classified SSD patients into violent and non-violent, some others distinguished patients with serious types of VB (e.g., homicide) from other types of VB. Fourth, the reviewed studies were conducted in countries with different healthcare systems, which could have a significant impact on violence among SSD patients. Fifth, most of the studies did not select time-dependent features for VB prediction, which can substantially lower ML model performance. Finally, none of the reviewed articles performed external validation, which can significantly diminish the generalizability of their findings. Therefore, future research with more homogeneous methodologies and both internal and external validations seems to be necessary.
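The first and last limitations can be made concrete with a minimal sketch, written here in plain Python. It assumes a hypothetical, already fitted model whose risk scores are available for an internal cohort and for an independent external cohort. All labels, scores, and the feature count below are invented for illustration and are not taken from any reviewed study; the function names `auroc` and `events_per_variable` are likewise illustrative.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen violent case (label 1)
    receives a higher score than a randomly chosen non-violent case (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def events_per_variable(labels, n_features):
    """Outcome events in the rarer class divided by the number of candidate
    predictors; low values indicate a sample that is small for the model."""
    positives = sum(labels)
    events = min(positives, len(labels) - positives)
    return events / n_features


# Hypothetical internal cohort (the data the model was developed on).
internal_labels = [1, 1, 1, 0, 0, 0]
internal_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

# Hypothetical external cohort (a different site, never seen in training).
external_labels = [1, 1, 0, 0, 0, 1]
external_scores = [0.7, 0.4, 0.5, 0.3, 0.6, 0.55]

internal_auc = auroc(internal_labels, internal_scores)
external_auc = auroc(external_labels, external_scores)
epv = events_per_variable(internal_labels, n_features=30)

print(f"internal AUROC: {internal_auc:.3f}")
print(f"external AUROC: {external_auc:.3f}")
print(f"events per variable: {epv:.2f}")
```

In this toy example the AUROC is lower in the external cohort than in the internal one, which is the kind of performance drop that only external validation can reveal.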

Conclusions
The ML models employed by the reviewed studies yielded compelling findings, which supports continuing along this research trajectory. In more detail, while the ML models' performance in VB prediction among SSD patients was divergent yet promising, our comparative analysis demonstrated that GB outperformed other ML models. Considering the heterogeneity of ML model applications and study populations across the reviewed articles, there is substantial potential for further research in this field. Furthermore, the absence of external validation in the included articles reduces the generalizability of their findings. Indeed, subsequent research employing comparable models, outcomes, and predictors in extensive clinical samples is imperative to substantiate the current findings and ascertain the applicability of the developed ML algorithms.
Moreover, given the rapidly growing trend in the application of various artificial intelligence tools in medical contexts, it appears likely that in the coming years ML models can also be utilized for VB prediction in SSD patients. Indeed, while the performance of ML models varied across the reviewed studies, several models demonstrated excellent predictive abilities with an AUROC exceeding 0.9. This highlights the potential for developing reliable ML models through further well-designed studies. Upon validation through external assessments, these models could effectively predict VB in real-world clinical settings. Consequently, the development of clinical assessment tools integrating patient data could facilitate the early identification of individuals highly susceptible to VB, whether in outpatient or inpatient settings. The utilization of such tools could enable timely preventive interventions, such as providing social support and rehabilitation, adjusting medications, and considering more intensive therapeutic approaches, like electroconvulsive therapy. Implementing these measures could significantly alleviate the burden of VB on patients, healthcare systems, and society at large.

Glossary
: Wang et al. (2020) considered the outcome as physical aggression, irrespective of the aim or the outcome of VB (37), Gou et al. (2021) considered it as physical aggression aimed at others and leading to injury (38), and finally Yu et al. (2022) and Cheng et al. (2023) defined VB as a minimum MOAS score of 5 or 4 respectively, which could be achieved by various VBs without restricting the type or the target of it (44, 47).Additionally, four studies employed different scales for the VB definition: Tzeng et al. (34) used the Violence and Suicide Assessment (VAS-A) (61), Chen et al. (35) utilized the Violence Scale (28), Chen et al. (36) employed the MacArthur Violence Risk Assessment Study (MVRAS) (62), and Watts et al. (51) used the Aggressive Incidents Scale (AIS) (63).Meanwhile, three other studies simply defined VB without the use of any scale: Sun et al. (2021) and You et al. (2022) focused on physical VB aimed at others (40, 43), while Hoffman et al. (2022) included physical VB regardless of the aim

FIGURE 1. Study selection process flow diagram.
Furthermore, Gou et al. (2021) and Yu et al. (2022), but not Chen et al. (2015) and Yu et al. (2022a), found that lower educational levels could predict VB occurrence. Moreover, in terms of the effect of occupational status on VB tendency, Kirchebner et al. (2022) reported a significant relationship between unemployment and VB in SSD patients (42), which confirmed the findings by Karabekiroglu et al. (2016). Conversely, Chen et al. (2015) and Wang et al. (2020) found no correlation between employment status and VB (35, 37). This divergence could be a result of different definitions of violence in these studies; indeed, Kirchebner et al. (2022) and Karabekiroglu et al. (

FIGURE 2. Assessment of Risk of Bias based on PROBAST.

TABLE 1. Characteristics of the included studies.
*Boosted tree used for feature selection. Violent offenses based on Swiss law: homicide and attempted homicide, assault, rape, robbery, arson, and child abuse. Nonviolent offenses: threat, theft, damage to property, minor sexual offenses (e.g., exhibitionism), drug offenses, illegal gun possession, and other minor offenses (e.g., triggering false alarms or emergency brakes).

TABLE 2. Performance ranks of each machine learning model across the different studies.
AUROC, Area Under the Receiver Operating Characteristic Curve; GB, Gradient Boosting; NB, Naive Bayes; RF, Random Forest; DT, Decision Tree; LR, Logistic Regression; KNN, K-Nearest Neighbor; SVM, Support Vector Machine.
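As an illustration of how performance ranks of this kind can be aggregated, the following minimal Python sketch ranks models within each study by AUROC and then averages the ranks across studies. This is one plausible scheme and is not necessarily the procedure used to build Table 2. The study names and AUROC values are invented for illustration and are not the values reported by the reviewed studies; the names `rank_within_study` and `mean_ranks` are also illustrative.

```python
from collections import defaultdict

# Hypothetical AUROC values per study; NOT data from the reviewed studies.
study_aurocs = {
    "study_A": {"GB": 0.85, "RF": 0.80, "LR": 0.75},
    "study_B": {"GB": 0.78, "SVM": 0.80, "LR": 0.70},
    "study_C": {"RF": 0.90, "GB": 0.88},
}


def rank_within_study(aurocs):
    """Rank the models of one study (1 = highest AUROC).
    Models with identical AUROC share the average of their ranks."""
    ordered = sorted(aurocs.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = average_rank
        i = j + 1
    return ranks


def mean_ranks(studies):
    """Average each model's within-study rank over the studies that tested it."""
    collected = defaultdict(list)
    for aurocs in studies.values():
        for model, rank in rank_within_study(aurocs).items():
            collected[model].append(rank)
    return {model: sum(r) / len(r) for model, r in collected.items()}


summary = mean_ranks(study_aurocs)
for model, rank in sorted(summary.items(), key=lambda kv: kv[1]):
    print(f"{model}: mean rank {rank:.2f}")
```

Because each model is averaged only over the studies that evaluated it, a model tested in a single study can obtain a favorable mean rank by chance, so such rankings are best read alongside the number of studies contributing to each model.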
reported that younger age of disease onset correlated with the probability of VB, Chen et al. (2015) and Wang et al. (2020) did not find a significant relationship between the age of disease onset and VB occurrence. The findings of previous research in this field are also divergent. Indeed, while Caqueo-Urízar et al. (2016) found VB to be more prevalent among patients with younger age of illness onset, Nolan et al. (